Denisova microcephalin status

I’m still doing quick mining of the Denisova sequence for obvious things. One of the simplest is the polymorphism in microcephalin (MCPH1) that Evans and colleagues Evans:2006 suggested may represent introgression from an archaic population.

The polymorphism, entered in dbSNP as rs930557, is a single nucleotide mutation that changes the ancestral aspartate to a derived histidine. The derived allele, called the “D” allele, is linked to a fairly long haplotype, which Evans and colleagues Evans:2005 attributed to recent positive selection during the past 35,000 years.

The evidence for possible introgression is the unusual paucity of recombination in the period before this putative selection commenced. I wrote about this in 2006 (“Introgression and microcephalin FAQ”). The polymorphism looks very old, by virtue of the high density of linked mutations around it at the sequence level. Evans and colleagues estimated the tree root at around 1.8 million years ago. This is not too extreme compared to other loci (between two and three times the average, and well within the expected tail), but the suppression of recombination does seem unusual. Balancing selection on a region where recombination was physically difficult, such as a chromosomal inversion, would be one possible evolutionary history that could give rise to this pattern, but there is no sign of such a feature here. The other candidate is ancient, strong population structure. That was the interpretation favored by Evans and colleagues Evans:2006, and one that still seems likely.

Evans and coworkers suggested Neandertals as a possible population from which the derived allele had originated. This seemed likely on the basis of its widespread geographic distribution outside Africa. But the relevant nucleotide of MCPH1 is now known from at least two Neandertals, neither of which shows the derived allele. Martina Lari and colleagues Lari:2010 showed that a skull fragment from Mezzena rock shelter, Italy, had the ancestral MCPH1 allele. The Neandertal genome data published by Green and colleagues last year Green:draft:2010 also show the ancestral allele here. As Lari and colleagues noted, this doesn’t prove that no Neandertals carried the allele, but it leaves us without any positive evidence that they did.

The Denisovans might seem an unlikely source for the derived MCPH1 variant, because the genetic contribution of this population to most living Eurasians was at most slight. But as we pointed out in a 2006 paper two-s, even a small amount of gene flow would be enough to transfer an adaptive variant into the later human population. Once it gets in, an adaptive allele grows in numbers because of its selective advantage – no large amount of admixture is necessary. So it would be very difficult to rule out any ancient population as the origin of the allele, without genotyping the ancient bones.
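To make that argument concrete, here is a minimal sketch in Python. It rests on two textbook simplifications that are my assumptions for illustration, not a model fitted to MCPH1: each introduced copy of an advantageous allele escapes early random loss with probability of roughly 2s (Haldane's approximation), and once established the allele follows deterministic genic selection. The function names and the parameter values are hypothetical.

```python
def establishment_probability(n_copies: int, s: float) -> float:
    """Chance that at least one of n_copies introduced copies escapes early loss.

    Assumes each copy independently survives with probability ~2s
    (Haldane's approximation, reasonable only for small s).
    """
    per_copy = min(1.0, 2.0 * s)
    return 1.0 - (1.0 - per_copy) ** n_copies


def deterministic_trajectory(p0: float, s: float, generations: int) -> list:
    """Allele frequency per generation under genic selection.

    Recursion: p' = p(1 + s) / (1 + p*s), ignoring drift and migration.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + p * s)
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    s = 0.05  # hypothetical selective advantage, chosen only for illustration
    for n in (1, 10, 50):
        print(n, round(establishment_probability(n, s), 3))
    traj = deterministic_trajectory(p0=0.001, s=s, generations=300)
    print(round(traj[-1], 4))
```

Under these assumptions a few dozen introduced copies are nearly certain to leave an established lineage, and the later rise in frequency owes nothing to the amount of admixture.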

With the Denisova genome, we have reached that point. The public data release includes three sequence reads across rs930557, all of which carry the ancestral (G) nucleotide. That’s not complete evidence about the individual’s genotype, nor does it exclude the presence of the derived allele in the population. But there it is, for what it’s worth.
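How incomplete is that evidence? A rough calculation helps. The sketch below assumes that each read samples one of the individual's two chromosomes at random and independently, and that a read reports the wrong allele with a fixed probability. Both are simplifications I am adding for illustration; real ancient DNA damage is not this tidy. The function name and the default values are hypothetical.

```python
def prob_all_reads_ancestral(n_reads: int,
                             derived_fraction: float = 0.5,
                             error_rate: float = 0.0) -> float:
    """Probability that every one of n_reads shows the ancestral base.

    derived_fraction: share of the individual's chromosomes carrying the
        derived allele (0.0, 0.5 for a heterozygote, or 1.0).
    error_rate: assumed chance that a read reports the other allele.
    """
    p_ancestral_read = ((1.0 - derived_fraction) * (1.0 - error_rate)
                        + derived_fraction * error_rate)
    return p_ancestral_read ** n_reads


if __name__ == "__main__":
    # Heterozygote, three reads, no sequencing error
    print(prob_all_reads_ancestral(3))
    # Homozygous derived, three reads, 1% assumed error per read
    print(prob_all_reads_ancestral(3, derived_fraction=1.0, error_rate=0.01))
```

With no error, a heterozygote would produce three ancestral reads one time in eight, so three reads leave heterozygosity quite possible, while a derived homozygote is effectively excluded.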

After the population model presented by Reich and colleagues Reich:Denisova:2010, I think it may be time to revisit the topic of genetic exchanges and deep-rooted genealogies. Green and colleagues Green:draft:2010 actually applied a test for Neandertal regions based on a comparison of genealogical tree depth in African versus non-African genetic samples. The implicit assumption was that deep-rooted trees are more likely to reflect ancient population structure. That remains true, and yet there are many loci like MCPH1 that have very deep roots without yet any clear sign of their presence in the Neandertals or Denisovans. Some of those trees have deep roots inside Africa, and may reflect ancient African population structure. For others, I don’t know. We’re working to extend the sample of deep roots beyond the dozen reported by Green and colleagues, to a much broader cross-section of (shorter) chromosomal intervals.

Maybe there were yet other ancient populations that remain unsampled, contributing to the genealogical depth of some gene loci outside Africa. One of our current challenges is the disconnect between the genome data and fossil and archaeological comparisons. We’re working to apply some archaeologically-informed models of population structure to genomic variation from living humans, to find the hidden traces of ancient population structure. As I noted (“The Denisova genome FAQ”), the signs of interbreeding with Denisovans were apparent in the existing samples from Papua New Guinea, even before the ancient genome was available. Smaller fractions of intermixture will be harder to find, but we now know what to look for, and we’ll soon have much larger samples to work with.

Clearly we have a lot of work ahead of us. An average four percent contribution of some archaic human population to living people implies that substantially more than four percent of loci will be affected by such interbreeding in one person or another. The fraction of affected loci could be as large as 100% (every Neandertal gene persisting in somebody living today). To the extent that it is actually smaller, this fraction will provide substantial information about population history. To do this comparison well, we’ll need much larger samples of genomes of living humans, substantially beyond the 1000 Genomes Project.
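The range between four percent and 100% can be illustrated with a toy calculation. This sketch assumes that, at a given locus, each sampled genome copy carries archaic ancestry independently with probability equal to the archaic frequency at that locus. The function name and the two frequency scenarios are hypothetical, chosen only to show the extremes. For a fixed four percent average, spreading the ancestry evenly across loci gives the highest fraction of affected loci, and concentrating it at a few loci gives the lowest.

```python
def fraction_of_loci_affected(locus_freqs, n_copies: int) -> float:
    """Expected share of loci where at least one of n_copies is archaic.

    locus_freqs: archaic allele frequency at each locus (hypothetical values).
    n_copies: number of genome copies sampled, treated as independent draws.
    """
    hits = [1.0 - (1.0 - f) ** n_copies for f in locus_freqs]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    even = [0.04] * 100                  # every locus at 4% archaic frequency
    patchy = [1.0] * 4 + [0.0] * 96      # 4 loci fixed archaic, 96 with none
    for n in (1, 10, 100, 1000):
        print(n,
              round(fraction_of_loci_affected(even, n), 3),
              round(fraction_of_loci_affected(patchy, n), 3))
```

Both scenarios have the same four percent average, but with a large sample the even case approaches 100% of loci while the patchy case stays at four percent. Where the real figure falls between them depends on drift and selection since the admixture, which is why it carries information about population history.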