A new study of genetic introgression and human ancestry

Fed up with hobbit news? Well, I'm going to do my best this week to scoop the science journalists, covering stories in paleoanthropology that ought to get more attention but might be drowned out by otherwise hobbitrocious stories.

I'll start with a story in which I have a special interest -- a new paper by Jeff Wall, Kirk Lohmueller, and Vincent Plagnol, titled, "Detecting ancient admixture and estimating demographic parameters in multiple human populations."

A couple of years ago, Plagnol and Wall (2006) looked at a sample of genes in the Environmental Genome Project (EGP). At that time, the sample consisted of 135 genes in 12 Yoruba and 22 CEPH individuals. It's not a large sample by today's 3.9-million-genotype standards. But the EGP sample has one important thing going for it -- with resequencing data, we have access to a much larger number of mutational differences at very small map distances from each other. Tight linkage between sites means that we can use the genealogical properties of samples to examine much more ancient events. The HapMap gives us a vast number of genotypes from a large sample of individuals, but the density of loci is quite low -- an average of nearly 1000 base pairs between loci. The EGP doesn't sample as many loci, but it gives a denser representation of the variation at each locus. Only this kind of sample is sufficient to test whether modern human populations have genetic ancestry in ancient populations of the Middle Pleistocene.

Plagnol and Wall applied a simple admixture model to these data, and found that the complete out-of-Africa replacement model did not adequately explain the variation in the European-derived sample. Instead, they found that a model with 5 percent admixture from some non-African Middle Pleistocene ancestral population was a much better fit for the current diversity of European gene trees. In other words, the variation of recent humans cannot be explained entirely by descent from a single small ancestral population; instead, there must have been several populations, partly isolated from each other, one or more of which gave regionally specific alleles to modern Europeans. Multiregional evolution fits those observations very well -- this is not a matter of one or two introgressive genes, and there is no specific evidence of selection on them (although selection may be responsible).

A number of people picked up on that study in the course of later work. Gregory Cochran and I discussed it in our own 2006 paper about genetic introgression. In late 2005, Dan Garrigan and colleagues had published their own analysis of a pseudogene region on the X chromosome, called RRM2P4. Garrigan reviewed this work together with Mike Hammer (2006) and again with Sarah Kingan (2007). Early last year, I also reviewed the evidence together with Cochran, Henry Harpending and Bruce Lahn (2008).

We and many other people are following up on this research, trying to discover the ancestry of human populations beyond the simple out-of-Africa replacement scenario. In the new study, Wall and colleagues extend their analysis to a more recent release of the EGP, including 222 genes, and adding 24 Chinese individuals to the 12 Yoruba and 22 CEPH individuals. It's a simple paper and relatively short. In a word, they find that their data reject the simple out-of-Africa replacement scenario, and that the genetic variation of coding genes in their sample must be explained in part by long-standing population structure.

It's not proof that the Neandertals, or any other particular group of ancient humans, survived and passed their genes on to more recent people. This is a study of the genes of recent human populations, and it merely concludes that their ancestors could not have lived in a single small population. Maybe every Neandertal became extinct, and present-day Europeans got this genetic variation from somewhere else. But it is logical to figure that ancient non-African populations may have been among the contributors to present non-African peoples -- particularly since the statistical test focuses on region-specific gene frequencies. The study also finds evidence that today's African population has a complex ancestry -- a kind of multiregional scenario playing out inside Africa (or potentially involving gene flow back into Africa from elsewhere).

Testing for admixture

Wall and colleagues reasoned that an allele coming in from an ancient, partially isolated human population would vary in a distinctive pattern. Because of the long history of partial isolation in an ancient subpopulation, they expected that such an allele would come in with multiple mutational differences from the non-introgressive allele. And if it came in from some non-African population, it ought to show relatively strong differences in frequency between populations. So they devised a statistic, mathematically combining FST and a linkage measure -- the idea being to detect alleles that differentiate populations and that are surrounded by large sets of tightly linked polymorphisms.
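To make the idea of "high-differentiation SNPs surrounded by tightly linked polymorphisms" concrete, here is a minimal Python sketch. It is not the statistic Wall and colleagues actually computed. It assumes two populations with known derived-allele frequencies at biallelic SNPs, uses the textbook Nei-style estimator F_ST = (H_T - H_S)/H_T, and then scores each SNP by how many high-F_ST SNPs lie within a fixed window around it. The function names, the 5,000 bp window, and the 0.2 cutoff are hypothetical choices for illustration.

```python
from typing import List, Tuple

# Each SNP is (position_bp, freq_pop1, freq_pop2): derived-allele frequencies in two samples.
Snp = Tuple[int, float, float]


def fst_two_pops(p1: float, p2: float) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic site, equal population weights."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity of the pooled sample
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population heterozygosity
    if h_t == 0.0:
        return 0.0  # monomorphic site: nothing to differentiate
    return (h_t - h_s) / h_t


def cluster_scores(snps: List[Snp], window_bp: int = 5000, fst_cutoff: float = 0.2) -> List[int]:
    """Hypothetical score: for each SNP, count the high-F_ST SNPs (itself included) within window_bp."""
    fsts = [fst_two_pops(p1, p2) for _, p1, p2 in snps]
    scores = []
    for pos_i, _, _ in snps:
        count = sum(
            1
            for (pos_j, _, _), f in zip(snps, fsts)
            if abs(pos_j - pos_i) <= window_bp and f >= fst_cutoff
        )
        scores.append(count)
    return scores


if __name__ == "__main__":
    # Toy data: three linked SNPs absent in one sample and at 40% in the other,
    # plus one distant SNP at equal frequency in both samples.
    toy = [(1000, 0.0, 0.4), (1800, 0.0, 0.4), (2500, 0.0, 0.4), (50000, 0.5, 0.5)]
    print(cluster_scores(toy))  # [3, 3, 3, 0]
```

A score like this only means something once it is calibrated against simulations of a model with no archaic admixture, which is the role the simulations play in the paper.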

This kind of pattern might also occur under positive selection. But a new mutation under positive selection would start out weakly linked to nearby polymorphisms, each of which already exists at some substantial frequency in the population. An introgressive allele might be linked to several other unique mutations that happened during the long period of limited gene flow between ancient populations. And a new mutation would not tend to be surrounded by high FST polymorphisms, until it got to be very common in the population -- up above 50 percent. In contrast, an introgressive allele coming into the population with several nearby mutations would generate a cluster of relatively high FST polymorphisms even at low frequencies. It may not be a perfect test for any individual locus -- there's a lot of uncertainty. But applied to more than 200 loci, it should be possible to test the hypothesis that "archaic admixture" is zero.
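Why pooling more than 200 loci helps can be illustrated with a simple binomial calculation. This is a hypothetical simplification, not the likelihood approach used in the paper: it assumes the loci are independent and that, if archaic admixture were zero, each locus would have the same small probability q of exceeding some significance threshold. The per-locus rate and the observed counts below are illustrative placeholders, not results from the study.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(n, q): chance that k or more of n independent loci exceed the threshold."""
    return sum(comb(n, i) * q**i * (1.0 - q) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_loci = 222   # number of genes in the newer EGP release
    q_null = 0.05  # hypothetical per-locus false-positive rate under zero admixture
    # Expected count of threshold-exceeding loci under the null is n_loci * q_null = 11.1
    for observed in (10, 20, 30):
        print(observed, binomial_upper_tail(n_loci, observed, q_null))
```

The point is only that a modest excess of "hits" spread across hundreds of loci can be highly significant even when no single locus is convincing on its own.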

Wall and colleagues do test that hypothesis, and they are able to refute it strongly for each of the three groups. Living European and Chinese samples refute the out-of-Africa replacement model with p<0.01. The Yoruba sample refutes the hypothesis of panmixia in ancient Africans at p<0.0000001.

The authors also provide a supplementary table with a list of genes that may be candidates for introgression. I didn't see any really obvious genes on the list, but each of them bears some examination. I expect that we will be able to use more detailed analytical techniques to look at the regions around these genes and see what is going on. Or at least, in the next couple of years more and more resequencing data will become available, allowing us to test the same hypotheses with larger samples.

It's worth pointing out that nothing in the approach of Wall and colleagues implies that any of the putative introgression occurred under natural selection. I've argued that introgression may have occurred under selection in ancient humans, but so far few other people have looked at the question with the idea of ancient selection in mind. No doubt we can improve a bit on the methods in the paper if we are willing to make some assumptions about the evolutionary dynamics involved in Late Pleistocene populations.

Lingering uncertainty

So what's not to like about this study? After all, here we have what appears to be strong evidence against an exclusive out-of-Africa replacement. It suggests that the ancestry of recent Europeans and Asians owes something to the Middle Pleistocene populations of those regions, and gives an estimate of that contribution consistent with what we know so far about the Neandertal genome.

But I have to approach this study as critically as I would any other piece of population genetics. In this case, there is a clear weakness in their model. The authors tested for significance of a single parameter, which they call "archaic admixture." Consider their Figure 1, a schematic of their population model:

Population model schematic from Wall et al. 2009

Is "archaic admixture" significantly different from zero? Well, as you can see, the answer must depend on the values of no fewer than six other parameters. When did the European population start growing significantly -- was it after the Last Glacial Maximum? During the Neolithic? The Aurignacian? How about the African population? Was there really a long bottleneck in the ancestry of Europeans?

The reason I'm so critical of population models used in genetics is simple. Authors of such studies almost never make even the simplest effort to justify these kinds of parameters against the archaeological or fossil record. Yet their conclusions -- in this case, the significant finding of ancient admixture -- depend on these other parameters falling within some range of values.

Now, Wall and colleagues take a fundamentally different approach from the one I would use. I would draw upon non-genetic sources of information about some of these parameters, to increase confidence about the others. In contrast, they performed a broader range of simulations, attempting to find maximum likelihood estimates for all the parameters simultaneously.

The problem with that approach is that it's hard to rule out that other factors, outside the model, were more important. Consider recent positive selection. As I mentioned above, a recent positively selected mutation could in principle create a pattern like that described for an introgressive allele -- at least under the statistics used in this paper. The chances are low for any randomly chosen mutation under positive selection, because a new positively selected mutation isn't likely to be linked to other rare mutations -- it's much more likely to be linked to common polymorphisms. But if we actually have many hundreds, or even thousands, of recently selected alleles (as we do in humans), then there is a pretty good chance that some of them will look like introgression under the test used here. Long-standing balancing selection is another scenario that could mimic introgression under this statistical approach.
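The argument about many recently selected alleles is simple arithmetic. If each selected allele independently has some small probability p of mimicking the introgression signature, then among n such alleles the expected number of false signals is n * p, and the chance of at least one is 1 - (1 - p)^n. In the sketch below, p = 0.001 is a hypothetical placeholder, not an estimate from the paper, and independence among alleles is an assumption.

```python
def expected_mimics(n_selected: int, p_mimic: float) -> float:
    """Expected number of selected alleles that look like introgression, assuming independence."""
    return n_selected * p_mimic


def prob_at_least_one_mimic(n_selected: int, p_mimic: float) -> float:
    """Probability that at least one of n independent selected alleles mimics introgression."""
    return 1.0 - (1.0 - p_mimic) ** n_selected


if __name__ == "__main__":
    p = 0.001  # hypothetical per-allele chance of mimicking the signal
    for n in (1, 100, 1000, 3000):
        print(n, expected_mimics(n, p), round(prob_at_least_one_mimic(n, p), 3))
```

With n in the thousands, even a one-in-a-thousand chance per allele makes a few false signals likely, which is why each candidate gene deserves individual scrutiny.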

There are probably too many genes on these lists for all of them to reflect selective balances or recent positive selection -- there are a lot of recently selected genes, but few of them will have the specific kind of linkage that would show up as significant in this study. But I think the authors could do more to validate the demographic model against non-genetic evidence. Besides that, there is plenty of morphological evidence for gene flow among these ancient human populations. The authors would be well-served to work more directly with the morphological record of human evolution -- when they write that:

To our knowledge, the question of ancient admixture in other parts of the world has been relatively neglected by the evolutionary genetics community

it is both true and sad. There is abundant anatomical evidence addressing the issue of genetic continuity or gene flow in parts of the world other than Europe.

UPDATE (2009-05-08): Dienekes also looks at the paper, and suggests that finding evidence for ancient population structure in Europe and East Asia may be no big deal, because it may simply derive from population structure within Africa before the putative out-of-Africa migration. I'd have to review the data to be sure, but it seems to me there are two arguments against that explanation:

  1. The East Asian and European comparisons come up with different genes showing evidence of putative introgression. There's not a lot of overlap between the sets. If these were merely ancient East African genes, we'd expect the populations outside Africa to have the same ones. And if the numbers had actually been cut down by a serial founder effect (with the Chinese having undergone more and larger bottlenecks), then we'd expect China to have a subset of the European introgressive genes. I wouldn't go out on a limb about this without looking at the actual frequencies of the supposed ancient alleles, but the pattern isn't consistent with Europe and China being drawn randomly from the same ancient African population.

  2. The entire point of the out-of-Africa replacement idea is to draw humans from an unstructured ancient population. Humans have to be inbred to explain the low genetic variation today. A long bottleneck in Africa is one explanation for this inbreeding -- but the bottleneck has to have been severe, down to an effective size around 10,000, and it has to be very long. A long history of population structure within Africa works against that bottleneck -- population structure featuring several partially isolated populations would prevent the kind of inbreeding that a long bottleneck could create. If Wall and colleagues are correct, we would have to scrap the long bottleneck idea and come up with some other explanation for high inbreeding. There are some others, as I've pointed out before.
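The claim that the bottleneck must be both severe and long can be made concrete with the standard drift result for an idealized population of constant effective size N_e: expected heterozygosity decays as H_t = H_0 (1 - 1/(2N_e))^t over t generations. The sketch below assumes that idealized model (no mutation, no migration, no structure), so it only shows how slowly drift alone removes variation at N_e = 10,000; it is not a model of any real human population.

```python
import math


def heterozygosity_remaining(n_e: int, generations: int) -> float:
    """Fraction H_t / H_0 left after drift alone in an ideal population of constant size n_e."""
    return (1.0 - 1.0 / (2.0 * n_e)) ** generations


def generations_to_fraction(n_e: int, fraction: float) -> float:
    """Generations needed for drift alone to reduce heterozygosity to `fraction` of its starting value."""
    return math.log(fraction) / math.log(1.0 - 1.0 / (2.0 * n_e))


if __name__ == "__main__":
    n_e = 10_000
    for t in (1_000, 10_000, 40_000):
        print(t, round(heterozygosity_remaining(n_e, t), 3))
    # Roughly 2 * n_e * ln 2, i.e. about 13,900 generations, to halve heterozygosity
    print(round(generations_to_fraction(n_e, 0.5)))
```

Under these assumptions, a population split into several partially isolated demes would generally retain more total variation than a single panmictic population of the same census size, which is the tension the second point describes.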

There are other arguments against exclusive continuity outside Africa, and in favor of some significant -- perhaps overwhelming -- gene flow from Africa into the rest of the world during the late Pleistocene. But none of these arguments rules out some continuity outside Africa. And if we don't need the bottleneck anymore, accepting some continuity is the reasonable explanation for the facts that don't fit, including the observations in this paper and the morphological and archaeological evidence suggesting continuity.


Evans PD, Mekel-Bobrov N, Vallender EJ, Hudson RR, Lahn BT. 2006. Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage. Proc Natl Acad Sci USA. doi:10.1073/pnas.0606966103

Garrigan D, Kingan SB. 2007. Archaic human admixture: A view from the genome. Curr Anthropol 48:895-902. doi:10.1086/523014

Garrigan D, Mobasher Z, Severson T, Wilder JA, Hammer MF. 2005. Evidence for archaic Asian ancestry on the human X chromosome. Mol Biol Evol 22:189-192. doi:10.1093/molbev/msi013

Hardy J, Pittman A, Myers A, Gwinn-Hardy K, Fung HC, de Silva R, Hutton M, Duckworth J. 2005. Evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens. Biochem Soc Trans 33:582-585.

Hawks J, Cochran G. 2006. Dynamics of adaptive introgression from archaic to modern humans. PaleoAnthropology 2006:101-115.

Hawks J, Cochran G, Harpending HC, Lahn BT. 2008. A genetic legacy from archaic Homo. Trends Genet 24:19-23. doi:10.1016/j.tig.2007.10.003

Plagnol V, Wall JD. 2006. Possible ancestral structure in human populations. PLoS Genet 2:e105. doi:10.1371/journal.pgen.0020105

Wall JD, Lohmueller KE, Plagnol V. 2009. Detecting ancient admixture and estimating demographic parameters in multiple human populations. Mol Biol Evol (early online). doi:10.1093/molbev/msp096

Zietkiewicz E, Yotova V, Gehl D, Wambach T, Arrieta I, Batzer M, Cole DE, Hechtman P, Kaplan F, Modiano D, Moisan JP, Michalski R, Labuda D. 2003. Haplotypes in the dystrophin DNA segment point to a mosaic origin of modern human diversity. Am J Hum Genet 73:994-1015.