African population structure and Neandertal population mixture

Green and colleagues, in their paper describing the Neandertal genome sequence, concluded that some genetic mixture between Neandertals and the contemporary African population must explain the genetic diversity of today's non-African populations.

But they briefly discussed one other possibility -- that African population structure by itself might lend the appearance of Neandertal admixture. I've now seen a number of people raising this possibility in press accounts of the findings.

I find it surprising that this hypothesis made it into the final paper, because it's so easy to refute.

The hypothesis is that Neandertals diverged from an already-regionally-diversified African population some 300,000 years ago, with no subsequent interbreeding.

If that's true, then no Neandertal genes should coalesce with human genes more recently than 300,000 years ago -- a bit over 4.6 percent of the average coalescence time of human and chimpanzee genes.

I'll assume throughout a human-chimpanzee genetic coalescence of 6.5 million years, the same number used in the paper by Green and colleagues, though the true number may vary from this estimate to some extent. We know which differences between humans and chimpanzees are human-derived, by comparison with other primate genomes. So we can refer to the proportion of the human-derived substitutions, compared to chimpanzees, that are either present or absent in Neandertals. This gives a way of estimating the genetic similarity of Neandertals with recent humans, without having to count the many false-positive changes in the Neandertal sequence.

Figure 3 in the paper reports the genetic difference between the Neandertal genome and the human reference sequence in 100-kb windows, as a proportion of the human-derived substitution number within each window. The Neandertal sequences have a high number of cases with complete sequence identity with the human reference, considering only those human-chimpanzee SNPs. Indeed, in the low percentage categories, the Neandertals are closer to the human reference sequence than the San individual is.

This category of high-gene-identity windows is presumably due to the fact that the (European ancestry) human reference genome includes Neandertal-derived genes. In other words, it's another consequence of population mixture.

But let's examine the alternative hypothesis, that humans within Africa had already established substantial regional population structure by 300,000 years ago, when the Neandertals diverged from northeast Africans. In that scenario, no Neandertal gene should have diverged from its counterpart in any recent non-African less than 300,000 years ago. Again, this is approximately 4.6 percent of the average coalescence time of human and chimpanzee genes.

Now, look at figure 3:

Figure 3 from Green et al. 2010

The modal difference between the human reference and the Neandertals is around 12 percent of the human-chimpanzee genetic difference, which would correspond to a coalescence time of human and Neandertal genes around 780,000 years. That is not much lower than the estimate in the paper of 825,000 years -- the difference is due to the shape of the distribution, as the mode is smaller than the mean.
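The conversion between a percentage of the human-chimpanzee difference and a coalescence time can be sketched in a few lines. This is only an illustration, assuming that genetic difference scales linearly with time (a constant substitution rate) and using the 6.5-million-year human-chimpanzee figure; the function and variable names are mine, not anything from the paper.

```python
# Assumption: divergence accumulates linearly with time, so a fraction of the
# human-chimpanzee difference maps directly onto a fraction of 6.5 million years.
HUMAN_CHIMP_COALESCENCE_YEARS = 6.5e6


def fraction_to_years(fraction):
    """Convert a fraction of human-chimpanzee divergence to years."""
    return fraction * HUMAN_CHIMP_COALESCENCE_YEARS


def years_to_fraction(years):
    """Convert years to a fraction of human-chimpanzee divergence."""
    return years / HUMAN_CHIMP_COALESCENCE_YEARS


# The modal human-Neandertal difference read off figure 3 (about 12 percent).
modal_years = fraction_to_years(0.12)

# The 300,000-year divergence proposed by the population-structure hypothesis.
cutoff_fraction = years_to_fraction(300_000)

print(f"modal coalescence: {modal_years:,.0f} years")
print(f"300,000-year cutoff: {cutoff_fraction:.1%} of human-chimpanzee divergence")
```

The two printed values correspond to the 780,000-year and 4.6 percent figures used in the text.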

Now, what we care about is those very low categories, where the human-Neandertal difference is less than 2 percent of the total time to the human-chimpanzee common ancestor. How likely is a 100-kb interval to have this pattern, if the true coalescence time must be more than 300,000 years ago?

Well, this isn't a formal analysis, but the back-of-the-envelope answer is obvious -- it's very unlikely. The average 100-kb interval has more than 500 substitutions on the human lineage, so the 2 percent category corresponds to about 10 of them. In a given 300,000-year period, there should be more than 22.5 substitutions. Treating the count as Poisson-distributed with a mean of 22.5, the probability of observing 10 or fewer in such an interval is 0.0026. The probability of observing zero is 1.7 times 10^-10.
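For anyone who wants to check the arithmetic, here is a minimal sketch of that back-of-the-envelope calculation. It assumes the number of substitutions in a window is Poisson-distributed with a mean of 22.5, which is my reading of where the quoted numbers come from (the zero-count probability is just exp(-22.5)). The helper name is hypothetical.

```python
import math


def poisson_cdf(k_max, mean):
    """Return P(X <= k_max) for X ~ Poisson(mean), summing term by term."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for k in range(1, k_max + 1):
        term *= mean / k  # P(X = k) from P(X = k - 1)
        total += term
    return total


# Assumption: expected substitutions in 300,000 years, taken from the text.
EXPECTED_SUBSTITUTIONS = 22.5

# 2 percent of roughly 500 human-lineage substitutions per 100-kb window.
THRESHOLD = 10

p_zero = poisson_cdf(0, EXPECTED_SUBSTITUTIONS)
p_at_most_threshold = poisson_cdf(THRESHOLD, EXPECTED_SUBSTITUTIONS)
p_below_threshold = poisson_cdf(THRESHOLD - 1, EXPECTED_SUBSTITUTIONS)

print(f"P(zero substitutions) = {p_zero:.2e}")
print(f"P(10 or fewer)        = {p_at_most_threshold:.4f}")
print(f"P(fewer than 10)      = {p_below_threshold:.4f}")
```

Under this model the quoted 0.0026 matches the cumulative probability up to and including 10 substitutions; stopping at 9 gives a smaller value still, around 0.0011. Either way the conclusion is the same: such windows should be very rare if every coalescence is older than 300,000 years.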

So the results in figure 3 just aren't credible under the hypothesis of ancient African population divergence and no Neandertal-human mixture after 300,000 years. Not unless there are some problems hiding in the data that would interfere with these comparisons. Those bins with little difference between the Neandertals and the human reference genome really have to be explained by coalescence times much younger than 300,000 years -- and those can only be there by population mixture.