Which population in the 1000 Genomes Project samples has the most Neandertal similarity?

Last December I began writing about an analysis of introgression in the 1000 Genomes Project samples ("Neandertal introgression, 1000 Genomes style"). I left everybody in a bit of suspense, partly because my writing computer was unexpectedly replaced before winter vacation, and partly because of my extensive travel in January.

I'm catching up this week before I go to Ann Arbor, Michigan next week for a talk and visit with many friends. It's a good time to give readers some status updates on the analyses because the release of the high-coverage Denisova genome today will allow us to do some very deep checks on some of the comparisons we've carried out.

Picking up where I left off, in the last post I emphasized that the individual genomes represented in the 1000 Genomes Project samples in Europe and East Asia have a surplus of derived SNP alleles that they share with the Vindija Vi33.16 genome. That surplus compared to genomes in the African population samples represents the evidence for Neandertal ancestry in those populations.

Comparison of shared Neandertal derived variants in African, Chinese and European samples
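
For readers who want to see what counting this surplus involves, here is a minimal Python sketch. It is not the pipeline behind these figures. It assumes each individual is coded as a count of derived alleles (0, 1 or 2) at each site, and that the Neandertal genome is reduced to a single derived-or-ancestral call per site. The function names are hypothetical.

```python
from typing import Sequence


def neandertal_sharing(genotype: Sequence[int],
                       neandertal_derived: Sequence[bool]) -> float:
    """Fraction of an individual's allele copies that are derived,
    counted only at sites where the Neandertal call is derived.

    genotype: derived-allele count (0, 1 or 2) per site (assumed coding).
    neandertal_derived: True where the Neandertal carries the derived allele.
    """
    if len(genotype) != len(neandertal_derived):
        raise ValueError("genotype and Neandertal calls must cover the same sites")
    counts = [g for g, d in zip(genotype, neandertal_derived) if d]
    if not counts:
        raise ValueError("no Neandertal-derived sites to compare")
    # Two allele copies per diploid site.
    return sum(counts) / (2 * len(counts))


def sharing_surplus(non_african: Sequence[Sequence[int]],
                    african: Sequence[Sequence[int]],
                    neandertal_derived: Sequence[bool]) -> float:
    """Mean sharing in the non-African sample minus mean sharing in the
    African sample. A positive value is the 'surplus' described above."""
    def mean_sharing(individuals: Sequence[Sequence[int]]) -> float:
        values = [neandertal_sharing(g, neandertal_derived) for g in individuals]
        return sum(values) / len(values)

    return mean_sharing(non_african) - mean_sharing(african)
```

Calling `sharing_surplus` with lists of genotype vectors for two population samples gives the difference in average sharing. Real data would also need filtering for sequencing error and ancestral-state misassignment, which this sketch ignores.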

Admixed populations, including African-Americans and Puerto Ricans, share Neandertal derived SNP alleles in the proportion expected from their African and non-African fractions of ancestry.

Comparison of shared Neandertal derived variants in ASW, YRI and CEU samples
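
The expectation for an admixed sample is a weighted average of its source populations. Here is a minimal sketch, assuming two sources (for example YRI and CEU standing in for the African and non-African sources, as in the comparison above) and ancestry fractions that sum to one. The function names are hypothetical, and any numbers passed in are the reader's own, not values from this analysis.

```python
def expected_sharing(african_fraction: float,
                     african_sharing: float,
                     non_african_sharing: float) -> float:
    """Expected Neandertal sharing in an admixed individual, as a weighted
    average of the two source populations (two-source model is an assumption)."""
    if not 0.0 <= african_fraction <= 1.0:
        raise ValueError("african_fraction must lie between 0 and 1")
    return (african_fraction * african_sharing
            + (1.0 - african_fraction) * non_african_sharing)


def implied_african_fraction(observed_sharing: float,
                             african_sharing: float,
                             non_african_sharing: float) -> float:
    """Invert the weighted average: the African ancestry fraction that would
    produce the observed sharing under the same two-source model."""
    if african_sharing == non_african_sharing:
        raise ValueError("source populations must differ in sharing")
    return ((non_african_sharing - observed_sharing)
            / (non_african_sharing - african_sharing))
```

If the ancestry fraction implied by Neandertal sharing agrees with the fraction estimated independently from the rest of the genome, the admixed sample is behaving as expected.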

As I also pointed out, the population samples in Europe and East Asia are not identical in the number of these shared derived variants. The difference between individuals can be caused by differences in the fraction of their genealogy that traces to Neandertals. It may also reflect other aspects of the individuals' genealogy, if for example some aspect of population history has led to discrepancies in the fraction of ancient variants these people share with a Neandertal genome through incomplete lineage sorting.

Here is the comparison of East Asian samples (Japanese, Han Chinese in Beijing, and Han Chinese originating in South China) and European samples (Tuscan, British, Finnish and CEU samples, along with a handful of Spanish):

Comparison of shared Neandertal derived variants in East Asian and European 1000 Genomes Project samples

The Europeans average a bit more Neandertal similarity than the East Asians. The within-population differences between individuals are large, and constitute noise as far as our comparisons between populations are concerned. At present, we can take as a hypothesis that Europeans have more Neandertal ancestry than Asians. If this is true, we can further guess that Europeans may have mixed with Neandertals as they moved into Europe, constituting a second process of population mixture beyond that shared by European and Asian ancestors.
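
One way to ask whether a difference in averages stands out from that individual-level noise is a permutation test. The sketch below is one possible approach, not necessarily the test used for the comparisons in this post. It assumes each individual has already been reduced to a single sharing value, and the function name, the default number of permutations and the seed are all hypothetical choices.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def permutation_test(a: Sequence[float],
                     b: Sequence[float],
                     n_permutations: int = 10000,
                     seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test for a difference in means.

    Returns (observed difference mean(a) - mean(b), p-value).
    Population labels are shuffled among individuals to see how often
    chance alone produces a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value says only that the two samples differ in average sharing. It does not by itself distinguish extra Neandertal ancestry from the other genealogical explanations discussed above.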

As we look more closely at the particular gene regions shared between each individual and the Neandertal, we will be able to consider the approximate time that they shared an ancestor for each gene region. That will allow us to distinguish incomplete lineage sorting (ILS) from introgression, although the two will overlap to some extent. We will rely on that test to examine hypotheses about the time and place of population mixture.

The difference between Europeans and Asians when we lump all the samples together is not as interesting as the differences we can see among the samples within each of those regions. For example, here are British people compared to Tuscans:

Comparison of shared Neandertal derived variants in British and Tuscan samples

The Tuscans have the highest level of Neandertal similarity of any of the 1000 Genomes Project samples. They have around a half-percent more Neandertal similarity than Brits or Finns in these samples. The CEU sample is slightly elevated compared to Brits and Finns as well.

It is tempting to interpret these differences as a north-south cline in Neandertal ancestry. I wouldn't jump too quickly on this idea, because Holocene population movements in Europe are now known to have covered up or erased a substantial fraction of the Upper Paleolithic gene pool. If we have a bonus of extra Neandertal ancestry in southern Europe, we need to explain how that cline persisted across subsequent history. Still, the difference is statistically very strong and deserves some explanation.

Likewise, the populations within East Asia have some differences in Neandertal similarity. Here is the comparison of Han Chinese, with the Beijing versus South China origins separated out:

Comparison of shared Neandertal derived variants in CHB and CHS samples

The Beijing sample has a bit more Neandertal similarity, on average, than the South China sample. These are all identified as ethnic Han Chinese, so I expect that the comparison would be much more interesting if some minority populations had been examined. The "cline" here seems opposite in direction compared to the European case. I can add that the Japanese sample is largely intermediate between the CHB and CHS, with an average closer to the Beijing sample.

If there was one thing that surprised me in the comparisons, it was this:

Comparison of shared Neandertal derived variants in Luhya and Yoruba samples

Yoruba have substantially more Neandertal similarity than Luhya. This may seem counter-intuitive, because the geographic location of Luhya in East Africa might seem better placed for Neandertal similarity to appear, whether through ancient population structure and ILS or through recent gene flow or backmigration into Africa of Neandertal descendants.

Instead, it looks like the Yoruba are the recipients of Neandertal genes, whether by means of ancient population structure or by introgression followed by recent trans-Saharan gene flow. I personally think both factors are involved, but again their relative importance will be determined by comparing individual gene regions.

In this vein, it is useful to outline the hypothesis of differential ILS within African samples. We now know from examination of genetic variation within Africa today that some of today's diversity can be traced to ancient population structure in Middle Pleistocene African populations. For example, Neandertals could be more closely related to some African populations than others today because Neandertals actually exchanged genes with some ancient African populations. Or Neandertals might have sprung from one African population among many who lived 250,000 years ago. If some of these ancient populations persisted and contributed genes to different present-day African populations, those populations would share different fractions of genes with a Neandertal genome.

I expect we will learn a substantial amount about African population structure in the Middle Stone Age (MSA) by using these Neandertal-similar regions of the genome. It's like having a probe that can trace the movement of people across Africa more than 100,000 years ago. As we combine the archaic genome data with our growing picture of diverse lineages in Africa today, we may discover ancient populations that are not apparent archaeologically. Again, genetics is giving us a totally new picture of the diversity and population dynamics of ancient people.

Next: Which Neandertal-derived variants are shared between regions, and which are unique to one region? I touched on this question last spring by using genotype data. Now, we have sequences capable of telling us much more.