Finding more Neandertal genes, chromosome 19 edition

When I last wrote about the Neandertal genome, I showed that across the X chromosome, Europe and China have different Neandertal genes. There is overlap between the two, but as a generalization few Neandertal haplotypes that are common in Europe are also common in China, and vice-versa. I described the basic method for finding Neandertal haplotypes in recent people last month (“Neandertal segments of X chromosomes”).

Almost all of the Neandertal haplotypes found in the X chromosomes of recent people are relatively rare, occurring in fewer than 10 percent of individuals. The largest fraction of Neandertal haplotypes occur in only a single person in the HapMap samples.

But is this a pattern that occurs on the autosomes, or does it reflect X chromosome dynamics in some way?

That’s not a hard question to answer, and I went looking first at chromosome 19. The number of haplotypes is fewer, because chromosome 19 is shorter than the X. The overall pattern is the same. Most Neandertal haplotypes are rare in the HapMap samples, and relatively few are common in both the CEU and CHD samples.

Neandertal haplotypes on chromosome 19 histogram in CEU and CHD HapMap samples

I put the origin at the rear; CEU (European ancestry in Utah) number of copies goes toward the left, CHD (Chinese immigrants in Denver) toward the right. You can see that most of the cases are clumped on the extreme edge of both axes. There are not higher counts in CHD; the two axes are at different scales because of one extremely common region in Europeans, as noted below.

I’ve received a few comments on the 3-d histograms. I don’t like them much, either, and I’m looking for an alternative. This one in particular is miserable, because it’s out of scale. I’d like to plot these in 2-d using shading to denote bin counts. Unfortunately I haven’t found a quick and dirty program that will do this in 2-d, and I’ve got too wide a range of bin counts for a bubble plot to work without a lot of tweaking. So I’m stuck with these for now. I can either write about them and share them or spend my time finding a better graphing solution.
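For what it's worth, the 2-d shaded version can be roughed out with nothing but the Python standard library. This is only a sketch, not the code behind the figures in this post: it assumes each haplotype has already been reduced to a pair of copy counts (one per sample), and the function names, the character ramp, and the choice of log-scaled shading are all my own assumptions. The log scale is there because the bin counts span such a wide range.

```python
import math
from collections import Counter

# Character ramp from empty to darkest; an arbitrary choice for this sketch.
SHADES = " .:+*#@"

def joint_bin_counts(pairs, bin_width=1):
    """Tally how many haplotypes fall into each (x, y) copy-count bin.

    `pairs` is an iterable of (copies_in_sample_1, copies_in_sample_2).
    """
    return Counter((x // bin_width, y // bin_width) for x, y in pairs)

def shade_grid(bins):
    """Render bin counts as rows of characters, darker meaning more haplotypes.

    Shading follows log(1 + count), so one very full bin does not
    wash out all the sparsely filled ones.
    """
    if not bins:
        return []
    max_x = max(x for x, _ in bins)
    max_y = max(y for _, y in bins)
    top = math.log1p(max(bins.values()))
    rows = []
    for y in range(max_y, -1, -1):  # highest y first, so the origin is bottom-left
        row = ""
        for x in range(max_x + 1):
            n = bins.get((x, y), 0)
            if n == 0:
                level = 0
            else:
                level = 1 + int((len(SHADES) - 2) * math.log1p(n) / top)
            row += SHADES[level]
        rows.append(row)
    return rows

if __name__ == "__main__":
    # Made-up counts purely for illustration, not HapMap data.
    demo = [(1, 0)] * 50 + [(0, 1)] * 40 + [(2, 2)]
    for line in shade_grid(joint_bin_counts(demo)):
        print("|" + line + "|")
```

With real data the same bin tallies could be handed to any plotting tool that draws a shaded grid; the text rendering is just the quickest way to eyeball where the mass sits.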

I’ve done a few more comparisons. When we look for Neandertal 10-SNP haplotypes in CEU versus TSI (the sample from Tuscany), we find mostly the same haplotypes in both samples. A haplotype in 10 copies in CEU is certain to be in TSI, and vice-versa.

Neandertal haplotypes on chromosome 19 histogram in CEU and TSI HapMap samples

Number of copies in CEU goes across the bottom, TSI back into the picture. This is such a striking difference from the CEU-CHD comparison. It’s very comforting to me, because this is totally the expected pattern: CEU and TSI should have the same things, because they share most of their population history! I will mention that for the X chromosome, CHB and JPT have a similar pattern; they mostly share the same stuff. This helps lend some significance to the finding below that GIH is also pretty different from all these other samples.
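The sharing claim above (a haplotype at 10 copies in one sample also turns up in the other) is easy to check once you have per-haplotype copy counts for each sample. Here is a minimal sketch, assuming the counts sit in two dictionaries keyed by some haplotype identifier; the function name, the identifiers, and the numbers in the usage example are hypothetical, and this is not necessarily how the comparison in this post was run.

```python
def shared_fraction(counts_a, counts_b, min_copies=10):
    """Fraction of haplotypes common in sample A that appear at all in sample B.

    counts_a, counts_b: dicts mapping a haplotype id to its number of copies.
    A haplotype counts as "common" in A if it has at least `min_copies` copies.
    Returns None when no haplotype in A reaches the threshold.
    """
    common = [h for h, n in counts_a.items() if n >= min_copies]
    if not common:
        return None
    present = sum(1 for h in common if counts_b.get(h, 0) > 0)
    return present / len(common)

if __name__ == "__main__":
    # Made-up haplotype ids and counts, for illustration only.
    sample_1 = {"hapA": 12, "hapB": 10, "hapC": 1, "hapD": 30}
    sample_2 = {"hapA": 8, "hapB": 0, "hapD": 25}
    print(shared_fraction(sample_1, sample_2))
```

Running it in both directions (A against B, then B against A) is what the "vice-versa" amounts to. A value near 1 both ways is the CEU-TSI kind of pattern, and a low value is the CEU-CHD kind.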

You can see that there is one locus where CEU has more than 100 copies. The little cluster there indicates that this haplotype extends over more than 10 SNPs; in fact it’s 13 SNPs with possibly 2-3 flanking SNPs forming a decay pattern on either side, and the total length is around 150 kb. There are more than 80 copies in Tuscans, and more than 40 in Gujaratis, but only a single copy in the Chinese sample. Three genes lie in this interval but none point to any obvious hypothesis (to me, at least) about why the Neandertal haplotype would be especially common in western Eurasia. I note it because this is the first Neandertal haplotype I’ve found with a frequency up over 20 percent or so; this one is about 60 percent in CEU and 50 percent in TSI.

The Gujarati (GIH) sample adds its own distinct twist. There is some overlap between GIH and CEU, and some overlap between GIH and CHD. But by and large the same pattern obtains as between Europe and China: India has its own Neandertal common variants, not widely shared with either CEU or CHD. For example, here’s the CHD comparison; CHD going toward the left, GIH toward the right. The basic pattern is that most cases are clustered on the edge of the graph, few are scattered across most of the area, and there’s no consistent pattern among them. Still, the highest-frequency GIH case is the same as the high-frequency haplotype noted in CEU and TSI above.

Neandertal haplotypes on chromosome 19 histogram in CHD and GIH samples

These examples should demonstrate pretty clearly that this is not solely an X chromosome phenomenon; basically we’re looking at the effects of drift in small ancient populations after they mixed with Neandertals.

I did have an excellent question today after my talk where I discussed this pattern – how do we know that this isn’t separate mixture events giving rise to different Neandertal-derived variants in different recent humans?

That’s not a trivial question to answer, and I don’t think we could easily rule out the hypothesis in the abstract. But the fact that these populations have very similar fractions of Neandertal contribution overall does suggest a single history of mixing. I’ll give this some more consideration as I look across the rest of the genome.