No Neandertal in you?

Elizabeth Pennisi has a news article in today's Science with the headline, "No Sex Please, We're Neandertals." It covers a couple of talks by Svante Pääbo at a Cold Spring Harbor meeting.

I'll get to the headline in a second; first I want to point out the more interesting paragraph at the end of the piece:

In a side project, Pääbo and his graduate student Johannes Krause have examined 30,000- to 38,000-year-old human fossils from Uzbekistan and the Altai region of southern Siberia whose identities were a mystery. When the researchers compared the bones' mitochondrial DNA with that from more than a half-dozen Neandertals, they found that the Asian fossils were clearly Neandertal. "It tells us that Neandertals were much more widespread than we thought," says Pääbo.

It's not entirely unexpected that the biological population of central Asia was Neandertal-like during this time range; many researchers have long classified the Teshik-Tash child from present-day Uzbekistan as a Neandertal. I wouldn't go so far, but there are anatomical similarities between this specimen and European Neandertals that suggest gene flow right across central Asia.

What's interesting about the mtDNA result is that the Neandertal mtDNA lineage is defined by a number of distinctive mutations, all of which took some time to occur on that branch. The variation within Neandertals so far is quite limited -- much like the variation within recent humans is limited. And the date of separation of the Neandertal and recent human clades is also relatively recent -- between around 350,000 and 700,000 years ago.

So Neandertals were a population limited to a small amount of mtDNA variation, like recent humans. This we already knew. But the large geographic extent of their mtDNA clade shows that they were a geographically widespread population -- from Spain to the border of China -- with a very small amount of mtDNA variation. Like recent humans.

And like recent humans, relatively rapid genetic dispersals appear to have been possible over long distances. That's not the stereotype of Neandertal population dynamics we're used to reading.

Now, about that interbreeding thing.

In one of last fall's Neandertal genome papers, Green and colleagues (2006) reported that the putative Neandertal sequence included an unusually high number of human-derived SNPs -- that is, polymorphisms in humans where both the Neandertal and some humans carried a derived mutation, while other humans carried the ancestral nucleotide.
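To make the definition concrete, here is a minimal sketch of how a single site might be sorted into these categories. It assumes the ancestral state has been inferred from an outgroup such as the chimpanzee, which is my assumption for illustration and not a description of the Green et al. pipeline; the function name and labels are hypothetical.

```python
def classify_site(ancestral, human_alleles, neandertal):
    """Classify one human polymorphic site against a single Neandertal base.

    ancestral     -- base inferred from an outgroup (assumption: e.g. chimpanzee)
    human_alleles -- set of bases observed among humans at this site
    neandertal    -- base observed in the Neandertal read
    """
    derived = set(human_alleles) - {ancestral}
    # Only handle simple sites: humans carry the ancestral base plus one derived base.
    if ancestral not in human_alleles or len(derived) != 1:
        return "not a simple biallelic SNP"
    if neandertal == ancestral:
        return "Neandertal ancestral"
    if neandertal in derived:
        return "Neandertal shares human-derived allele"
    # Neandertal carries a third base seen in neither human allele.
    return "Neandertal-specific"


if __name__ == "__main__":
    # Hypothetical site: outgroup has A, humans carry A and G, Neandertal read has G.
    print(classify_site("A", {"A", "G"}, "G"))
```

Only sites in the "shares human-derived allele" category count toward the excess under discussion.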

These human-derived SNPs are important because they are likely to be relatively recent; and mutations that recently emerged in humans should be less likely to be found in a Neandertal. That is, unless Neandertals were interbreeding with the ancestors of living people. This isn't quite the same thing as Neandertals being the ancestors of living people; the comparison doesn't test for the direction of gene flow, which conceivably was one-way gene flow into Neandertals. Still, it was pretty striking evidence for Neandertal-human genetic interactions (as I pointed out in my FAQ post), if it was true.

But there was some doubt about the conclusion of gene flow. For one thing, the sequence might be contaminated by DNA from a recent human. These sequencing techniques still offer no way to tell whether a given fragment of DNA from the fossil is actually endogenous to the fossil, or whether it is instead a contaminating sequence from some living (or recently dead) person. The only reassurance is the claim that the sequence contains a small proportion (maybe less than 6 percent) of mtDNA contaminant sequences from recent humans. But 6 percent contamination could put a lot of human-derived SNPs in the sample, making it look like gene flow existed where none actually did.
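A back-of-the-envelope sketch shows how contamination alone can mimic shared derived alleles. It treats the observed sequence as a simple mixture of endogenous and contaminant reads. The function name is hypothetical, the 0.5 derived-allele frequency is an illustrative value I made up, and applying the 6 percent mtDNA figure to nuclear reads is itself an assumption.

```python
def observed_derived_fraction(contamination, endogenous_rate, human_derived_freq):
    """Expected fraction of SNP sites where the 'Neandertal' read shows the derived allele.

    contamination      -- fraction of reads that come from a recent human
    endogenous_rate    -- fraction of truly Neandertal reads carrying the derived allele
    human_derived_freq -- mean frequency of the derived allele among humans (assumed)
    """
    endogenous_part = (1.0 - contamination) * endogenous_rate
    contaminant_part = contamination * human_derived_freq
    return endogenous_part + contaminant_part


if __name__ == "__main__":
    # Assumption: no true sharing at all (endogenous_rate = 0), 6 percent contamination,
    # and derived alleles at a hypothetical mean frequency of 0.5 in humans.
    print(observed_derived_fraction(0.06, 0.0, 0.5))
```

Under these made-up numbers, every apparent human-derived match would be an artifact, which is why the contamination rate matters so much to the argument.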

The second reason for doubt was that databases of human SNPs are biased toward common alleles. That is, when people are looking for genetic markers (usually for medical research), they tend to exclude very rare polymorphisms and focus on ones where the alleles are nearer to 50 percent frequency. This is called an ascertainment bias. The bias is a problem for the Neandertal comparison because common alleles are more likely to be older than rare alleles. Which means that the human-derived SNPs in the human databases are probably somewhat older on average than theory would predict in the absence of this ascertainment bias.

In other words, these human-derived SNPs are interesting because they ought to be recent, but in fact the SNPs in our sample are likely to be older than they ought to be.
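The link between allele frequency and age can be sketched with a standard neutral-theory result (Kimura and Ohta's expected age of a neutral allele at a given frequency, under constant population size). This is only an illustration of the direction of the bias, not an analysis of any real SNP database; the effective population size of 10,000 is an assumed round number.

```python
import math


def expected_allele_age(freq, n_e):
    """Expected age, in generations, of a neutral derived allele at frequency `freq`.

    Uses the classic diffusion approximation: -4 * N_e * x * ln(x) / (1 - x).
    Assumes neutrality and a constant effective population size n_e.
    """
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must be strictly between 0 and 1")
    return -4.0 * n_e * freq * math.log(freq) / (1.0 - freq)


if __name__ == "__main__":
    n_e = 10_000  # assumed effective population size, for illustration only
    for freq in (0.01, 0.05, 0.25, 0.5):
        age = expected_allele_age(freq, n_e)
        print(f"derived frequency {freq:.2f}: about {age:,.0f} generations")
```

Expected age rises steadily with frequency, so a database enriched for alleles near 50 percent is enriched for older mutations.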

Now, this might make a difference to the hypothesis of Neandertal-human gene flow, or it might not. There is a pretty simple way to find out whether it makes a difference -- just work out the ages of the human-derived SNPs in humans.

Apparently, this isn't what the researchers did. Instead, they decided to limit the human comparison to two individuals -- attempting to zero out the ascertainment bias.

So David Reich of Harvard Medical School in Boston and James Mullikin of the National Human Genome Research Institute in Bethesda, Maryland, have now compared SNPs in new Neandertal sequences to random SNPs obtained from one African and from one European. The result: "There's no indication of gene flow," Pääbo reported. Pääbo and his group got the same result when they examined variation in the Y chromosome, looking for signs of Homo sapiens DNA embedded in the Neandertal sequence.

It may never be possible to prove beyond doubt that interbreeding did not occur. "But if I were to make a guess, I would say more sequence will just confirm [these results]," says Noonan. "It convinces me."

The Y chromosome result is expected; the Y-chromosome coalescent in recent humans is so recent that a descendant sequence would be very unlikely to be found in this Neandertal.

Much depends on how the "random" SNPs were obtained, so I can't evaluate until I see more details. For example, if they were obtained by resequencing the same million base pairs in two humans as has been recovered from the Neandertal sequence, that would probably work. On the other hand, if they were obtained by bootstrapping already-existing SNPs from two HapMap individuals ... well, that's probably not the best idea.

And we are still left with the contamination problem. The thing is, contamination predicts that there ought to be an excess of these human-derived SNPs in the Neandertal sequence. Some of them should be in that sequence because of contamination, if for no other reason. So if they aren't finding any evidence of them in their comparisons, hmm...

In any event, none of these comparisons really address the most likely reason for gene flow from Neandertals into recent humans (or vice versa), which is selection. If the number of introgressing genes was relatively modest, we wouldn't expect to see a large number of human-derived SNPs in the Neandertal sequence, even though the gene flow between the two populations was highly important to their fitness. I've gone into this before, and of course it was the subject of one of my papers last year.

References:

Pennisi E. 2007. No sex please, we're Neandertals. Science 316:967. doi:10.1126/science.316.5827.967a