Neandertal DNA

Sample sizes and the "Neandertal haplogroup"

I have an excellent e-mail question about last week’s Neandertal mtDNA paper, which has provoked a lot of commentary.

I just skimmed over your comments on the recent paper and I have a couple questions. First, how many Neanderthals did they receive mitochondrial DNA from? I think I read somewhere that it was fewer than ten.

Second if that is true, what the hell does it mean? I wouldn’t try and predict anything based on even fifty humans from that long ago much less 8 or 9 in genetic terms. I don’t think that anyone else would either unless they are grandstanding. You can’t prove a negative so they really can’t say that no modern humans have any Neanderthal DNA. Did they study Neanderthals from Asia? I just think they don’t have a good enough sample and until we can resequence a Neanderthal nucleus and bring the little tyke to term and wait for him or her to marry then wait for those kids to have kids will we really be sure we’ve got the goods.

Krause et al. (2007) list 15 Neandertal partial mtDNA sequences. Ten of these at that time presented relatively long portions, including the central Asian Okladnikov and Teshik Tash specimens, Mezmaiskaya, Feldhofer 1 and 2, Vindija 75 and 80, Scladina, Monte Lessini, and El Sidrón 1252. The same paper lists five additional specimens for which only a very short sequence had been recovered (just enough to diagnose as part of the Neandertal clade), including Vindija 77, El Sidrón 441, Engis 2, Rochers de Villeneuve, and La Chapelle-aux-Saints.

We do not know that every Neandertal belonged to the same mtDNA clade as those 15 sequences. Some of them may have looked different, possibly including the new clade otherwise present in later Upper Paleolithic and living people. But based on the 15 sequences we have, we can say that a large fraction of Neandertals must have carried the “Neandertal haplogroup.” Exactly how large a fraction depends on what we are willing to believe about contamination, preservation, and the randomness of our sample.

Now, let’s consider the question: Can we predict anything about Neandertal evolution and relationships based on this small, possibly unrepresentative sample of mtDNA?

The answer is that it doesn’t matter very much whether we have 5 sequences or 500. If 15 out of 15 specimens from different sites across Europe preserve a single mtDNA haplogroup, we can’t say it was universal, but we can say it was common. If 40 out of 50, or 400 out of 500 specimens had the same haplogroup, that would increase the precision, but not change the basic fact: Neandertals had at least one common haplogroup that is now so rare it has never been found in a sample of 100,000 or more people. We deserve some explanation.

The possible explanations are:

  1. Random genetic drift
  2. Accelerated genetic drift due to demographic turnover
  3. Population extinction and replacement
  4. Natural selection


Drift

Random genetic drift is fairly easy to refute, although it might not appear so at first. In favor of drift: There were few Neandertals, and the population size of the succeeding Upper Paleolithic, up through the Last Glacial Maximum, was also small. The best estimates are on the order of 2000 people for Western Europe and 5000 for continental Europe to the Urals (Bocquet-Appel et al. 2005). There would have been perhaps twice that number or more across the entire Neandertal range. The effective population size represented by this population would have been smaller: perhaps 3000–5000 for Neandertals and Aurignacian-era people, of whom only half, or around 2000, were females. Genetic drift in this small mtDNA population would have been much stronger than for autosomal genes, and very much stronger than in most recent human populations.

But when we plug these numbers into a model of random genetic drift, it starts to appear very unlikely that drift alone could explain the observations. Let’s assume (falsely) that our Neandertal genetic samples all dated to 40,000 years ago, that the female effective size was 2000 individuals between then and 15,000 years ago, and that the population of Neandertal country was a random-mating pool. Following these assumptions, on average all the mtDNA genomes at 15,000 years ago would descend from only 4 or 5 ancestral copies in the population 40,000 years ago. If these four or five ancestral copies were, by chance, a different haplogroup from the 15 copies we’ve already found, then drift could explain the data.
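The figure of 4 or 5 ancestral copies can be checked with a short calculation. The sketch below uses the standard coalescent expectation for the number of ancestral lineages of a very large sample, with time measured in units of the female effective size. The 25-year generation time is an assumption added here for illustration (the text gives only the dates and the effective size), and the function name is hypothetical.

```python
import math


def expected_ancestral_lineages(t, max_terms=200):
    """Expected number of ancestral lineages of a very large sample.

    t is coalescent time, i.e. generations divided by the female effective
    size for mtDNA. Uses the large-sample limit under the standard (Kingman)
    coalescent:
        E[A(t)] = sum over i >= 1 of (2i - 1) * exp(-i * (i - 1) * t / 2)
    The series converges quickly unless t is very close to zero.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return sum(
        (2 * i - 1) * math.exp(-i * (i - 1) * t / 2.0)
        for i in range(1, max_terms + 1)
    )


female_effective_size = 2000      # from the text
years_elapsed = 40_000 - 15_000   # from the text
generation_time = 25              # ASSUMPTION: years per generation

generations = years_elapsed / generation_time
t = generations / female_effective_size
print(f"coalescent time t = {t:.2f}")
print(f"expected ancestral copies = {expected_ancestral_lineages(t):.2f}")
```

Under these assumptions the expectation falls between 4 and 5. A longer generation time or a larger effective size shortens the coalescent time and leaves more surviving ancestral copies.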

However, this still doesn’t appear very likely. So far, every one of the Neandertals shares a single haplogroup. The frequency of this haplogroup was apparently very high, making it very unlikely that all five ancestral copies would have belonged to some other haplogroups of which we have never found any trace.

Notice that this argument does not depend very much on the number of Neandertal mtDNA sequences that we have found. The fact that there are 15 helps to constrain the frequency of the haplogroup within the population 40,000 years ago, in our model. That frequency is unlikely to be less than around 85%, assuming random sampling. But suppose there were only five. We would still know that the Neandertal haplogroup was very common in its population, even if we thought it was only 50%. It would still be unlikely to draw four or five ancestral copies and have all of them be some other haplogroup that we haven’t found.
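The arithmetic behind this argument is simple enough to sketch. The code below makes two simplifying assumptions that are not spelled out in the text: the sampled Neandertals are independent random draws from one population, and the ancestral copies are likewise independent draws. The function names are hypothetical.

```python
def min_frequency(n_sampled, alpha):
    """Lowest haplogroup frequency p still compatible with the sample.

    If all n_sampled random draws carry the haplogroup, the chance of that
    outcome is p ** n_sampled. Solving p ** n_sampled = alpha gives the
    frequency below which the observed outcome has probability under alpha.
    """
    return alpha ** (1.0 / n_sampled)


def prob_all_ancestors_other(freq, n_ancestors):
    """Chance that every one of n_ancestors independent ancestral copies
    belongs to some haplogroup other than the one with frequency freq."""
    return (1.0 - freq) ** n_ancestors


for n in (15, 5):
    for alpha in (0.10, 0.05):
        p = min_frequency(n, alpha)
        print(f"{n} of {n} sampled, alpha={alpha:.2f}: frequency >= {p:.3f}")

for freq in (0.85, 0.50):
    for k in (4, 5):
        chance = prob_all_ancestors_other(freq, k)
        print(f"frequency {freq:.2f}, {k} ancestors: P(all other) = {chance:.5f}")
```

At a frequency of 50%, five ancestral copies all miss the haplogroup with probability 0.5^5, about 3%. At 85%, the chance falls below one in ten thousand.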

This gives us a considerable confidence margin against drift. We need it. After all, the Neandertals were not randomly sampled at a single time, and it is possible that some of them actually carried a human-like mtDNA sequence, which we now falsely interpret as contamination. But even with these shadows hanging over us, it would still be unlikely that none of the ancestors of today’s mtDNA variation were like the Neandertal haplogroup.

Also, the population was not a random-mating pool. When we add geographic structure to the story, which tends to reduce the importance of genetic drift, we find that the probability that drift alone explains the observations is almost zero. It also remains very unlikely that a single migration of modern humans interbreeding with Neandertals under random drift could explain them (Currat and Excoffier 2004).

Extinction

It is at this point that most geneticists turn to the hypothesis of complete Neandertal extinction. They have a point. Genetic drift apparently cannot explain what we have observed. In their point of view, if genetic drift alone cannot explain the Neandertal mtDNA disappearance, then the only other random process at hand is extinction.

I think that hypothesis is false. It does not account for morphological similarities between Neandertals and later people, for genetic evidence that suggests a strong ancient population structure with introgression, or for the apparent behavioral continuity in the Upper Paleolithic.

Happily, I don’t have a commitment to random processes. Instead, I think that the mtDNA evolution of Europe was driven by nonrandom processes of demographic turnover and selection.

Demographic turnover

Here we come to an important point. No one believes that later Europeans evolved from earlier Neandertals by a random process of genetic drift. Yet that is precisely the hypothesis that most studies have set up to refute. Without question it is valuable to set up boundary conditions under the hypothesis of random genetic drift. But the time has come to investigate more interesting models.

Personally, I am surprised that more complicated metapopulation dynamics have not gotten more attention as an explanation for the Neandertal mtDNA results. Population sources and sinks are a hot topic in biology, and you would think that anthropologists would have picked up on this. To my knowledge, the only time anyone has examined a population sink model was in 2001, when Milford Wolpoff and I worked with mathematician Per Enflo on such an idea for Neandertals (Enflo et al. 2001). This idea deserves a fuller treatment (I think I’ll suggest it as a project for one of my classes this year!).

In a nutshell, a population sink is a region where the average rate of reproduction is below replacement levels. This region can remain populated only if individuals migrate in from other places. The places that reproduce above replacement are called population sources. The continual migration from sources to sinks creates a genetic gradient. Individuals sampled at any given time in the population sink are overwhelmingly likely to have ancestors not in the sink but in one or more source populations.

Europe today is a population sink. The population of the continent does not produce enough children to replace itself, and immigration from other parts of the world is high. There are several reasons to suggest that Europe may have been a population sink in prehistory as well. In Neandertal and Upper Paleolithic times, climate fluctuations created unique challenges in Europe, where caloric expenditures were high and food harder to obtain than some other regions.

Continual migration into Europe would provide a simple explanation for why none of today’s mtDNA haplogroups derive from the European Neandertals. The mtDNA population of 15,000 years ago had a few ancestors 40,000 years ago, and none of these ancestors lived in the sink population—all came from the source population in Africa or West Asia. The Neandertal mtDNA variation would have been a short-lived phenomenon, continually being turned over from source populations. Some Neandertal genes would have survived in Europe for hundreds of thousands of years, but some would have come in with more recent migrants from the population source.
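A back-of-the-envelope sketch shows how quickly continual immigration erases local maternal ancestry. It assumes a very simple model that is not taken from Enflo et al. (2001): in every generation, each maternal lineage in the sink has the same probability m of tracing back to an immigrant mother from the source, and nobody migrates back. The migration rates and the 25-year generation time are hypothetical values chosen only for illustration.

```python
def prob_lineage_stays_in_sink(m, generations):
    """Chance that a maternal lineage sampled in the sink has had all of its
    ancestors inside the sink for the given number of generations, when each
    generation independently traces to the source with probability m."""
    return (1.0 - m) ** generations


generation_time = 25                                  # ASSUMPTION: years
generations = (40_000 - 15_000) // generation_time    # 1000 generations

for m in (0.001, 0.005, 0.01):                        # HYPOTHETICAL rates
    p_local = prob_lineage_stays_in_sink(m, generations)
    print(f"m = {m:.3f}: P(lineage purely local) = {p_local:.2e}")
```

Even when only 1% of mothers per generation are immigrants, a lineage has almost no chance of remaining purely local over 25,000 years. With four or five ancestral copies in total, it is then unsurprising that none of them sat in the sink.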

There are points that argue against this source-sink hypothesis. The Neandertal-human divergence time for mtDNA is not very different from that estimated for the autosomal genome. If a European population sink had made genetic drift more powerful, that should have affected mtDNA more than the autosomes, so we might expect a more recent mtDNA divergence. Still, there is no reason why the source-sink dynamic need have been constant over Neandertal evolution, and there may have been multiple sources in the Pleistocene, not only Africa and West Asia. Investigating the boundary conditions of the source-sink model and its correspondence to autosomal genetic results would be helpful.

I should note that mtDNA is not special. Neandertals had lots of traits that are now very rare. The horizontal-oval, or “bridged” mandibular foramen is a prominent example. Out of the relatively small sample of Neandertal mandibles, half have this derived form. Fewer than one percent of recent European mandibles have it. As with mtDNA, a once-common variant is now very rare. And as with mtDNA, we deserve some explanation. A source-sink model would appear consistent with the continued evolution of such traits during the Upper Paleolithic, a time when the extinction and replacement hypothesis predicts no change in these characters.

Natural selection

The other nonrandom hypothesis is natural selection, which would presumably have favored one or more modern human types while eliminating the original Neandertal haplogroup. I won’t say much about that hypothesis here, since I discussed it in my initial post about the whole-mtDNA-genome sequencing. Selection has a leg up over the other hypotheses now because it seems like there’s good evidence it happened.

Still, selection on mtDNA alone could not explain the total pattern of observations about Neandertals. Physical traits that were once frequent in Neandertals were much less common or absent in later Europeans, and some continued to reduce in frequencies over time. To explain these changes, we must invoke either selection on other traits, or continued demographic turnover in the post-Neandertal population (probably more immigration into Europe) or both.

So selection on mtDNA has never been a sufficient or necessary hypothesis, even if we assume that other genes carried by Neandertals still survive. But given the current evidence that suggests something distinctive about the mtDNA of recent humans, natural selection may receive renewed attention as a factor in the disappearance of the Neandertal mtDNA haplogroup.

References


   Bocquet-Appel JP, Demars PY, Noiret L, Dobrowsky D. 2005. Estimates of Upper Palaeolithic meta-population size in Europe from archaeological data. J Archaeol Sci 32:1656–1668. doi:10.1016/j.jas.2005.05.006.

   Currat M, Excoffier L. 2004. Modern humans did not admix with Neanderthals during their range expansion into Europe. PLoS Biol 2:e421.

   Enflo P, Hawks J, Wolpoff MH. 2001. A simple reason why Neanderthal ancestry can be consistent with current DNA information. Am J Phys Anthropol 114:S62.

   Krause J, et al. 2007. Neanderthals in central Asia and Siberia. Nature 449:902–904. doi:10.1038/nature06193.

Complete Neandertal mitochondrial sequence, and selection on human (not Neandertal) mtDNA

In the current Cell, the Max-Planck group, in coordination with 454 Life Sciences, report the sequence of a complete Neandertal mtDNA. I'm out of town right now, so I'm writing fairly quickly, and I haven't seen any of the reporting. Keeping that in mind, I wanted to set out a few of the interesting things about the paper.

I've been waiting a long time for this sequence to come out. I know they've had the basic data for a long time: since the mtDNA copy number is very high, the 454 process kicks out a lot of mitochondrial sequence. The reward for the wait is that Green and colleagues have done a very careful job of comparative analysis, with some very interesting results.

If I leave something obvious out, please forgive me, since I'm just dashing this off as quickly as I can.

Where we left off...

All previously reported sequences of Neandertal mtDNA have been fragments of the control region. The control region of the mtDNA (hypervariable regions I and II) is very helpful for working out phylogenetic relations among recent humans. True to its name, it varies a lot, and its high mutation rate allows a fine discrimination among lineages that have differentiated only within the recent past.

The high mutation rate of the hypervariable regions also means that closely related populations have accumulated many differences. That's very convenient for identifying Neandertal mtDNA, where only small fragments (up until recently) have been practical to obtain. A small fragment of the mtDNA control region is sufficient to assess whether a specimen is like other known Neandertal sequences or not. Up to now, this has been an important way of authenticating Neandertal DNA sequence results --- although it has the obvious drawback that it might falsely exclude some genuine sequences that really do look like the modern human form.

So far every Neandertal mtDNA sequence looks like a member of the same mtDNA clade. (More carefully, every specimen with good biological preservation that has produced DNA has yielded at least some mtDNA sequences that form a clade distinct from all recent humans. Others are presumed to be contamination -- which I have no reason to doubt.) No recent human -- out of the many thousands that have been sampled so far -- has produced a mtDNA control region sequence like any known Neandertal. The two populations, so far as we can tell, possessed distinct mtDNA clades.

Divergence time

A complete mtDNA sequence provides a lot of sites, which allows a more precise estimate of the divergence time between recent human and Neandertal mtDNA lineages. The paper reports this time as 660,000 years ago, with a confidence interval from 520,000 to 800,000 years ago. That range of dates substantially overlaps with the prior estimates of divergence time, and is a pretty good match to the initial estimate based on a single HVR1 sequence in 1997.

The availability of a complete sequence has also removed a remaining piece of ambiguity from earlier comparisons. Because the hypervariable regions are so variable, it has always been the case that comparisons of hundreds or thousands of recent humans have included some pairs of individuals who are really divergent in their control region sequences. The result: some people living today are more different from each other than Neandertals are from recent people.

Now, that particular fact is not meaningful in a cladistic sense. Neandertal sequences share derived mutations, as do recent humans. But the concept of a "range" of genetic divergence has confused comparisons. Comparing the control region alone, it may appear that Neandertals were not so very different from living humans, even if they have a few derived mutations that no longer exist. As long as some humans were also very different from each other, it remained possible that the tree had been wrongly reconstructed. An equally parsimonious tree (or even a more parsimonious one) might link the Neandertal clade with some modern human, even if not a recent European. When comparing humans to chimpanzees and more distantly related primates, the hypervariable regions are somewhat saturated with mutations, meaning that parallel mutations between different species are very common. This makes it even harder to reconstruct the tree of mtDNA relationships based on the hypervariable regions alone.

Comparing the complete mtDNA genomes of a Neandertal and many recent humans presents a very different picture. Humans are all more similar to each other, when comparing the complete mtDNA genome, than any human is to a Neandertal. And in fact the Neandertal sequence is three or more times as different, on average, from us as we are from each other. This change from the earlier picture is a purely statistical one: more sites, with a more regular mutation rate. But it makes a clearer picture, and one that supports the phylogenetic model more clearly.

Selection on COX2?

Even though the control region is so helpful for analysis of recent humans, and easy identification of Neandertals, it's only a small fragment of the complete mtDNA. The mitochondrial genome is inherited as a single unit, so different mutations on a single mtDNA are co-inherited with each other. That means that the diversity of the noncoding control region is shaped by both genetic drift (due to demography) and selection. The selection includes purifying selection on coding sites across the entire mtDNA genome, and the possibility of positive selection on one or more ancient mutations.

I believe that positive selection on mtDNA in ancient humans has a lot of indirect support (and I wrote as much here). To give a brief list:

  • Mitochondrial haplotypes in living humans correlate with functional variation in disease, longevity, and performance -- all areas that have undergone recent biological shifts in humans.
  • Some mtDNA haplotypes in humans appear to have been under recent positive selection, as indicated by their geographic distributions.
  • Some mtDNA haplotypes have vastly changed in frequencies within the past few thousand years, as evidenced by ancient DNA samples.
  • Nuclear genes involved in mitochondrial function have been under recent positive selection.
  • MtDNA from Neandertals is completely absent today, despite the other evidence for genetic survival of that population. This combination is very unlikely if mtDNA was neutral.

So I think that positive selection is not only a reasonable hypothesis, it is extremely likely. But that is not to say that it has been demonstrated. Others might say that my final reason, that positive selection can explain the apparent contradiction between mtDNA and other data (such as skeletal comparisons and apparent nuclear introgression), is a case of wishful thinking. They might argue that all this other evidence of Neandertal-modern gene flow is an illusion, and not a problem to be explained.

I don't think they're right, but in the spirit of honest advertising, that's what they think.

It would be unreasonable for me to expect that a Neandertal mtDNA genome would provide strong evidence of positive selection on the human lineage. Finding such evidence would require repeated selected substitutions, probably within a single gene. Otherwise there would never be statistical evidence of positive selection. The available tests for positive selection in a two-genome (or in this case, two-clade) comparison are very weak.

Only a single selected mutation would be sufficient to explain the complete replacement of Neandertal mtDNA by an advantageous modern human type. No test of selection is powerful enough to refute neutrality based on a single selected site in a comparison of two mtDNA genomes. And repeated selection on a single gene just doesn't seem as likely as one or a few instances of selection, potentially on many mtDNA coding regions.

So imagine my surprise, when reading this paper, when I discovered that they found repeated substitutions on a single mtDNA gene in the human lineage, and statistical evidence of positive selection!

The gene is cytochrome oxidase subunit 2 (COX2). Using the chimpanzee mtDNA sequence as an outgroup, there were 18 human-specific and 20 Neandertal-specific nonsynonymous coding substitutions. Out of the 18 human-specific substitutions, 4 were in COX2. Only three synonymous substitutions occurred in humans for this gene (the ratio 3:4 differs from the ratio for other mtDNA coding regions, 54:14). In contrast, Neandertals had no coding substitutions -- every difference between Neandertal and human sequences is inferred to have occurred in ancient humans. These data are unlikely unless COX2 was recurrently selected in ancient humans.
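One simple way to see how unusual the COX2 ratio is compared with the rest of the coding regions is a Fisher exact test on the 2x2 table of synonymous and nonsynonymous counts. This is a sketch using only the counts quoted above. It is not necessarily the test Green and colleagues used, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def pmf(k):
        # Probability that k of the row1 items fall in the first column.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_observed = pmf(a)
    tolerance = 1.0 + 1e-9  # guard against floating-point ties
    return sum(
        pmf(k) for k in range(k_min, k_max + 1)
        if pmf(k) <= p_observed * tolerance
    )


# Human-lineage substitutions, as quoted in the text:
#               synonymous  nonsynonymous
# COX2               3            4
# other coding      54           14
p_value = fisher_exact_two_sided(3, 4, 54, 14)
print(f"two-sided Fisher exact p = {p_value:.4f}")
```

With these counts the p-value comes out close to 0.05 before any correction for testing several genes, so on this crude test the COX2 result is borderline rather than overwhelming.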

More evidence will be necessary to establish positive selection. The paper includes multiple comparisons of different genes, so a significant result for this one is necessarily weakened by the multiple-comparisons correction.

But in a very interesting part of the paper, the authors did a functional analysis of the human-specific changes in COX2. Functional analysis of coding sites has come a long way in the last few years. Last fall, we saw it applied to the Neandertal-specific mutation of the MC1R gene. It was the functional analysis that argued that the mutation likely resulted in a red hair phenotype. These functional analyses consider the position of a mutation within the protein sequence, the extent to which that part of the protein interacts with other proteins, and whether the coding changes are otherwise conserved in other species.

Here is the paper's conclusion about COX2:

Another interesting observation is that COX2 stands out among proteins encoded in the mitochondrial genome as having experienced four amino acid substitutions on the modern human mtDNA lineage. Further work is warranted to elucidate the functional consequences of these amino acid substitutions. However, all these substitutions are in regions of the protein that, based on the crystal structure, do not have any obvious function, and they are variable among primates. Hence, they may represent either minor adaptive advantages, perhaps of regulatory relevance, or have no significant functional consequences for mitochondrial function. Unless other evidence for their importance becomes available, we see no need to invoke positive selection to account for the evolution of COX2 on the human lineage (Green et al. 2008:423).

To me, a very persuasive finding is that each of the four human-specific mutations of COX2 is also found in some other primate species. In other words, where humans differ from chimpanzees and Neandertals (and generally, gorillas and orangutans), humans are like baboons or macaques. The authors of the paper read this finding as evidence that the changes have little functional importance. But I see this as a suggestion that these substitutions are functionally salient. Different primates have different energetic and dietary constraints, and it should be no surprise if they exhibit functional convergences in mtDNA. Humans evolved four separate sites, within the last half-million years, to be similar to some cercopithecoids and different from most other hominoids. Neandertals exhibited no evolution in this gene. This makes sense under a hypothesis of mtDNA selection in accordance with functional requirements, which we have good reason to believe were different in humans and Neandertals.

But as the authors say, we need more evidence about the function of these genes. I think the comparative evidence now supports the hypothesis of selection very strongly, and is consistent with the pattern of evidence from the nuclear genome and from the anatomy of early Upper Paleolithic Europeans.

Contamination

This paper advances our understanding of contamination within the Neandertal sequences. The authors acknowledge Wall and Kim's (2007) interpretation of a high contamination rate in the earlier reported nuclear genetic data off the 454 platform, and provide additional information to support a relatively high contamination rate:

Contamination with extant human DNA is the other dominant source of erroneous Neandertal sequences. Given the high coverage and the fact that the best estimate of the contamination rate here is 0.5% (with an upper 95% confidence limit of 0.87%), we do not expect contamination to affect the mtDNA sequence assembly to any appreciable level. Under the assumption that the Neandertal mtDNA sequence is reliable, it is a useful tool for gauging contamination when sequencing the Neandertal nuclear genome. Previously, assays to determine contamination within Neandertal fossil extracts were limited to the HVRI, which carry few positions where extant humans differ from Neandertals. By contrast, the complete Neandertal mtDNA now offers 133 such positions. This enables a reliable estimation of mtDNA contamination by analyzing sequence reads from 454 libraries, rather than by PCR-based assays of the DNA extracts. For example, when we do this in a small preliminary data set initially published from this fossil (Green et al., 2006), 10 of 10 sequences are classified as Neandertal. However, in further unpublished sequencing runs from that library, 8 out of 75 diagnostic sequences derive from extant human mtDNA, suggesting a contamination rate of ~11% (CI = 4.7%–20%). This is in agreement with the suggestion (Wall and Kim, 2007) that contamination occurred in that experiment. That library was constructed outside our cleanroom facility and before the introduction of the Neandertal-specific key, which is crucial for the detection of contamination by other 454 libraries, and was therefore not used for the subsequent Neandertal genome sequencing project (Briggs et al., 2007). However, with the help of the mtDNA presented here, such levels of contamination are now easily detectable from 454 sequencing runs (Green et al. 2008:424).

So the mtDNA from the same sequence library as the previously reported 1 Mb of Neandertal nuclear genome shows a high contamination rate. That's really disappointing, since it means we have no data to work with. We'll just have to wait.
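The contamination estimate quoted above comes from a simple count, 8 human-like reads out of 75 diagnostic reads. The sketch below computes the point estimate and an exact (Clopper-Pearson) binomial confidence interval from those two numbers. Whether the paper used this particular interval method is an assumption here, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) == target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0)
    return lower, upper


human_reads, diagnostic_reads = 8, 75   # counts quoted from Green et al. 2008
rate = human_reads / diagnostic_reads
lower, upper = clopper_pearson(human_reads, diagnostic_reads)
print(f"contamination rate = {rate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The printed interval can be compared with the 4.7%–20% range in the quoted passage. The width of that range is a reminder of how little 75 reads can say about the true rate.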

OK, that's all I have time to post; more later...

References:

Green RE and 24 others. 2008. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134:416-426. doi:10.1016/j.cell.2008.06.021

The mtDNA sequence of Paglicci 23

Is there anything surprising about finding the Cambridge Reference Sequence in Paglicci 23?

UPDATE follows at the bottom.

Last week, a short article in Science by Rachel Mackelprang and Edward Rubin discussed some of the recent advances in ancient DNA extraction. Of most interest is the paragraph that discusses ways to probe for particular genes while avoiding some drawbacks of PCR amplification:

Microarray-based hybridization, coupled with high-throughput sequencing of recovered DNA, has recently been used to capture thousands of targets in parallel from modern DNA samples. With these strategies, a DNA sample is directly applied to an array of specifically designed oligonucleotide probes immobilized on a chip. Complementary fragments hybridize to the probes while the remaining nonbound DNA is washed away. The hybridized DNA can then be eluted from the chip and sequenced, resulting in enrichment of targeted genomic regions (11). Alternatively, chip-synthesized oligonucleotide probes have been released from the chip and used to capture molecules in solution (12). A purely solution-based method, where sets of probes are designed against a reference genome and used as a bait to "hook" corresponding sequences from a DNA pool (13), has been used to recover specific regions of nuclear DNA from Neandertal and cave bear genomic sequence libraries (1). These various capture approaches hold promise for economically investigating the same sequence in multiple different samples as well as examining multiple independent molecules of an allele isolated from a single sample.

The short review also mentions problems with contamination and some of the results that indicate contamination of the Neandertal sequences, which I've discussed before (Complete Neandertal DNA files).

FOXP2 is really recent, it really did introgress (if it's not contamination)

That's the thrust of a technical comment by Graham Coop and colleagues, now online in Molecular Biology and Evolution. The letter refers to the extraction of FOXP2 from two Neandertal specimens from El Sidrón, by Johannes Krause and colleagues, reported last year (I wrote about the paper here).

First, the bad news. The current letter raises the prospect of contamination. Notably, the controls applied by Krause et al. (2007) may be relatively weak evidence against contamination, because of polymorphism within large human comparative samples. The tests rely on the assumption that there is little DNA from living humans in the samples. But if we cannot distinguish Neandertal from human DNA with great accuracy, then we will be mistaken some proportion of the time. Krause et al.'s test, based on derived human alleles absent from the Neandertal genome draft, can still go wrong if the human contaminants happen to have all the ancestral (non-derived) human alleles.

Well, that seems to be the story these days with Neandertal DNA extraction. No test of contamination is good enough. (And remember that every "test" of contamination is really a procedure for excluding the hypothesis that ancient sequences are identical to recent ones.)

Now, the more interesting news. Coop and colleagues verify that the selective sweep affecting human FOXP2 was indeed recent -- they estimate 42,000 years ago:

To demonstrate this, we estimated the time of the most recent common ancestor (tMRCA) of the selected haplotype (see Figure 1), using an approach sometimes called phylogenetic dating (Thomson et al. 2000; Hudson 2007). This method does not make assumptions about demography and selection, but only requires that the mutations in the intron be neutral or nearly neutral. Taking this approach, we obtained a mean tMRCA of 42 Kya (see SOM for details). While there is considerable uncertainty associated with this estimate, it is surprisingly recent if selection took place over 300 Kya (see SOM). In other words, the selective scenario proposed by the authors cannot account readily for patterns of variation in modern humans. Given that we have no power to detect a beneficial substitution that occurred over 250 Kya, (cf. Sabeti et al. 2006) yet we see a footprint of positive selection at FOXP2, the conclusion of a recent selective sweep at FOXP2 is not surprising (Coop et al. 2008:3-4).

FOXP2 is in one of the ENCODE regions, so its variation is pretty well known. This is not a problematic case: it has a very limited amount of variation around it, and has a strong excess of rare alleles, both signs of a recent sweep.

Coop and colleagues suggest that the beneficial human allele spread into Neandertals (or vice versa) by low levels of gene flow coupled with its selective advantage -- in other words, introgression.

They do allow for an alternative -- perhaps the two amino-acid-coding mutations were not the target of selection, but instead some linked locus. This would not erase the necessity of gene flow from Neandertals, but would question whether this gene flow had involved the FOXP2-language scenario, since it might be some linked gene unrelated to language.

(CORRECTION (2008/04/18): If selection were on a linked site, then Neandertals might share the human-derived amino acids as a result of ancient shared ancestry with humans, while the linked selected sweep might be absent in Neandertals, not necessitating any gene flow.)

I doubt this hypothesis of a linked sweep, since the two sites with human-derived substitutions are otherwise very strongly conserved among mammals. This looks like a credible target for recent selection. But the hypothesis of selection on a linked site cannot presently be tested.

So that's the story. It seems very likely that Neandertals got the language gene from us, or us from them, long after many other genes in the two populations diverged. I write "many" rather than "most" because we haven't really been able to assess the proportion of derived alleles shared by humans and Neandertals. The completion of the draft sequence may help, but I'm afraid that the specter of contamination is going to keep on being raised whenever a part of the Neandertal draft genome looks humanlike.

(via Dienekes)

References:

Coop G, Bullaughey K, Luca F, Przeworski M. 2008. The timing of selection at the human FOXP2 gene. Mol Biol Evol (in press) doi:10.1093/molbev/msn091

Introgression encore

Although I've had a number of papers come out this year, there are two in particular that I've been working on for quite a long time. Both papers began their gestation in the summer and fall of 2005. Each of the two papers explicates a major pattern for the action of natural selection in human evolution -- to my mind, at least, the most important two. Each was a long project, requiring the integration of mathematical, theoretical and informatic resources, and researchers scattered across the country.

Both papers were submitted earlier this year to different journals, and in several instances revisions and decisions about them were made within a week of each other.

Now, the two papers are being published online, both within a week of each other.

The first to appear is our review of genetic introgression and modern human origins, now online in Trends in Genetics.

Gregory Cochran and I published a number of theoretical considerations about introgression last year (Hawks and Cochran 2006, described in this post). That paper included a very comprehensive review of adaptive introgression among natural populations, focused on mammals, citing more than 170 references. But we had relatively little to say about the genetic evidence for introgression in human evolution, because the key paper from Bruce Lahn's lab (Evans et al. 2006) had not yet been published.

We have included some of that evidence in our current review. It is a shorter, more compact paper than last year's. That means that it leaves out a number of details, but it allows us to bring the molecular evidence and population genetic theory together.

In that form, it is possible to discuss some of the interesting predictions we might make about Neandertal-human population dynamics. For instance, why are two of the candidate introgressive alleles related to the brain? Our final section, "What did archaics have to offer?" takes on this question:

Adaptive alleles from archaic humans present a paradox. We recognize archaic humans by their morphology, and their morphology has mostly disappeared. Therefore, if moderns still retain adaptive alleles from archaic humans, those alleles almost certainly were not correlated with traits that we recognize as archaic. Instead, they must be related to phenotypes that we cannot recognize easily in archaic human fossils.

This is a crucial fact. We already know that Neandertal anatomies disappeared. But what makes a "Neandertal" anatomical feature? Clearly, we recognize it precisely because it is rare today.

If we are going to look for introgressive alleles, we have to look outside of this acquisition bias. The brain is a promising area on this score -- we know little about its variation in fossil humans.

In the final section, we allowed ourselves some speculation about the dynamics of modern human origins and dispersal:

Cosmopolitan populations like modern humans are generally a threat to endemics, but this threat intensifies during range expansions and population growth. In endemic species, alleles that promote outbreeding can be selected merely because the cosmopolitan species is expanding, aiding the collapse of former reproductive boundaries. Certainly, the distinctive morphological adaptations of archaic humans lost some of their selective advantage with the increasing technical sophistication of the early Upper Paleolithic (35 000 - 15 000 years ago). This must especially have been true of populations like the Neanderthals, whose skeletal and muscular specializations required a high energy budget. In an adaptive context, Neanderthals and other archaic humans were like endangered endemics, suffering from relatively high mortality and high energetic costs. Possibly, the only remaining adaptive strategy for them was mixture with the more cosmopolitan modern humans.

This is of course speculative, but I think it is valuable because it attempts to place ancient human populations in the context of modern conservation biology.

We often read that various human groups were "endangered" at one time or another, including the Neanderthals. But I have not seen anyone take the next logical step, which is to discuss the ways that endangered species actually interact with their congeneric competitors. When the interaction between populations includes interbreeding, interesting dynamics may emerge.

Does this mean that humans and Neandertals were distinct species who intermixed by hybridization?

I wrote about that question last year, concluding:

There will never be any tidy solution to the species problem, because all species have unique evolutionary histories and constraints. Given these difficulties, the species status of archaic Homo populations is basically an intractable problem. That is, I am happy to suggest that archaic Homo populations correspond to classical subspecies, and as far as I know, no evidence strongly contradicts that position. But I can recognize that some people will never agree with this assignment. And from the perspective of their evolution, it just doesn't matter. Evolutionarily important gene flow occurs between mammal species, subspecies, and populations.

The last sentence is the most important point. Those who want to put Neandertals into a distinct species (Homo neanderthalensis) generally believe that there was no evolutionarily significant gene flow between them and modern humans. But the opportunity for evolutionarily significant gene flow is always there, irrespective of whether the populations are species, subspecies, or even genera. Remember Bos-Bison introgression.

You can't simply define the problem of Neandertal-modern interactions away by giving them different names. And in reference to the "paradox" pointed out above, there is no defining the problem away by pointing to morphological differences. If we recognize that Neandertals are hominids, that is quite enough to suggest that gene flow was possible between them and their contemporaries.

The only thing left is to quantify the amount. The genetic observations thus far suggest that a predominant fraction of the gene pool of living humans descends from a relatively homogeneous ancient population. Since Late Pleistocene humans were geographically differentiated, this means that one ancient population disproportionately expanded at the expense of others. Genetic comparisons allow us to infer that the expanding population was initially African. This expanding population received introgressive alleles, both from other African populations and from Eurasian ones.

But that was not the end of the story. Introgressive alleles succeeded or failed on the basis of selection on them. The expanding population continued to grow in numbers. And the stage may have been set for something even more interesting...

More reading

"Why introgression?" discusses why introgression is a useful concept, compared to the simpler "gene flow."

"Introgression and microcephalin FAQ" addressed the MCPH1 genealogy.

"Neandertal introgression, anatomically" reviewed the paper by Soficaru et al. (2006) on the Pestera Muierii skull.

"The inevitability of introgression" announced and gave some details from our 2006 paper.

References:

Evans PD, Mekel-Bobrov N, Vallender EJ, Hudson RR, Lahn BT. 2006. Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage. Proc Nat Acad Sci 103:18178-18183. doi:10.1073/pnas.0606966103

Hawks J, Cochran G. 2006. Dynamics of adaptive introgression from archaic to modern humans. PaleoAnthropology 2006:101-115. Open access

Hawks J, Cochran G, Harpending HC, Lahn BT. 2007. A genetic legacy from archaic Homo. Trends Genet (early online) doi:10.1016/j.tig.2007.10.003

The "flame-haired" Neandertals

What better to match an Irish tongue, than red hair? In fact, with language and red hair Neandertals would seem to be excellent candidates for a little scheme I have copying the encyclopedia...

Botticelli Venus as Neandertal

Well, that's the story, anyway. How true is it? It's been two weeks since the headlines, I've had a chance to reflect, and there's more news coming. Time to get this off my desk -- so let's unleash another Hawks FAQ:

The amazing talking Neandertals

This week, Johannes Krause and colleagues from the Max Planck Institute for Evolutionary Anthropology announced that they had tickled FoxP2 out of two Neandertal specimens from El Sidrón, Spain. The bones were excavated in sterile (clean-cave?) conditions, immediately frozen and then shipped to Leipzig, where extracts were taken in clean-room conditions.

Here's an FAQ about what they found.

Why is this paper really important?

Isn't it obvious? It's important because it demonstrates that more than one Neandertal is suitable for nuclear genome recovery. We will know about genetic variation in Neandertals, sooner rather than later. These two bones come from different individuals, because the Leipzig group found two different mtDNA sequences in them. Together with the Vindija Vi 33.16 specimen in the original Neandertal genome papers, this makes three nuclear genome Neandertals. There will be more.

It also shows the possibility of probing ancient skeletons for specific genes. Here, they went in looking for Y-DNA, X-DNA and particular sites on FoxP2, and they found them. That is definitely the way to go if you want to test a biologically significant hypothesis fast -- otherwise, you just have to wait until the sequence comes up in your genome project.

However, I question the value of probing for individual genetic variants in this way. Every probe takes a bit of sample, which might be more efficiently used in whole-genome sequencing. We have 25,000 genes, and every one is potentially interesting. Every small sample used to assay only one of those genes may destroy many sequences from the others. It would be one thing if samples were trivial and easily replaced, but they obviously aren't.

Still, we will certainly see additional probes for genes that are of particular interest. I wouldn't be surprised to see MC1R results soon, to probe whether there were pigmentation variants in Neandertals. The same has already been done for woolly mammoths.

So, Neandertals had the human-specific FoxP2 form. Did they talk?

I think the genetic observation leans toward that direction, but doesn't really change our understanding. Consider:

Neandertals have a hyoid bone with humanlike anatomy, as did the Atapuerca people at more than 300,000 years ago, even though A. afarensis did not. So something related to vocalization evolved in humans by the Middle Pleistocene. Although Neandertal vocal tracts may not have been identical to recent humans, there is nothing about them that would preclude speech. The only paleoneurological observation about language puts a developed Broca's area on the KNM-ER 1470 endocast, Homo habilis.

Like other Middle Paleolithic/MSA people, their technology required more information to learn than earlier, Lower Paleolithic industries, leading to regional differentiation and more task-specific facies. Late Neandertals made use of some technology otherwise used only by Upper Paleolithic modern humans. Their hunting methods must have required cooperation and may have been impossible without a more sophisticated communication strategy than used by other primates.

All of these things argue for some kind of Neandertal language irrespective of FoxP2.

Then again, most of the arguments against humanlike language facility in Neandertals have nothing to do with FoxP2, either. The slow technological progress, limited collection strategies, the rarity of any artistic or symbolic expression, their high mortality rate, and -- of course -- the fact that they no longer exist have all been considered as evidence that Neandertals lacked some essential aspect of "behavioral modernity." If language is a prerequisite for the modern human pattern of behavior, then Neandertals may not have talked, at least not in the way we do.

I think the FoxP2 story has really confused people much more than necessary. But in this case, the confusion is the same that results from every other gene study: when the press says we've found a gene "for" something, what it ought to say is that we've found an allele that affects something.

No macromutation happened. Language did not spring fully formed into the mind of some ancient African. All members of Homo used communication systems including some (possibly minimal) elements of language, and the evolution of the human brain, along with technological changes throughout the Paleolithic, reflects the evolution of communication. Human language evolved -- like all things -- over a long time, and like all complex phenotypes it required a series of mutational changes. Many of these mutations became fixed during recent human evolution; some may still be changing in frequency today. Language evolution is probably a continuing process.

That means that it must have involved many more genes than FoxP2 -- which after all experienced only two amino acid substitutions in all of human evolution. I would imagine the number of genes involved in language evolution is more than 500, and I wouldn't be surprised if it were much more. In that context, it seems quite silly to say FoxP2 is the "critical" evolutionary change for anything.

Then you agree with Language Log. They told me that FoxP2 isn't a "language gene."

The case is strong that the two FoxP2 coding substitutions in humans were selected because of their role in language. The gene sequence is strongly conserved in most mammals, and shows similar changes in some other species with unusual vocal adaptations, such as echolocating bats (Li et al. 2007). Its expression pattern delineates areas related to vocalizations in both humans and birds, and the pattern itself differentiates between song-learning versus nonlearning bird species (Haesler et al. 2004, Teramitsu et al. 2004, Webb and Zhang 2005). And of course, mutations to FoxP2 can result in specific language impairment (SLI) in humans.

Still, that case is only circumstantial. We know that FoxP2 was under selection, that it became fixed in humans, probably during the Late Pleistocene, and that breaking the gene changes brain development and damages language skills. But we don't know what a human would be like with the chimpanzee form of the protein. We don't know whether both of the human-specific amino acid substitutions have a different effect than one. Most important, we don't know what other genetic changes may have been necessary backgrounds for selection on FoxP2.

This means Neandertals were really modern humans, right?

This study should put an end to the "sudden mutation" model of modern human origins.

There was not a single mutation that made the critical difference in the ancestry of today's people. There was no cognitive Rubicon leading to modern human evolution. I would analogize the process as a slow-motion avalanche: at first a few small grains of sand began to tumble, and then selection on a large number of genes became inevitable. FoxP2 is one of those genes, and as yet we don't know whether it was near the beginning or near the end of the process.

But it is clear that the process began before the Neandertals were gone. Some aspects of behavioral complexity did begin to evolve rapidly sometime after 70,000 years ago. This rapid evolution was multiregional in context -- it was not limited to a single human population. In particular, it was not limited to Africans: the last Neandertals clearly manifested technological and behavioral strategies otherwise defined as "behaviorally modern" (d'Errico 2003). There's a reason why the Neandertal-produced Châtelperronian industry of France and Spain was historically considered the first Upper Paleolithic industry.

But we have undergone light-years of change since the last Neandertals lived. This is not a question of "modern human origins" anymore. We can now show that living people are much more different from early modern humans than any differences between Neandertals and other contemporary peoples. I think that "modern humans" is on its way to obsolescence. What matters is the pattern of change across all populations. Possibly that pattern was initiated by changes in one region but the subsequent changes were so vast that the beginning point hardly matters.

We all know that the Neandertal genome is riddled with contamination from modern humans. Isn't the null hypothesis that we have a modern human sequence here because it is a modern human?

Well, as you know, I'm not all that convinced that contamination explains the interpretive discrepancies between last year's genome papers. But still, this study has done some things to address the problem of contamination.

It is notable that Green et al. (2006) found 25% modern human mtDNA in one of the El Sidrón bones: this shows that even "sterile" excavation, immediate freezing and extraction under clean-room conditions cannot exclude contamination. There is at the moment nothing more that can be done. We will always have the problem of a contamination fraction in ancient Neandertal skeletons. So we have to judge each study by the extent to which we can exclude contaminants with statistical analysis.

For this study, Krause et al. (2007) developed a test of nuclear DNA contamination: they identified seven gene variants that differ between the recovered Vindija Vi 33.16 nuclear genome and all known living humans. In other words, these are human-derived mutations that are absent from the only known Neandertal nuclear genome. Then, they probed the El Sidrón bones for these sites. They found only the ancestral form in their extracts of both bones -- presumably because no human contaminants were present in their samples.

That seems like a pretty good indication that the other sites in their sample represent the true gene variants of the ancient Neandertals. I wouldn't go so far as to say that contamination is ruled out, but it seems like these are good results.
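To get a feel for how much a test like this can exclude, here is a back-of-the-envelope sketch. It assumes, as a simplification that is not taken from Krause et al. (2007), that each amplification product independently comes from a contaminant with probability equal to the contamination fraction, and that a contaminant product shows the human-derived allele with a fixed probability. The function name, the number of products, and the fractions tried are all hypothetical.

```python
def prob_all_ancestral(contam_fraction, n_products, derived_freq=1.0):
    """Chance that every product shows the ancestral allele.

    contam_fraction: assumed share of template molecules from living humans.
    n_products: number of independent amplification products examined.
    derived_freq: assumed chance that a contaminant carries the derived allele
        (1.0 means the diagnostic sites are fixed-derived in the contaminants).
    """
    if not 0.0 <= contam_fraction <= 1.0 or not 0.0 <= derived_freq <= 1.0:
        raise ValueError("fractions must lie between 0 and 1")
    # A product reveals contamination only if it is a contaminant AND derived.
    per_product_derived = contam_fraction * derived_freq
    return (1.0 - per_product_derived) ** n_products


# Hypothetical scenarios: 14 products, light versus heavy contamination.
for c in (0.05, 0.25):
    for f in (1.0, 0.5):
        p = prob_all_ancestral(c, n_products=14, derived_freq=f)
        print(f"contamination={c:.2f}, derived_freq={f:.1f}: P(all ancestral)={p:.3f}")
```

Under these assumptions, a heavily contaminated extract would rarely produce nothing but ancestral alleles, but the test loses power as `derived_freq` drops, that is, when the contaminants themselves happen to carry ancestral alleles at the diagnostic sites.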

Did FoxP2 introgress into Neandertals?

It sure looks that way to me. Let's consider the evidence:

FoxP2 recently fixed in humans. According to Enard et al. (2002:871):

Under a model of a randomly mating population of constant size, the most likely date since the fixation of the beneficial allele is 0, with approximate 95% confidence intervals of 0 and 120,000 years.

Now, Enard et al. (2002) noted that human populations have grown over time, and that they are not randomly mating, so that this date estimate might be too recent. Allowing for population growth since "10,000--100,000 years ago," they asserted that fixation of FoxP2 must have happened "during the last 200,000 years of human history." But this is not quite accurate. Unlike genetic drift, positive selection can and often does fix genes rapidly in a growing population. It simply doesn't matter that the human population has been rapidly growing: FoxP2 may still have just become fixed yesterday.

Last year, Green and colleagues (2006) considered that the Neandertal-modern population divergence time might have been quite recent, depending on the ancestral population size. According to the estimates of Wall and Kim (2007), the Green et al. data are consistent with a Neandertal-modern population divergence time as recent as 30,000 years ago. Of course, that date would predict substantial admixture between contemporary Neandertal and non-European populations -- they would have been exchanging genes up to the very lifetimes of the last Neandertals. According to those data there would be nothing surprising about Neandertals and living people sharing the human-derived FoxP2 allele. But as mentioned above, Wall and Kim (2007) used the recent divergence estimate as evidence that the Neandertal genome data from Green et al. must be contaminated.

So, if we cannot trust the data, then we have to fall back on some other estimate of the divergence date. Noonan and colleagues (2006) estimated that Neandertals and modern populations diverged between 170,000 and 570,000 years ago. If we accept that, then the confidence intervals of the Neandertal-human divergence and the FoxP2 selective sweep might barely overlap. Might. But I will note that a minimal overlap between the 95% confidence intervals of two point estimates does not mean that they are not significantly different. They clearly fail to be significantly different only when the expected value of one estimate falls within the 95% confidence interval of the other. It is pretty unlikely that the most recent FoxP2 sweep is older than 170,000 years and the Neandertal-modern population split is as recent as 170,000 years.
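The point about overlapping confidence intervals can be checked with a small numerical sketch. It assumes two independent, normally distributed estimates with known standard errors, which is a textbook simplification and not a description of how the dates above were actually estimated; the numbers and function names are invented for illustration.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def ci95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return (estimate - Z95 * se, estimate + Z95 * se)


def intervals_overlap(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def z_for_difference(est1, se1, est2, se2):
    """z statistic for the difference of two independent estimates."""
    return (est2 - est1) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical estimates in arbitrary units, each with standard error 1.
est_a, se_a = 0.0, 1.0
est_b, se_b = 3.2, 1.0

ci_a, ci_b = ci95(est_a, se_a), ci95(est_b, se_b)
z = z_for_difference(est_a, se_a, est_b, se_b)

print("intervals overlap:", intervals_overlap(ci_a, ci_b))
print("estimate B inside A's interval:", ci_a[0] <= est_b <= ci_a[1])
print("significantly different at the 5% level:", abs(z) > Z95)
```

In this example the two intervals overlap at their ends, yet the difference between the estimates is still significant, because the standard error of a difference is smaller than the sum of the two interval half-widths.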

That is, unless the "split" time reflects widespread genetic introgression.

The current paper (Krause et al. 2007) goes to some contortions to try to establish that the FoxP2 sweep could really have been older than 300,000 years ago (where they place the lower confidence limit on the N-M split):

The third scenario is that the selective sweep started before the divergence of the ancestral populations of Neandertals and modern humans around 300,000-400,000 years ago

Let me just say that I was surprised to read this explanation in a paper from this group. One of the main arguments they have been posing as a scientific value of the Neandertal genome sequencing is that conventional methods don't detect selection at 300,000-400,000 years ago. But here, they consider such an ancient mutation to be the most likely hypothesis. This seems like quite a shift just to avoid the unpleasant idea of Neandertal introgression. Ooooh -- can't have those Neandercooties!

In reality, there is no reason to think the fixation of FoxP2 happened as early as 300,000 years ago, and indeed the very high frequencies of the linked derived alleles (over 97% for six of the linked alleles) suggest strongly that the sweep probably happened within the last 100,000 years -- otherwise, subsequent genetic drift should have caused these linked derived alleles to show more dispersion in their current frequencies. The same features that make the inference of selection so strong at FoxP2 -- it is far (>286 kilobases) from the nearest gene and it includes many high-frequency derived alleles in addition to reduced polymorphism -- make it very unlikely that the selective sweep was ancient.

So, considering that the El Sidrón samples both share the human-derived amino acid substitutions on the same haplotype as modern humans, complete with all the high-frequency derived SNPs, it seems almost certain that the gene introgressed into Neandertals from modern humans.

Or, there's one other option. One of the El Sidrón bones includes a relatively rare (in humans) ancestral SNP allele at one of those linked sites where the derived allele is at very high frequency in humans. One explanation: the selected mutation arose in Neandertals and introgressed into other humans. That would explain why this Neandertal didn't have a SNP variant on its FoxP2 haplotype that later became very common in humans: Neandertals had the new FoxP2 first.

What about that Y chromosome thing?

The El Sidrón bones both tested positive for the Y chromosome site assayed in the study. That means they were both male (duh!). But more important, the Y chromosomes of both individuals lacked the human-specific derived mutation that the researchers tested for. Since all human males yet surveyed have this human-derived mutation, this means that a Y chromosome variant has fixed in modern humans that Neandertals did not have. Since the entire nonrecombining portion of the Y chromosome is completely linked, we can infer that the entire modern human Y chromosome has undergone at least one fixation not shared with the ancestors of these Neandertals.

Here's the text (from page 2):

Both Neandertals yielded products for Y chromosomal primer pairs, indicating that they were males. Strikingly, all 15 Y chromosomal products for the five assayed positions show the ancestral allele. This includes two polymorphisms that define the deepest split among current human Y chromosomes (Y2 and Y4, Figure S1) as well as two polymorphisms that cover less common African Y chromosomes (Y3 and Y5, Figure S1). These Y chromosome results must derive, then, either from Y chromosomes that fall outside the variation of modern humans or from the very rare African lineages not covered by the assay (Figure S1). For our purposes, this result shows that neither the maternally inherited mtDNA nor the paternally inherited Y chromosome shows evidence of gene flow from modern humans into Neandertals or of subsequent contamination of their mortal remains.

That's not such a big surprise. Already we knew that the fixation of the human Y chromosome was very recent -- probably within the last 70,000--100,000 years, and possibly even more recently. Every man on earth shares recent Y chromosome mutations that were completely absent in Middle Pleistocene humans. That is one radical recent evolutionary change.

The paper elsewhere presents this absence of the human-derived Y chromosome in Neandertals as evidence that they did not contribute other genes to us. I could not disagree more.

The very recent fixation of the Y chromosome in an expanding human population is extremely unlikely to have resulted from genetic drift. Drift does not eliminate rare variants as quickly in an expanding population. Instead, one or more Y chromosome mutations must have been positively selected, resulting in the fixation of the entire nonrecombining portion of the Y chromosome (NRCY) in recent humans.

In that context, the Neandertal result is quite expected: they had an earlier Y chromosome lacking one or more mutations later selected in the other ancestors of living people.

References:

Briggs AW, Stenzel U, Johnson PLF, Green RE, Kelso J, Prüfer K, Meyer M, Krause J, Ronan MT, Lachmann M, Pääbo S. 2007. Patterns of damage in genomic DNA sequences from a Neandertal. Proc Nat Acad Sci USA doi:10.1073/pnas.0704665104

d'Errico F. 2003. The invisible frontier. A multiple species model for the origin of behavioral modernity. Evol Anthropol 12:188-202. doi:10.1002/evan.10113

Green RE, Krause J, Ptak SE, Briggs AW, Ronan MT, Simons JF, Du L, Egholm M, Rothberg JM, Paunovic M, Pääbo S. 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444:330-336. doi:10.1038/nature05336

Haesler S, Wada K, Nshdejan A, Morrisey EE, Lints T, Jarvis ED, Scharff C. 2004. FoxP2 expression in avian vocal learners and non-learners. J Neurosci 24:3164-3175. doi:10.1523/JNEUROSCI.4369-03.2004

Krause J, Lalueza-Fox C, Orlando L, Enard W, Green RE, Burbano HA, Hublin J-J, Bertranpetit J, Hänni C, Fortea J, de la Rasilla M, Rosas A, Pääbo S. 2007. The derived FoxP2 variant of modern humans was shared with Neandertals. Curr Biol 17:1-5. doi:10.1016/j.cub.2007.10.008

Li G, Wang J, Rossiter SJ, Jones G, Zhang S. 2007. Accelerated FoxP2 Evolution in Echolocating Bats. PLoS ONE 2(9): e900. doi:10.1371/journal.pone.0000900

Noonan JP, Coop G, Kudaravalli S, Smith D, Krause J, Alessi J, Chen F, Platt D, Pääbo S, Pritchard JK, Rubin EM. 2006. Sequencing and analysis of Neanderthal genomic DNA. Science 314:1113-1118. doi:10.1126/science.1131412

Wall JD, Kim SK. 2007. Inconsistencies in Neanderthal genomic DNA sequences. PLoS Genet 3:e175. doi:10.1371/journal.pgen.0030175.eor

So is it contamination, or what?

The abstract of the paper by Jeffrey Wall and Sung Kim is terse:

Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil [1, 2]. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

That is a pretty sharp contrast with the story earlier this summer that profiled the damage to the ancient sequences coming off the 454 platform. That study, by Adrian Briggs and colleagues, showed that most misincorporated nucleotides were near the ends of fragments, and promised that reliable sequences could be obtained by correcting for this damage.

The PBS science showdown

I like my science-show narrators disembodied. I especially like it if they sound like Liev Schreiber. Alec Baldwin, I don't mind, although I always feel like next he's going to tell me about the caterpillar drive.

So all three of the new pilots that PBS aired this month as candidates for the "new" science show basically turned me off. Two of them -- the Wired and Science Investigators shows -- are definitely tuned for a younger, hipper demographic. At least, a younger hipper demographic that wants to see young pseudoreporters wandering clumsily through labs, ponds, and other exotic locales. I'm not convinced there's any space between "young and hip" and "dorky" in the science-show universe.

The two of them beat 22nd Century pretty easily. Get this description from Scientific American's Nikhil Swaminathan:

Robinson co-hosts this show with a reincarnation of Aldous Huxley--I kid you not; though it's an actor playing the role, obviously--who's job it is "to remind us that all progress is not a step forward," and Orlanda Bell, an astral projection (half human, half machine), who presumably the show's producers have created to spread what will likely turn out to be half truths from the future. ("Almost all deadly diseases have been eradicated," being one of the first phrases out of her mouth.) Basically, it's like the dad from Punky Brewster bickering with a hologram that could have looked like anyone--so, obviously they chose Designing Women's Annie Potts--over whether we're all going to grow up to be cyborgs.

Yes it is that bad. It's like over-the-top Crossing Jordan bad. Maybe they should have had a seance and brought back Edgar Cayce instead.

I will say that Science Investigators probably had the best chance for my attention, since they went to 454 Life Sciences to talk about Neandertal DNA. But they definitely lost it when their discussion ran to how easy it would be to clone a Neandertal, if only it weren't so ethically wrong. As if!

Anyway, if you want to see more about the shows, Nikhil Swaminathan of Scientific American has been blogging them (that link goes to the last installment, which links to the other two). I basically have the same opinion, that Wired had the most potential of the bunch.

But if it were me, I would just kill this and ScienceNOW! and triple the amount of Nova. Or make more Scientific American Frontiers -- somehow that show makes the magazine format with short stories work in a way that these clunkers didn't manage. What's more, Alan Alda manages to be hip without being dorky.

The inevitability of introgression

I'd like to draw your attention to my new article on genetic introgression from archaic humans, written with Gregory Cochran. The article is in PaleoAnthropology, and is completely open access.

I can't say enough good things about this process and the value of having open access research results, which can be downloaded free anywhere on the planet.

A search for "introgression" here on the weblog will bring up a lot of relevant material, including the introgression and MCPH1 FAQ, a quick note about the importance of introgression in wild species, an opinion about why "introgression" doesn't imply "speciation", and the all-important Neandertal genome FAQ. I've been writing about the subject a lot, because we've been thinking about it a lot.

If you read nothing more, this is the most important quote (p. 104):

If the modern human population expanded at a rate of 1 percent per generation, then an introgressive allele with s = 0.01 (i.e., a 1 percent fitness advantage) would have a 95 percent probability of fixation in modern humans, with only 74 archaic-modern matings. For an allele with a 5 percent fitness advantage, the corresponding number of events would be only 24.
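The arithmetic behind those two figures can be sketched in a few lines. This is a rough reconstruction, not code from the paper. It assumes that a single introduced copy fixes with probability of about 2(s + r), which is Haldane's 2s adjusted for a population growing at rate r per generation, and that each mating introduces one copy independently of the others. The function names are made up for illustration. Under those assumptions the quoted numbers of matings come out as 74 and 24.

```python
import math

def fixation_probability(s, r=0.0):
    """Approximate chance that one introduced copy eventually fixes.

    Assumption: Haldane's 2s, extended to 2 * (s + r) for a population
    growing at rate r per generation. Only sensible for small s and r.
    """
    return 2.0 * (s + r)

def matings_needed(s, r=0.0, target=0.95):
    """Smallest number of independent introductions that gives at least
    `target` probability that one or more copies go on to fix."""
    p = fixation_probability(s, r)
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

for s in (0.01, 0.05):
    print(s, matings_needed(s, r=0.01))
```

Setting r to zero in the same function shows how much the population expansion matters: without growth, about twice as many introductions are needed for the weaker allele.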

Here, I don't want to repeat all of what I've written already, but I want to jot down some of the reasons why our new paper is worth reading:

  1. The central point of the paper is exceedingly simple. Haldane demonstrated in 1927 that the fixation probability of a single copy of a new adaptive allele is approximately 2s. This means that if archaic humans had any alleles that would have been adaptive for modern humans, it would take only a very small amount of interbreeding for modern humans to pick up these alleles, with a near-100 percent likelihood.
  2. One may point out that if this simple genetic observation were accurate, then natural populations ought to display many examples of introgression. In fact, they do. We have laid out a very extensive review of instances of introgression among natural populations. We focused on cases where the introgressive gene had adaptive importance. This included a large number of instances of introgression from wild to domesticated species and vice-versa, which are well-known from breeding experiments. However, there have been a growing number of examples of adaptive introgression between different natural populations as well. The use of more nuclear markers has begun to uncover many such cases, and, importantly, many species also show adaptive introgression of mitochondrial DNA. Those European mice are not unique -- the phenomenon is widespread.
  3. The neatest example we drew upon was the extended phylogenetic history of cattle-bison introgression. It's too long to quote, but it may by itself be worth reading the paper. The geographic and ecological differentiation of cattle may be a strong parallel to the different Pleistocene populations of Homo.
  4. In case you think bovines are too weird to apply to hominids, we also review many domesticated mammals from Eurasia that have very strong east-west biogeographic differentiation with substantial introgression in recent times, in many cases involving two or more wild progenitor species. Ecological change -- including domestication -- appears to be the biggest factor underlying adaptive introgression in animals. One of the most important mechanisms in wild populations is the absorption of endemics by cosmopolitan species through introgressive hybridization. Both mechanisms may have driven modern human origins.
  5. The simple predictions for adaptive genes differ greatly from the predictions for neutral genes. We expect that introgression was centrally important for the evolution of adaptive features of modern humans, both within and outside of Africa. This does not conflict with the observation that the ancestry of a neutral locus is predominantly or even exclusively African. Indeed, our paper suggests that the ecological circumstances surrounding an African population dispersal may have strongly favored the introgression and subsequent redispersal of Eurasian alleles.
  6. One of the big reasons why our paper differs from earlier work is that we consider genetic effects rather than species definitions. There is a long literature on species concepts that -- to varying degrees -- discusses mammalian hybrids. I especially recommend work by Trent Holliday in a 2003 review of species concepts and a forthcoming book chapter, a long series of articles by Clifford Jolly (culminating in a 2001 review article), and Darren Curnoe and Alan Thorne in a series of articles (e.g., 2001). Analogy with the systematics of other taxa will always be important in paleoanthropology, because we cannot observe the reproductive behavior of extinct hominids. All these studies and many others agree that some amount of interbreeding between regional populations of archaic humans would have occurred. In this context, the importance of introgression is now in the realm of direct quantification rather than analogy. It makes little difference whether hominids were more like baboons or more like some other model. Humans are the one primate species for which adaptive introgression is now most amply documented.

We briefly discuss in this paper several loci that demonstrate introgression in humans, but we have reserved a more extensive review for another forthcoming paper.

There is a lot of action on this front right now, because our knowledge of variation across the genome has become ripe for it. In short, with 25,000 genes to work with, there are unquestionably many that have drawn their adaptive nature in modern humans from some archaic population. It remains to be discovered just how many there are, and what proportion of them come from different archaic populations.

We think that this is one of the two major forces underlying the emergence of modern humans, and one that underlines the enormous evolutionary potential of our species. As we conclude:

The notion that a single small population of incipient modern humans had the perfect genetic combination for ultimate success seems quite improbable. Instead, the long coevolution of modern anatomy and behavior in contact with archaic humans, even as those archaic populations appeared to diminish, provided a rich source of adaptations for the expanding modern population. With current genomic techniques, we are beginning to find these archaic genes. We expect that they will prove central to the story of modern human origins.

References:

Hawks J, Cochran G. 2006. Dynamics of adaptive introgression from archaic to modern humans. PaleoAnthropology 2006:101-115. Free full text

Breeding nutritional Neanderwheat

On the topic of introgression, this article by Reuters' Will Dunham is a good illustration:

A team led by University of California at Davis researcher Jorge Dubcovsky identified a gene in wild wheat that raises the grain's nutritional content. The gene became nonfunctional for unknown reasons during humankind's domestication of wheat.
Writing in the journal Science on Thursday, the researchers said they used conventional breeding methods to bring the gene into cultivated wheat varieties, enhancing the protein, zinc and iron value in the grain. The wild plant involved is known as wild emmer wheat, an ancestor of some cultivated wheat.

Introgression between domesticated crops and their wild relatives is one of the most common ways to introduce desirable traits into agricultural production. For the most part -- even when they are classified as different species -- domesticated crops can be crossed with wild progenitors.

Emmer wheat is itself a tetraploid, presumed to be a hybrid of two wild diploid grasses. Tetraploidy (and other -ploidies) comes to mind when talking about plant hybrids, because it often happens that new plant species originate from such crosses. There's nothing so exotic about emmer-domesticated wheat introgression, since durum wheat is also tetraploid (bread wheat is hexaploid, but it carries the same two genomes plus a third). The ability to breed in characteristics is simple Mendelism:

"We didn't do it by genetic modification. The normal wheat crosses perfectly well with the wild wheat. So we just crossed it after normal breeding," Dubcovsky said.

Breeders can take the selection coefficient all the way up to 100 percent if they want, and so ensure the fixation of a desirable allele like this one. Natural populations usually don't have this privilege -- and advantageous alleles are often lost, despite being favored by selection. A process carried out methodically by breeders reduces to a scattershot in nature. But it is an orderly scattershot in one way: the most strongly selected alleles have the highest chance of making it.
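To put rough numbers on how often a favored allele is lost anyway, here is a minimal sketch. It assumes each copy leaves a Poisson-distributed number of descendant copies with mean 1 + s, which is a standard branching-process idealization and not something taken from the Reuters article or the Science paper. The function name is hypothetical.

```python
import math

def survival_probability(s, tol=1e-12, max_iter=1_000_000):
    """Chance that a single new copy with advantage s escapes random loss.

    Assumption: Poisson offspring number with mean 1 + s. The extinction
    probability q is the smallest solution of q = exp((1 + s) * (q - 1)),
    found here by fixed-point iteration starting from q = 0.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp((1.0 + s) * (q - 1.0))
        if abs(q_next - q) < tol:
            break
        q = q_next
    return 1.0 - q_next

for s in (0.01, 0.05, 0.5):
    print(s, round(survival_probability(s), 4))
```

For small s the result sits just under 2s, so an allele with a 1 percent advantage is lost about 98 times out of 100, while stronger selection improves the odds in an orderly way.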

Neandertal genome FAQ

With the release of the initial two papers describing chromosomal DNA sequences from a Neandertal, I thought I would put together some frequently asked questions and answers to them. I actually have been frequently asked most of these questions this week -- mostly by journalists -- so I think this is a good list.

I'll be following up over the next few weeks with additional details, particularly as some of our own work moves forward. I've left some loose ends dangling here deliberately -- sometimes for the sake of brevity, in other cases because they await further developments.

UPDATE (11/17/2006): I'm editing through this, making changes here and there to make things clearer. So as this progresses, it won't be identical to the initial version, although changes will be minor.

There are two papers in two journals, by two different teams of people. What's the difference?

Both teams used samples from the same specimen, Vindija (Vi) 80 -- so in principle, they are sequencing the same genome. The difference between the two comes from their methods of sequencing the DNA.

The Rubin group (Noonan et al. 2006) is using a metagenomics method based on the creation of a clone library from the ancient DNA. To make a clone library, DNA from a sample is cut with a restriction enzyme, which cuts the DNA at every place that it displays the same short sequence (usually 4- or 6-bp sequences, such as "ATTA"). The short fragments of DNA are mixed together and bound to vectors that can be maintained and replicated in cells. This is the "cloning" process, and the "library" consists of all the short fragments, which (hopefully) overlap each other so that they can be reconstructed.

People have made libraries for a long time. For example, the entire mRNA complement in a given tissue type may be made into a library of complementary DNA (cDNA). Once the library is made, it can be probed with short, labeled DNA sequences to assess whether a given gene is expressed in that tissue type. Or contrariwise, after cDNA from the library is sequenced, it can be used to design probes to find where in the genome it came from.

The unique aspect of the metagenomic approach is that all DNA sequences from a sample will be included in the library, potentially sequenced, and ultimately reconstructed with computers into separate genomes. Usually cloning is preceded by an amplification step (generally using PCR), which selects and amplifies DNA of particular interest for cloning. But metagenomic methods skip this amplification -- because they cannot predict in advance what they are looking for. One of the most important early applications of metagenomics has been to reconstruct the genomes of microbes that cannot be cultured. Even though these organisms are not amenable to keeping in laboratory colonies, their genomes can be reconstructed by sampling their environments -- for example, soil or pondwater.

Or fossils. For the Vindija 80 fossil, the extract includes only around 6 percent identifiable "primate" DNA sequences. Out of the roughly 20 percent that are identifiable at all, over half are microbial.

I suppose if you were interested in the long-term microbial decomposition of fossil bone, you could do your dissertation on those. For the rest of us, the final step is to let the computer spit out the humanlike sequences, which are assumed to be the Neandertal DNA plus some proportion of human contamination.

In contrast, the 454 group (Green et al. 2006) used a method called bead-based emulsion PCR. That is a mouthful, so it bears some explanation (for which I'm paraphrasing material from Margulies 2005 and Ronaghi 2001).

The "polymerase chain reaction," or PCR, is a method of replicating many copies of a DNA sequence from a single template. Usually to do PCR, you design a "primer," which is a short sequence of DNA that causes the target sequence to be preferentially replicated by the DNA polymerase. With a number of heat cycles and sufficient primer, you end up with a whole lot of copies of just the piece of DNA that you want.

This is, of course, exactly why standard PCR is so problematic for ancient sequences. There, you can't get exactly what you want, because it is broken into tiny bits and damaged. You would be happy to get anything. But if you amplify everything together in one giant vat, then the less damaged sequences will be the ones that amplify preferentially, and these are going to be worthless to you because they all represent contaminants of various kinds, like microbial DNA or modern human sequences.

The 454 method attaches all the tiny bits of sequence to tiny beads and separates these beads into tiny water droplets suspended in oil. The droplets are the "emulsion" part, and by separating them in this way, the process can employ PCR while keeping all the tiny sequences separate from each other. Because they are kept separate, one good sequence can't swamp out all the others in the solution. The PCR products all stick to the bead so that after they come out of the emulsion the copies of different sequences are still separate.
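A toy calculation shows why swamping is such a problem in a single shared reaction and why it disappears once each template has its own compartment. The efficiencies and copy numbers below are invented purely for illustration; they are not measurements from either paper.

```python
def bulk_pcr_share(start_copies, efficiencies, cycles):
    """Fraction of the final product that comes from each template
    when all templates are amplified together in one tube.

    efficiencies: per-cycle chance that a copy is duplicated (0 to 1),
    so each template grows by a factor of (1 + efficiency) per cycle.
    """
    final = [n * (1.0 + e) ** cycles for n, e in zip(start_copies, efficiencies)]
    total = sum(final)
    return [f / total for f in final]

# Hypothetical numbers: equal starting copies, an intact contaminant that
# amplifies at 90% efficiency and a damaged ancient template at 50%.
contaminant, ancient = bulk_pcr_share([1000, 1000], [0.9, 0.5], cycles=30)
print(f"ancient share in one shared tube: {ancient:.4%}")

# In an emulsion droplet the ancient fragment is the only template present,
# so everything that ends up on its bead is a copy of that fragment.
print(bulk_pcr_share([1], [0.5], cycles=30))
```

With these made-up values the ancient template ends up as well under a tenth of a percent of the bulk product, even though it started out as half of the mix.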

After PCR, the DNA is broken down into single strands, still attached to their beads, and the beads are deposited on a fiber-optic slide assembly. The slide has tiny wells that are optically connected to a light-sensing CCD, which is essential for the "pyrosequencing" step. Nucleotides flow across the slide and into these wells one after another (T, A, C, then G). When the DNA polymerase connects one of these nucleotides to the single-strand DNA in a well, it releases a molecule of pyrophosphate (PPi).

That's when the magic happens. The solution also contains luciferase -- the enzyme that makes fireflies glow. With some additional chemistry, the PPi gives a burst of energy to the luciferase, which then emits a spark of light. The CCD picks up the light, which is a signal that the nucleotide was incorporated into the sequence.

Since nucleotides are added only every few seconds, a clever person with a notebook could reconstruct the sequence of the DNA fragment in each well. The real trick is that the fiber-optic slide contains well over a million wells, all being sequenced simultaneously. As the CCD picks up the series of flashes from every well, the system is tracking many megabases of DNA in every run.
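The notebook exercise for a single well looks something like the sketch below. It assumes the brightness of each flash is roughly proportional to the number of identical bases added in that flow, which is how runs of the same base are handled in pyrosequencing. The signal values are made up, and real base-calling software is considerably more careful about noise.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Turn per-flow light signals from one well into a base sequence.

    signals: one number per nucleotide flow, in flow order. The rounded
    value is taken as the number of bases incorporated in that flow
    (0 = no flash, 2 = a run of two identical bases, and so on).
    """
    bases = []
    for i, signal in enumerate(signals):
        count = int(round(signal))
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)

# Made-up signals covering two rounds of T, A, C, G flows.
read = decode_flowgram([1.02, 0.05, 1.9, 0.1, 0.0, 0.97, 0.04, 1.1])
print(read)
```

The decoded read is the newly synthesized strand, so it is the complement of the template fragment attached to the bead.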

At present, this is the fastest method of DNA sequencing on the planet. It can construct the complete genome of a microbe in a couple of hours.

If the 454 sequencing method is so much faster, then why would anybody ever want to build clone libraries?

The claim is that the library approach is superior as a way to probe for specific genetic loci. For instance, here's a passage from p. 1071 of the Pennisi article:

[Rubin] envisions several libraries, each from a different Neandertal. Researchers would pull out the same fragment from each library to compare with each other and with living people. A pilot project has already demonstrated probes that ferret out specific target sequences, so the team needn't analyze the billions of bases shared by Neandertals and living humans, or among different Neandertals. "We will be able to identify and confirm sequence changes in more than one Neandertal without having to sequence several Neandertals to completion," Rubin says. "Seeing the same change in multiple Neandertals will give us confidence that we got [the sequence] right."

This sounds similar to the study earlier this year that found Mc1r variants in different mammoths, but in fact that study used direct PCR rather than cloning (I suppose because they have a heck of a lot more mammoth tissue to work with!).

It's not obvious to me that this is really that much of an advantage. I mean, it's certainly true that we really want to sample some genes (like MCPH1) from several different Neandertal fossils. But I don't see any point to drilling into fossils for this purpose without also sequencing their full genomes.

Now, somebody will say, "Well, sequencing the full genome of every fossil is just too expensive. We can limit to work on just a few genes much more cheaply, and we can use the same samples later to sequence other genes, or whole genomes."

Personally, I don't see the rush. These fossils were in the ground for 40,000 years, and they're not going anywhere. If we can sequence whole genomes cheaply in 10 or 20 years, and additionally have better means of dealing with contamination, I don't see why we just shouldn't wait. Training graduate students in metagenomics is not a good enough reason to work on these rare fossils.

One may say that the same samples will be sufficient for later sequencing of whole genomes, or other genes, or Neandertal athlete's foot fungus, or whatever, but in my experience it somehow never works out that way. Somebody is always coming back to grind up, dissolve, or laser ablate more bone.

In fact, if I were looking to make the next advance in metagenomics, I would take some of that mammoth flesh, mix in some elephant blood, and find ways to resolve the parts of the resulting mix. That would be something.

Are you saying you are against destructive sampling of these fossils?

Not at all. In fact, I think that genomics gives the most compelling reason ever for grinding up more bones.

There is just a huge quantity of information from DNA sequences; far more than from the morphology -- especially for samples like bone fragments or isolated teeth.

Heck, if the devil came to me and said I could have the full genome sequence of every fossil if I would agree to their destruction, I think that would be a good bargain!

But it's pretty clear that we're not in that situation. We can have our cake and eat it too -- and the longer we wait, the cheaper and less destructive this is likely to be. And frankly, just one Neandertal genome is going to give us plenty to work on for a long time.

But then, I was trained as a fossil guy, and I'm used to working with a few bits and pieces. It gives me a natural advantage!

They say there's no significant evidence of interbreeding. Yet you told us last week that there is significant evidence of interbreeding. What gives?

A few years ago I gave a talk where I laid out what I saw as the problems interpreting nuclear DNA sequences from Neandertals. Now, this was long before we had any reasonable prospect of getting such sequences, so it was purely based on knowledge about human genetic variation. As I saw it then, there were two problems:

  1. Human mtDNA is really variable, with greater than 1 percent sequence divergence between people, and much higher in some places. In contrast, human nuclear DNA has less than one base pair in a thousand different between copies. To get a reasonable picture of variation among people, you need long nuclear sequences so that you will find polymorphisms. But ancient DNA is broken into short little sequences that are very difficult to reconstruct. With mtDNA, this is less of a problem because it is clonal and a person basically has one sequence in many copies. But most nuclear DNA (all autosomal DNA) exists in two, possibly different copies. So reconstructing long enough sequences to study polymorphisms is very difficult.
  2. The coalescence age of human mtDNA is only a couple hundred thousand years, so sampling ancient humans is sort of likely to result in sequences that lie outside this range of variation -- and with Neandertals, that is precisely what happened. But nuclear loci have coalescence ages on the order of 600,000 to 2 million years or older. With these dates, the diversity among living people must significantly predate any divergence of archaic humans for most nuclear genetic loci. This means that Neandertals ought to have shared a high proportion of polymorphisms that are still variable in humans. Since we can expect that Neandertals will not be very genetically divergent for these nuclear genes, compared to the genetic differences among living people, we can conclude that no gene is likely to tell us very much about the phylogenetic relationships of an ancient Neandertal with living people.

These two problems are still stumbling blocks for interpreting Neandertal sequences. But the research teams found a very clever way to circumvent them, by using genomics approaches instead of genetic approaches.

If you've been scratching your head wondering exactly why "genomics" has a buzz, then this is a good example.

Because of projects like the HapMap and the chimpanzee genome project, we know a lot (not everything, but a lot) about human genetic polymorphisms and our genetic differences from chimpanzees. In fact, we have databases of human single nucleotide polymorphisms (SNPs), and human-chimpanzee comparisons. For each SNP, some humans have an ancestral nucleotide -- generally the one that chimpanzees have. Other humans have a derived nucleotide -- the one that appeared in some ancient human, and different from chimpanzees.

For the most part, derived SNP alleles are recent. A few of them are very old, and these tend to be found at high frequencies (because the person who originated them had lots of descendants in that time). But many more of them are recent, found in a relatively small number of people today, who descend from a common ancestor during the past couple hundred thousand years.

If Neandertals diverged from humans over 200,000 years ago, and they didn't interbreed after that time, then the Neandertal genome should have relatively few derived human SNPs. In contrast, if the two populations continued to interbreed after 200,000 years ago, they might share fairly many of these derived SNPs.

Hence, we have a potential test for Neandertal-human genetic interactions.
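In outline, the test is a simple tally. The sketch below is only an illustration of the logic described here, not the pipeline either team used. It treats the chimpanzee base as ancestral, skips sites where the fossil base matches neither human allele, and uses invented toy data.

```python
def derived_fraction(sites):
    """Share of informative SNP sites where the fossil carries the derived allele.

    sites: iterable of (chimp_allele, human_alleles, fossil_allele).
    The chimpanzee allele is treated as ancestral. Sites where the fossil
    base matches neither human allele are skipped as uninformative.
    """
    derived = ancestral = 0
    for chimp, human_alleles, fossil in sites:
        if fossil not in human_alleles:
            continue
        if fossil == chimp:
            ancestral += 1
        else:
            derived += 1
    total = derived + ancestral
    return derived / total if total else float("nan")

# Invented example sites, not real SNPs.
toy_sites = [
    ("A", {"A", "G"}, "A"),  # fossil has the ancestral allele
    ("C", {"C", "T"}, "T"),  # fossil has the derived allele
    ("G", {"G", "A"}, "G"),  # ancestral
    ("T", {"T", "C"}, "T"),  # ancestral
    ("A", {"A", "C"}, "T"),  # matches neither human allele: skipped
]
print(derived_fraction(toy_sites))
```

A low fraction is what long isolation predicts; a high fraction needs some other explanation, such as gene flow, contamination, or the way the SNPs were chosen.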

Noonan et al. (2006) looked for these derived SNPs and found very few of them. They concluded that there was no significant evidence of Neandertal-human interbreeding, although their statistical test couldn't rule out as much as 25 percent admixture (for reference, Plagnol and Wall 2006 estimated only 5 percent ancestry from all archaic humans, not only Neandertals).

Green et al. (2006) also looked for derived SNPs. They had a much bigger sample of DNA to work with, so they ought to have a stronger test. Here's what they wrote (p. 334):

Using the SNPs that overlap with our data from two large genome-wide data sets (HapMap, 786 SNPs and Perlegen, 318 SNPs), we find that the Neanderthal sample has the derived allele in 30% of all SNPs. This number is presumably an overestimate since the SNPs analysed were ascertained to be of high frequency in present-day humans and hence are more likely to be old. Nevertheless, this high level of derived alleles in the Neanderthal is incompatible with the simple population split model estimated in the previous section, given split times inferred from the fossil record. This may suggest gene flow between modern humans and Neanderthals. Given that the Neanderthal X chromosome shows a higher level of divergence than the autosomes (R.E.G., unpublished observation), gene flow may have occurred predominantly from modern human males into Neanderthals. More extensive sequencing of the Neanderthal genome is necessary to address this possibility.

If this observation holds (i.e., if it is not influenced by contamination, and the ascertainment function does indeed show this to be an excess of derived SNPs), then it is one of the strongest pieces of evidence for genetic intermixture of Neandertals and modern humans. Note that there are two avenues for this gene flow -- either from the ancient ancestors of modern humans into Neandertals, or out of Neandertals into early modern humans. I'm sure we will hear more about this when they have more sequence.

In the meantime, the other source of evidence about Neandertal-human genetic interaction is the genomic variation of living people. Last week's paper on MCPH1 (discussed here) is a good example of what that evidence looks like. The key feature is that if you troll through the genome, you begin to notice some loci with interesting genealogies. The interestingness is a combined signature of recent selection and ancient population structure.

Looking for genes like MCPH1 in the Neandertal genome is a no-brainer. We probably won't find a lot of them, because the Neandertals were a small subset of the ancient human population.

There is one further problem. We can recognize these interesting loci in living people because they lie on relatively long haplotypes with little recombination. The inference is that such an allele must have begun from a very low copy number around 30,000 years ago, presumably because it was introduced from some archaic population. But the SNPs that are presently linked to the selected site were probably polymorphic within the archaic population, not fixed on a long haplotype. Unless we know exactly which SNP is the selected site on a human allelic variant, we may have some trouble telling whether an archaic genome has the allele. And as I note below, a large proportion of SNPs are going to be missing from the draft Neandertal genome even when it reaches an average 1x coverage.

This just means that evidence from the genomics of living people and from the Neandertal genome won't mesh together seamlessly. There remains some complexity interpreting these relationships.

The divergence date of Neandertal and human sequences is estimated at around 520,000 years ago. What does that mean?

First, what it doesn't mean. It doesn't mean that the human and Neandertal populations diverged 520,000 years ago. I noted above that the estimate of the genetic divergence time comes from the proportion of chimpanzee-human differences for which the Neandertal shares the human allele. But of course, some living humans have the ancestral, chimpanzee-like allele for many polymorphisms, so this comparison of polymorphisms is not saying that Neandertals were like chimps. Instead, we are just disregarding the Neandertal-specific evolutionary events.

I'm sticking with the 520,000 year genetic divergence estimate from Green et al. (2006), instead of the older estimate from Noonan et al. (2006), because of the vastly larger sample in the Green paper. Still, most of the discussion does not hang too critically on the precise date; although the date changes the interpretation by degrees.

The really interesting observation is the Neandertal-human genome draft difference compared to the human-human difference. Here's a passage from p. 354 of Green et al. (2006):

We analysed the DNA sequences generated from a contemporary human using the same sequencing protocol as was used for the Neanderthal. Although ancient DNA is degraded and damaged, this comparison controls for many of the aspects of the analysis including sequencing and alignment methodology. In this case, 7.1% of the divergence along the human lineage is assigned to the time subsequent to the divergence of the two human sequences. The average divergence time between alleles within humans is thus 459,000 years with a 95% confidence interval between 419,000 and 498,000 years. As expected, this estimate of the average human diversity is less than the divergence seen between the human and the Neanderthal sequences, but constitutes a large fraction of it because much of the human sequence diversity is expected to predate the human-Neanderthal split. Neanderthal genetic differences to humans must therefore be interpreted within the context of human diversity.

They don't specify where this "contemporary human" was from. The draft human genome is a chimera made up of anonymous people from different populations. That means that wherever the "contemporary human" is from, it will be the same region as represented by some part of the draft genome, but not all. So the divergence between these two mystery sequences is likely to be greater than average within a single population, and less than average between different populations.

Keeping that in mind, the human-Neandertal difference is startlingly close to this human-human difference measurement. The Neandertal is only about 13 percent more different from the draft human genome than these two human sequences are from each other (520,000 versus 459,000 years).
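For anyone who wants to check the comparison, it is just a ratio of the two divergence estimates quoted above. The sketch below also runs the ratio against the ends of the reported confidence interval for the human-human figure. It takes the 520,000-year estimate as fixed, which is a simplification, since that estimate has its own uncertainty.

```python
human_neandertal_years = 520_000           # Green et al. (2006), as quoted above
human_human_years = 459_000                # same paper, contemporary human comparison
human_human_interval = (419_000, 498_000)  # reported 95% confidence interval

# How much more divergent the Neandertal is than the second human sequence.
excess = human_neandertal_years / human_human_years - 1.0

# Same ratio at the two ends of the human-human interval.
excess_range = sorted(human_neandertal_years / y - 1.0 for y in human_human_interval)

print(f"excess: {excess:.1%}")
print(f"range:  {excess_range[0]:.1%} to {excess_range[1]:.1%}")
```

The point estimate is about 13 percent, and even the wide end of the range leaves the Neandertal well within the scale of differences between living people.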

It seems very likely that we will find pairs of living human populations where the average genetic divergence is older -- maybe much older -- than this human-Neandertal divergence. For instance, it seems almost certain that the great genetic variability among living African groups will exceed this human-Neandertal difference.

Some geneticists have noted that European and Asian populations seem to be a genetic "subset" of African populations, at least for many genetic loci. With these kinds of numbers, it looks like Neandertals may be a subset of living human diversity in the same sense. I've never much liked that formulation, because "subset" is never really an accurate description of the genetic relationships. But if the seat of living human diversity is Africa, adding Neandertals to the mix may not change that pattern at all.

As Green and colleagues note, most of the genetic divergence between humans and Neandertals, and between humans and other living humans, is actually much older than the divergence of these populations from each other.

At one limit (that is, assuming complete isolation of humans and Neandertals after some date), the population divergence time depends on the effective size of the population that was ancestral to living humans and Neandertals. It is basically not possible to obtain a good estimate of this ancestral effective population size from the current Neandertal data -- mainly because good estimates depend on heterogeneity in divergence times among loci, which we can't infer for the short Neandertal sequences.

Both papers assume that this ancestral effective population size was small -- even smaller than the long-term human effective population size of around 10,000 individuals. A smaller effective size for the human-Neandertal ancestral population is fairly unlikely, though, since it must have been distributed across large parts of Europe and Africa at a minimum. More likely, the effective size was close to 10,000, just as in humans, since the human effective size is inferred to have been that small over at least the past million years.
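How much the answer hangs on this assumption is easy to show with a back-of-the-envelope sketch. Under complete isolation and a constant ancestral effective size, two gene copies wait on average 2Ne generations to coalesce once they are back in the ancestral population, so the population split is the genetic divergence minus that wait. The 25-year generation time and the "small" effective size of 3,000 are illustrative assumptions, not values taken from either paper.

```python
def population_split_years(genetic_divergence_years, ancestral_ne, generation_years):
    """Population split time implied by an average sequence divergence time.

    Assumes complete isolation after the split and a constant ancestral
    effective size, so two lineages take on average 2 * Ne generations
    to coalesce once they are both in the ancestral population.
    """
    ancestral_wait = 2 * ancestral_ne * generation_years
    return genetic_divergence_years - ancestral_wait

# Hypothetical inputs: 25-year generations and two candidate effective sizes.
for ne in (3_000, 10_000):
    print(ne, population_split_years(520_000, ne, 25))
```

The larger the assumed ancestral population, the more of the 520,000 years is used up before the split, and the more recent the implied separation becomes.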

If you're reading the term "effective population size" for the first time, don't worry. It doesn't mean "population size", and it has mainly a technical genetic meaning. It is sort of important that the Neandertal sequence supports this particular effective size over the long term, but it will take another post to explain why.

As noted above, the populations may never have been isolated. The derived SNP evidence might suggest that there was never any population divergence, or at least no long period of complete isolation, between humans and Neandertals. We'll have to wait and see.

Why does this bone have such a low level of contamination compared to other Neandertals?

I should start by pointing out that "contamination" here means "modern human sequence". All fossil bones are loaded with exogenous DNA, like bacterial and fungal genomes that invaded after the animal died. From a certain point of view, those exogenous genes are contaminants -- we are generally not interested in their sequences, and sorting them out from the endogenous Neandertal DNA is a real nuisance. But because we have a reference genome from humans to compare with the sequences from the ancient bone, we can sort out these bacterial and other exogenous sequences. So although they do "contaminate" the bone, they don't distort our picture of the sequence.

The real problem is that there are contaminating sequences from recent humans in the ancient bones. These sequences come from excavators, anthropologists who studied the bones, museum personnel, graduate students who cleaned and prepared the bones for sequencing, other samples from the labs doing the work, and who knows where else.

I have been asked many times why they can't eliminate this contamination. For example, why can't they just clean the bone, or take samples from deep inside the bone, or take samples from deep inside of teeth, or use a clean room, yada yada yada.

The answer is that they do wash the bones, and they do eliminate the outer surface, and they do take samples from deep inside of bones, and they do work in a clean room, with ultraviolet lights and positive air pressure so that DNA can't get sucked into the room, and rubber gloves and bunny suits, and the whole nine yards. And the bones are still contaminated, deep inside them.

Now, you may imagine anthropologists picking their noses with the bones, and using them as chopsticks, and putting them up to their ears to hear them breathing, and all manner of other things. The truth is, I have no idea how the contamination gets in there, and neither does anybody else. It's just there, and apparently we can't avoid it.

The extraction team looked at lots of Neandertal specimens, with one question in mind: How much human contamination does this bone have? To answer this question, they amplified mtDNA sequences, and assessed what proportion of the amplified sequences were Neandertal-like and what proportion were human-like. Vindija 80 stood out as having a very low proportion of human-like sequences -- less than 2 percent. So they inferred that there was little contamination of the sample by recent human DNA, and are working under the assumption that the nuclear genome is contaminated in a similar low proportion.
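The arithmetic behind that kind of screen can be sketched roughly as follows. This is my own illustration, not the extraction team's procedure: the counts are made up, the function name is hypothetical, and the Wilson score interval is simply one common way to put error bars on a proportion from a small sample.

```python
import math


def contamination_estimate(human_like, neandertal_like, z=1.96):
    """Return (proportion, lower, upper) for the human-like fraction.

    Uses a Wilson score interval; z=1.96 gives roughly 95 percent coverage.
    """
    n = human_like + neandertal_like
    if n == 0:
        raise ValueError("need at least one classified sequence")
    p = human_like / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for one bone: 2 human-like and 98 Neandertal-like sequences
p, low, high = contamination_estimate(2, 98)
print(f"estimated contamination: {p:.1%} (interval {low:.1%} to {high:.1%})")
```

Even with a low point estimate, the interval from a modest number of sequences stays fairly wide, which is one reason the assumption about the nuclear genome deserves some caution.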

As for why this particular bone has such low contamination, well, nobody really knows that either. Svante Pääbo speculates that it is because Vi 80 was originally identified as fauna and hasn't been handled much. He might well be right. Which would bring us back to the nose-picking chopstick bone theory, I suppose.

If Vindija 80 was put in a box with fauna, it can't be very diagnostic. This high preservation seems very unusual. How do they know it was a Neandertal?

The radiocarbon date is 38,310 +/- 2130 years BP, and they found very high preservation of a Neandertal-like mtDNA sequence. If you think that fails to answer the question, well...

How can they deal with the damage to ancient DNA sequences?

One of the things that has become