Complete Neandertal mitochondrial sequence, and selection on human (not Neandertal) mtDNA

In the current Cell, the Max-Planck group, in coordination with 454 Life Sciences, report the sequence of a complete Neandertal mtDNA. I'm out of town right now, so I'm writing fairly quickly, and I haven't seen any of the reporting. Keeping that in mind, I wanted to set out a few of the interesting things about the paper.

I've been waiting a long time for this sequence to come out. I know they've had the basic data for a long time: because the mtDNA copy number is very high, the 454 process kicks out a lot of mitochondrial sequence. The reward for the wait is that Green and colleagues have done a very careful job of comparative analysis, with some very interesting results.

If I leave something obvious out, please forgive me, since I'm just dashing this off as quickly as I can.

Where we left off...

All previously reported sequences of Neandertal mtDNA have been fragments of the control region. The control region of the mtDNA (hypervariable regions I and II) is very helpful for working out phylogenetic relations among recent humans. True to its name, it varies a lot, and its high mutation rate allows a fine discrimination among lineages that have differentiated only within the recent past.

The high mutation rate of the hypervariable regions also means that closely related populations have accumulated many differences. That's very convenient for identifying Neandertal mtDNA, where only small fragments (up until recently) have been practical to obtain. A small fragment of the mtDNA control region is sufficient to assess whether a specimen is like other known Neandertal sequences or not. Up to now, this has been an important way of authenticating Neandertal DNA sequence results --- although it has the obvious drawback that it might falsely exclude some genuine sequences that really do look like the modern human form.

So far every Neandertal mtDNA sequence looks like a member of the same mtDNA clade. (More carefully, every specimen with good biological preservation that has produced DNA has yielded at least some mtDNA sequences that form a clade distinct from all recent humans. Others are presumed to be contamination -- which I have no reason to doubt.) No recent human -- out of the many thousands that have been sampled so far -- has produced an mtDNA control region sequence like any known Neandertal. The two populations, so far as we can tell, possessed distinct mtDNA clades.

Divergence time

A complete mtDNA sequence provides a lot of sites, which allows a more precise estimate of the divergence time between recent human and Neandertal mtDNA lineages. The paper reports this time as 660,000 years ago, with a confidence interval from 520,000 to 800,000 years ago. That range of dates substantially overlaps with the prior estimates of divergence time, and is a pretty good match to the initial estimate based on a single HVR1 sequence in 1997.

The availability of a complete sequence has also removed a remaining piece of ambiguity from earlier comparisons. Because the hypervariable regions are so variable, it has always been the case that comparisons of hundreds or thousands of recent humans have included some pairs of individuals who are really divergent in their control region sequences. The result: some people living today are more different from each other than Neandertals are from recent people.

Now, that particular fact is not meaningful in a cladistic sense. Neandertal sequences share derived mutations, as do recent humans. But the concept of a "range" of genetic divergence has confused comparisons. Comparing the control region alone, it may appear that Neandertals were not so very different from living humans, even if they have a few derived mutations that no longer exist. As long as some humans were also very different from each other, it remained possible that the tree had been wrongly reconstructed. An equally parsimonious tree (or even a more parsimonious one) might link the Neandertal clade with some modern human, even if not a recent European. When comparing humans to chimpanzees and more distantly related primates, the hypervariable regions are somewhat saturated with mutations, meaning that parallel mutations between different species are very common. This makes it even harder to reconstruct the tree of mtDNA relationships based on the hypervariable regions alone.

Comparing the complete mtDNA genomes of a Neandertal and many recent humans presents a very different picture. Humans are all more similar to each other, when comparing the complete mtDNA genome, than any human is to a Neandertal. And in fact the Neandertal sequence is three or more times as different, on average, from us as we are from each other. This change from the earlier picture is a purely statistical one: more sites, with a more regular mutation rate. But it makes for a clearer picture, and one that supports the phylogenetic model more firmly.
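To make the "three or more times as different" comparison concrete, here is a minimal sketch of the kind of calculation involved: count pairwise differences among aligned sequences, then compare the human-to-human average with the human-to-Neandertal average. The sequences below are invented ten-base toys, not real mtDNA, and the function names are made up for illustration; the paper's own analysis uses full genomes and proper substitution models.

```python
from itertools import combinations

def pairwise_differences(seq_a, seq_b):
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def mean_within(group):
    """Average pairwise difference over all pairs inside one group."""
    pairs = list(combinations(group, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)

def mean_between(group, outsider):
    """Average difference between each member of a group and one outside sequence."""
    return sum(pairwise_differences(s, outsider) for s in group) / len(group)

# Hypothetical toy data (NOT real sequences): three "humans", one "Neandertal".
toy_humans = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
toy_neandertal = "TCGTTCGAAG"

within = mean_within(toy_humans)
between = mean_between(toy_humans, toy_neandertal)
ratio = between / within

print(f"mean human-human differences:      {within:.2f}")
print(f"mean human-Neandertal differences: {between:.2f}")
print(f"ratio:                             {ratio:.2f}")
```

With only a few hundred hypervariable sites, the two averages can overlap in range; the point of the complete genome is that many more sites go into each count, so the ratio becomes much less noisy.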

Selection on COX2?

Even though the control region is so helpful for analysis of recent humans, and easy identification of Neandertals, it's only a small fragment of the complete mtDNA. The mitochondrial genome is inherited as a single unit, so different mutations on a single mtDNA are co-inherited with each other. That means that the diversity of the noncoding control region is shaped by both genetic drift (due to demography) and selection. The selection includes purifying selection on coding sites across the entire mtDNA genome, and the possibility of positive selection on one or more ancient mutations.

I believe that positive selection on mtDNA in ancient humans has a lot of indirect support (and I wrote as much here). To give a brief list:

  • Mitochondrial haplotypes in living humans correlate with functional variation in disease, longevity, and performance -- all areas that have undergone recent biological shifts in humans.
  • Some mtDNA haplotypes in humans appear to have been under recent positive selection, as indicated by their geographic distributions.
  • Some mtDNA haplotypes have vastly changed in frequencies within the past few thousand years, as evidenced by ancient DNA samples.
  • Nuclear genes involved in mitochondrial function have been under recent positive selection.
  • MtDNA from Neandertals is completely absent today, despite the other evidence for genetic survival of that population. This combination is very unlikely if mtDNA was neutral.

So I think that positive selection is not only a reasonable hypothesis, it is extremely likely. But that is not to say that it has been demonstrated. Others might say that my final reason, that positive selection can explain the apparent contradiction between mtDNA and other data (such as skeletal comparisons and apparent nuclear introgression), is a case of wishful thinking. They might argue that all this other evidence of Neandertal-modern gene flow is an illusion, and not a problem to be explained.

I don't think they're right, but in the spirit of honest advertising, that's what they think.

It would be unreasonable for me to expect that a Neandertal mtDNA genome would provide strong evidence of positive selection on the human lineage. Finding such evidence would require repeated selected substitutions, probably within a single gene. Otherwise there would never be statistical evidence of positive selection. The available tests for positive selection in a two-genome (or in this case, two-clade) comparison are very weak.

Only a single selected mutation would be sufficient to explain the complete replacement of Neandertal mtDNA by an advantageous modern human type. No test of selection is powerful enough to refute neutrality based on a single selected site in a comparison of two mtDNA genomes. And repeated selection on a single gene just doesn't seem as likely as one or a few instances of selection, potentially on many mtDNA coding regions.

So imagine my surprise when, reading this paper, I discovered that they found repeated substitutions on a single mtDNA gene in the human lineage, and statistical evidence of positive selection!

The gene is cytochrome oxidase subunit 2 (COX2). Using the chimpanzee mtDNA sequence as an outgroup, there were 18 human-specific and 20 Neandertal-specific nonsynonymous coding substitutions. Out of the 18 human-specific substitutions, 4 were in COX2. Only three synonymous substitutions occurred in humans for this gene (the synonymous-to-nonsynonymous ratio of 3:4 differs from the ratio for other mtDNA coding regions, 54:14). In contrast, Neandertals had no coding substitutions in this gene -- every COX2 difference between Neandertal and human sequences is inferred to have occurred in ancient humans. These data are unlikely unless COX2 was recurrently selected in ancient humans.

More evidence will be necessary to establish positive selection. The paper includes multiple comparisons of different genes, so a significant result for this one is necessarily weakened by the multiple-comparisons correction.
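For readers who want to play with the numbers, here is a rough sketch of one way to test the 3:4 versus 54:14 contrast: a one-sided Fisher exact test on the 2x2 table of synonymous and nonsynonymous counts, followed by a Bonferroni adjustment. This is not necessarily the test the paper used. The choice of Fisher's test, the one-sided direction, and the assumption that all 13 protein-coding mtDNA genes count as separate comparisons are mine, and the function name is made up for illustration. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with row and column totals held fixed, of seeing
    a top-left count of `a` or fewer (the hypergeometric lower tail).
    """
    row1 = a + b              # substitutions in the gene of interest
    col1 = a + c              # all synonymous substitutions
    n = a + b + c + d         # all substitutions
    smallest = max(0, row1 - (b + d))
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(smallest, a + 1))
    return tail / comb(n, row1)

# Human-lineage counts quoted in the text, as (synonymous, nonsynonymous).
cox2 = (3, 4)
other_genes = (54, 14)

# Lower tail = few synonymous changes in COX2, i.e. an excess of nonsynonymous ones.
p_value = fisher_one_sided(*cox2, *other_genes)

# ASSUMPTION: one test per protein-coding mtDNA gene (13 of them).
N_GENES = 13
p_bonferroni = min(1.0, p_value * N_GENES)

print(f"one-sided Fisher p:             {p_value:.4f}")
print(f"Bonferroni-adjusted (13 genes): {p_bonferroni:.4f}")
```

The adjustment step is the multiple-comparisons point in miniature: a p-value that looks interesting for one gene is multiplied by the number of genes examined before it is compared with the usual threshold.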

But in a very interesting part of the paper, the authors did a functional analysis of the human-specific changes in COX2. Functional analysis of coding sites has come a long way in the last few years. Last fall, we saw it applied to the Neandertal-specific mutation of the MC1R gene. It was the functional analysis that argued that the mutation likely resulted in a red hair phenotype. These functional analyses consider the position of a mutation within the protein sequence, the extent to which that part of the protein interacts with other proteins, and whether the coding changes are otherwise conserved in other species.

Here is the paper's conclusion about COX2:

Another interesting observation is that COX2 stands out among proteins encoded in the mitochondrial genome as having experienced four amino acid substitutions on the modern human mtDNA lineage. Further work is warranted to elucidate the functional consequences of these amino acid substitutions. However, all these substitutions are in regions of the protein that, based on the crystal structure, do not have any obvious function, and they are variable among primates. Hence, they may represent either minor adaptive advantages, perhaps of regulatory relevance, or have no significant functional consequences for mitochondrial function. Unless other evidence for their importance becomes available, we see no need to invoke positive selection to account for the evolution of COX2 on the human lineage (Green et al. 2008:423).

To me, a very persuasive finding is that each of the four human-specific mutations of COX2 is also found in some other primate species. In other words, where humans differ from chimpanzees and Neandertals (and generally, gorillas and orangutans), humans are like baboons or macaques. The authors of the paper read this finding as evidence that the changes have little functional importance. But I see this as a suggestion that these substitutions are functionally salient. Different primates have different energetic and dietary constraints, and it should be no surprise if they exhibit functional convergences in mtDNA. Humans evolved four separate sites, within the last half-million years, to be similar to some cercopithecoids and different from most other hominoids. Neandertals exhibited no evolution in this gene. This makes sense under a hypothesis of mtDNA selection in accordance with functional requirements, which we have good reason to believe were different in humans and Neandertals.

But as the authors say, we need more evidence about the function of these genes. I think the comparative evidence now supports the hypothesis of selection very strongly, and is consistent with the pattern of evidence from the nuclear genome and from the anatomy of early Upper Paleolithic Europeans.


This paper advances our understanding of contamination within the Neandertal sequences. The authors acknowledge Wall and Kim's (2007) interpretation of a high contamination rate in the earlier reported nuclear genetic data off the 454 platform, and provide additional information to support a relatively high contamination rate:

Contamination with extant human DNA is the other dominant source of erroneous Neandertal sequences. Given the high coverage and the fact that the best estimate of the contamination rate here is 0.5% (with an upper 95% confidence limit of 0.87%), we do not expect contamination to affect the mtDNA sequence assembly to any appreciable level. Under the assumption that the Neandertal mtDNA sequence is reliable, it is a useful tool for gauging contamination when sequencing the Neandertal nuclear genome. Previously, assays to determine contamination within Neandertal fossil extracts were limited to the HVRI, which carry few positions where extant humans differ from Neandertals. By contrast, the complete Neandertal mtDNA now offers 133 such positions. This enables a reliable estimation of mtDNA contamination by analyzing sequence reads from 454 libraries, rather than by PCR-based assays of the DNA extracts. For example, when we do this in a small preliminary data set initially published from this fossil (Green et al., 2006), 10 of 10 sequences are classified as Neandertal. However, in further unpublished sequencing runs from that library, 8 out of 75 diagnostic sequences derive from extant human mtDNA, suggesting a contamination rate of ~11% (CI = 4.7%-20%). This is in agreement with the suggestion (Wall and Kim, 2007) that contamination occurred in that experiment. That library was constructed outside our cleanroom facility and before the introduction of the Neandertal-specific key, which is crucial for the detection of contamination by other 454 libraries, and was therefore not used for the subsequent Neandertal genome sequencing project (Briggs et al., 2007). However, with the help of the mtDNA presented here, such levels of contamination are now easily detectable from 454 sequencing runs (Green et al. 2008:424).

So the mtDNA from the same sequence library as the previously reported 1 Mb of Neandertal nuclear genome shows a high contamination rate. That's really disappointing, since it means we have no data to work with. We'll just have to wait.
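The "8 out of 75" figure in the quoted passage is a simple binomial proportion, and the arithmetic is easy to sketch. The passage does not say how the interval was computed, so the exact (Clopper-Pearson) binomial interval used here is my assumption, and the function names are made up for illustration. It uses only the standard library (Python 3.8 or later for `math.comb`).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve_decreasing(f, target, steps=100):
    """Bisection on [0, 1] for f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) = alpha/2, i.e. P(X <= k-1) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) = alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# Counts from the quoted passage: 8 of 75 diagnostic reads look like extant human mtDNA.
contaminant_reads, diagnostic_reads = 8, 75
rate = contaminant_reads / diagnostic_reads
ci_low, ci_high = clopper_pearson(contaminant_reads, diagnostic_reads)

print(f"point estimate: {rate:.1%}")
print(f"95% interval:   {ci_low:.1%} to {ci_high:.1%}")
```

The same function can be pointed at the earlier "10 of 10" data set, which shows how little ten reads constrain the contamination rate on their own.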

OK, that's all I have time to post; more later...


Green RE and 24 others. 2008. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134:416-426. doi:10.1016/j.cell.2008.06.021