The Denisova genome FAQ

Today, a paper by David Reich and colleagues presents the nuclear genome of the Denisova pinky bone (Reich et al. 2010). This is the second whole genome of an apparently extinct population of Pleistocene humans. This genome is nearly as distinct from Neanderthals as the draft Neanderthal genome is from living people.

Between the draft Denisova genome, the draft Neanderthal genome, and the genomes of living people, we now have a record of three human populations that share origins relatively early in the Pleistocene. The paper presents some population modeling that attempts to estimate the divergence times and levels of gene flow among these populations. I think as a first effort these models answer some questions definitively, but leave substantial room for elaboration and improvement. There are many clear mysteries, most notably whether any known fossil samples can be attributed to the population represented by the Denisova sequence.

The most significant finding in the paper is the demonstration that some living humans trace a significant fraction of their ancestry to the population represented by the Denisova genome. As in the case of Neanderthals, different human populations show significantly different levels of similarity to the Denisova sequence. For Neanderthals, the similarities indicated between one and four percent Neanderthal ancestry for living people outside of Africa. In the case of the Denisova sequence, the greatest similarities are with living people in Melanesia, represented in this paper by genome samples from Papua New Guinea and Bougainville. The similarities are consistent with an approximately 4% contribution of a Denisova-like population to the ancestry of these living Melanesians.

The paper estimates that together, the Denisova and Neanderthal-derived genes account for 8% of the ancestry of these living people.

I find that estimate stunning: it's a huge contribution to living populations from these ancient Pleistocene populations.

The paper additionally reports the mtDNA of a second individual from Denisova Cave, represented by an isolated third molar. This mitochondrial sequence is very similar to the sequence of the pinky bone, which I count as very important because it means there is potentially a population here. However, they do not report any nuclear genome results from this second individual.

Those are the basic headline results. As I often do, I've prepared a series of frequently asked questions about the paper. This one is very dense with information content, and that includes 90 pages of supplementary information. We'll be working through it carefully during the next few weeks. The most exciting part is that, like the Neanderthal genome, these data will be available for other researchers to study. My lab has been intensively going through the Neanderthal genome with several hypotheses in mind, and we are eager to start working with the Denisova sequence.

Could we have predicted this result?

There were pretty clear hints that something interesting may have been going on with the population structure in the ancestry of living people in Papua New Guinea. My graduate student, Aaron Sams, has been looking into the hypothesis of a deeper Pleistocene component of ancestry in this population for the last few months. Of course we had earlier this year the announcement from Keith Hunley and Jeff Long's group that microsatellite variation was consistent with an ancient Pleistocene structure to the ancestry of Melanesians.

Our notion here was that we could use ascertainment bias within the public sets of SNP data to look for deeper genealogical roots within some populations. Because most single nucleotide polymorphisms have been ascertained in Europeans, and secondarily within other populations represented in biomedical contexts or the HapMap (chiefly Africans and East Asians), there is a chance that a deep genealogical root in Melanesians might be obviously represented by a haplotype bearing all ancestral polymorphisms. That's not to say that the population is more ancestral than other populations, just that the unique derived variants in that population were not ascertained.

By targeting these regions with all-ancestral haplotypes, we began to make substantial progress identifying regions as candidates for a more ancient population structure in this part of the world. Pretty exciting stuff in the absence of an ancient human genome. But now the Denisova sequence gives us a very clear sign that such regions should be very widespread across the genome. Some of them are presently at high frequencies within samples of PNG genetic variation, so there is a good chance that some variants will turn out to be of adaptive importance in this population.
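To make the idea concrete, here is a minimal sketch of such a scan. It assumes a phased haplotype coded 0 for the ancestral allele and 1 for the derived allele at each ascertained SNP. The function name, the window threshold and the toy data are all hypothetical; this is an illustration of the logic, not our actual pipeline.

```python
from typing import List, Tuple

def all_ancestral_windows(haplotype: List[int], window: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of maximal runs of at
    least `window` consecutive SNPs carrying the ancestral allele (0)."""
    runs = []
    start = None  # index where the current ancestral run began, if any
    for i, allele in enumerate(haplotype):
        if allele == 0:
            if start is None:
                start = i
        else:
            # A derived allele ends the current run; keep it if long enough.
            if start is not None and i - start >= window:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the haplotype.
    if start is not None and len(haplotype) - start >= window:
        runs.append((start, len(haplotype)))
    return runs

# Toy haplotype: 0 = ancestral, 1 = derived at ascertained SNPs (made-up data).
hap = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
candidate_regions = all_ancestral_windows(hap, window=4)
print(candidate_regions)
```

A real scan would also need to account for SNP density and recombination distance, since a long run of ancestral alleles means little where few SNPs were ascertained in the first place.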

The point is, this result doesn't come from nowhere. It was clearly anticipated by analysis of the genetic variation within living Melanesians. It is perhaps a bit of a surprise that an ancient genome from southern Siberia would provide so many genealogical ties to this island population. That will require us to give some close consideration to the population structure of Pleistocene people as well as the migration history leading to the peopling of Oceania.

What is this tooth?

The paper identifies the tooth as an upper third molar, or possibly as a second molar. What we can say about it is that it's relatively large. In fact its length and breadth put it within a size range occupied by australopithecines and early Homo, both H. habilis and H. erectus. There are no distinctive morphological characters that would allow it to be assigned to any taxon.

What the paper doesn't point out is that there are Upper Paleolithic specimens that equal or exceed this tooth in size. For example, the measured length and breadth of an upper second molar from Oase, Romania, are larger than this specimen, and the third molar (in the crypt) of that specimen is yet larger. There is an Upper Paleolithic-associated molar from Turkey which is also exceedingly large.

I don't take that as a sign of relationship between this specimen and early Upper Paleolithic people -- even though these are some of the earliest. It is another sign of how non-diagnostic this tooth actually is. I would say that in the absence of genetic information, we'd be looking at these remains as likely early Upper Paleolithic people, and accentuating these similarities.

With the genome, there's a tendency to assume a completely opposite attitude -- that they must represent something separate and different from Upper Paleolithic people. That may be an overreaction -- the evidence of gene flow suggests the possibility of continued interaction among these Late Pleistocene groups.

What happened to the X-Woman?

I guess when they found a second individual, it was better to have a name for the group rather than the individual. Or maybe somebody didn't like the name X-Woman. As in, "I wonder what happened to the Oneders".

Anyway, the paper uses the term "Denisovans" for this ancient population. That implies a certain agnosticism about whether any particular kinds of fossil humans might belong to the same population as the two sequenced individuals.

How were the Denisovans related to Neandertals?

Remembering that the Neandertal draft genome contains a very high fraction of spurious unique changes, Reich and colleagues performed a similar series of statistical comparisons to those done by Green and colleagues in the Neandertal analysis. Most prominent is limiting the comparison to places where humans and chimpanzees are known to differ. By targeting these sites, the analysis cuts the rate of false positive changes to a manageable level.

I mention that because it is necessary to make sense of the direct quotes:

The Denisova genome diverged from the reference human genome 11.7% (CI: 11.4-12.0%) of the way back along the lineage to the human-chimpanzee ancestor. For the Vindija Neanderthal, the divergence is 12.2% (CI: 11.9-12.5%). Thus, whereas the divergence of the Denisova mtDNA to present-day human mtDNAs is about twice as deep as that of Neanderthal mtDNA, the average divergence of the Denisova nuclear genome from present-day humans is similar to that of Neanderthals.

So the Denisova, Neandertal and human genomes are close to a trichotomy in terms of their average relationship. For any particular gene, of course, there may be sister pairings between any two of those three -- and in many cases, between Denisovans and some living humans to the exclusion of other living humans. This gives rise to several tricky statistical issues as we consider particular gene loci.
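To put the quoted percentages on a rough time scale, here is a small sketch. The human-chimpanzee divergence date of 6.5 million years is my assumption for illustration, not a value given above, and the result scales directly with whatever date you plug in. Note also that these are average sequence divergence times, which are older than the population split itself.

```python
# Assumed calibration, for illustration only: human-chimpanzee divergence date.
HUMAN_CHIMP_SPLIT_YEARS = 6.5e6

def divergence_years(fraction: float, split_years: float = HUMAN_CHIMP_SPLIT_YEARS) -> float:
    """Convert 'fraction of the way back to the human-chimpanzee ancestor'
    into years, given an assumed human-chimpanzee divergence date."""
    return fraction * split_years

# (lower CI, point estimate, upper CI) as fractions, taken from the quote.
estimates = {
    "Denisova": (0.114, 0.117, 0.120),
    "Vindija Neandertal": (0.119, 0.122, 0.125),
}

for name, (low, point, high) in estimates.items():
    print(
        f"{name}: {divergence_years(point):,.0f} years "
        f"({divergence_years(low):,.0f} to {divergence_years(high):,.0f})"
    )
```

Under that assumed calibration the two point estimates differ by only a few tens of thousands of years, which is why I describe the three genomes as close to a trichotomy on average.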

For the moment, we'll consider the genome-wide average. How similar are Denisovans and Neandertals? Reich and colleagues considered the subset of sites where two sequences (out of Denisova, Neandertal and human) share a derived SNP variant:

The number of sites where the Denisova individual and Neanderthal cluster to the exclusion of the Yoruba and chimpanzee is 46,362, compared with an average of 22,012 sites for the other two possible patterns (Yoruba and Denisova, or Yoruba and Neanderthal). This excess of sites where Denisova and Neanderthal cluster supports the view that the Denisova individual and Neanderthals share a common history since separating from the ancestors of modern humans (Supplementary Information section 6).

Denisova and the Neandertal share roughly twice as many derived variants with each other as either shares with the human in this comparison. Denisovans and Neandertals shared substantial ancestry with each other. That may mean they emerged from a single population -- possibly the early Middle Pleistocene population of Eurasia. Or it may mean that they exchanged genes after they reached Eurasia.
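The counting behind this comparison can be sketched in a few lines. This is a simplified illustration under my own assumptions (one allele per genome per site, chimpanzee taken as the ancestral state, only biallelic sites kept); the function name and toy sites are hypothetical, and the paper's actual filtering is far more involved.

```python
from collections import Counter

def count_shared_derived(sites):
    """Count sites where exactly two of (Yoruba, Neandertal, Denisova) share a
    derived allele. Each site is (yoruba, neandertal, denisova, chimp); the
    chimpanzee allele is assumed to be ancestral."""
    counts = Counter()
    for yor, nea, den, chimp in sites:
        # Keep only biallelic sites: exactly two distinct alleles overall.
        if len({yor, nea, den, chimp}) != 2:
            continue
        carries_derived = (yor != chimp, nea != chimp, den != chimp)
        if carries_derived == (False, True, True):
            counts["Neandertal+Denisova"] += 1
        elif carries_derived == (True, False, True):
            counts["Yoruba+Denisova"] += 1
        elif carries_derived == (True, True, False):
            counts["Yoruba+Neandertal"] += 1
    return counts

# Toy sites (made-up data): (Yoruba, Neandertal, Denisova, chimpanzee).
toy_sites = [
    ("A", "G", "G", "A"),  # Neandertal and Denisova share the derived G
    ("A", "G", "G", "A"),
    ("T", "C", "T", "C"),  # Yoruba and Denisova share the derived T
    ("G", "G", "A", "A"),  # Yoruba and Neandertal share the derived G
    ("A", "A", "A", "A"),  # invariant, skipped
    ("A", "C", "G", "A"),  # more than two alleles, skipped
]
toy_counts = count_shared_derived(toy_sites)
print(dict(toy_counts))

# Ratio of the counts reported in the quoted passage.
reported_ratio = 46362 / 22012
print(round(reported_ratio, 2))
```

If the three lineages were a true trichotomy, the three patterns would be expected in roughly equal numbers; the excess in the Neandertal and Denisova pattern is what supports their shared history.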

Reich and colleagues address this issue further by comparing pieces of two Neandertal genomes with Denisova. The Mezmaiskaya specimen is represented by much less sequence than the Vindija draft genome but it is geographically intermediate between Croatia and Denisova. By including this specimen with the Neandertals, Reich and colleagues could do a statistical analogue of FST -- giving a way of examining the extent of genetic exchanges between the ancestors of these Neandertals and Denisovans. They found that the Mezmaiskaya and Vindija specimens were much more likely to share alleles with each other than with the Denisova sequence. It's a striking statistic -- if you do the same comparison with living people, they're 10 percent or so more likely to share alleles with neighbors than with distant individuals; Neandertals were apparently 65 percent more likely to share alleles with each other than with Denisova. It's not an exact stand-in for FST, but it's nearby. This was a highly structured Pleistocene population.

Is the nuclear variation consistent with the mtDNA?

I wrote about the Denisova mtDNA sequence last spring ("The Denisova mtDNA sequence: The X-Woman"). The sequence is an outgroup to a clade including both humans and Neandertals, and appeared to branch from our ancestors roughly a million years ago. That appeared to be a very interesting date -- possibly consistent with Homo erectus, but too recent to reflect the first dispersal of Homo from Africa, more than 1.8 million years ago.

That mtDNA divergence date was not easily interpreted. As I pointed out at the time, it might have been consistent with incomplete lineage sorting in a single widespread human population -- maybe even the Neandertal population.

Reich and colleagues show that the mtDNA divergence between Denisova and the modern-Neandertal clade is deeper than expected given the nuclear genome genealogical divergence. They also show that the nuclear genomes of Neandertals and Denisovans are somewhat closer than either is to the majority ancestors of living people. They discuss two possible explanations.

One scenario is a mixture of the Denisovans with a more ancient Pleistocene population, followed by introgression of a more ancient mtDNA clade into the Denisovans. This would assert an ancient structured population preceding the origin of Denisovans, presumably from one of the Middle Pleistocene populations of Africa or Eurasia.

A second scenario is incomplete lineage sorting, in which an earlier mtDNA divergence was captured by the Denisova and Neandertal populations at the time of their divergence and differentially lost from them.

Reich and colleagues show that both these scenarios may be consistent with evolution by genetic drift in these ancient groups, given some assumptions about their population sizes.
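The dependence on population size is easy to see with the standard coalescent approximation: the chance that two lineages have still not found a common ancestor after t generations is exp(-t / (k N)), where k is 2 for autosomal loci in a diploid population of effective size N, and 1 for mtDNA if N is the effective number of females. The sketch below uses that textbook formula; the generation counts and population sizes are hypothetical values I picked for illustration, not numbers from the paper.

```python
import math

def prob_no_coalescence(generations: float, effective_size: float, copies_per_individual: int) -> float:
    """Probability that two lineages have NOT coalesced after `generations`,
    under the standard constant-size coalescent approximation.
    copies_per_individual: 2 for autosomes (diploid), 1 for mtDNA
    (with effective_size counted in females)."""
    return math.exp(-generations / (copies_per_individual * effective_size))

# Hypothetical values for illustration only.
t = 10_000           # generations two lineages spend in one population
n_females = 5_000    # effective number of females (mtDNA)
n_diploid = 10_000   # effective number of diploid individuals (autosomes)

p_mtdna = prob_no_coalescence(t, n_females, copies_per_individual=1)
p_autosomal = prob_no_coalescence(t, n_diploid, copies_per_individual=2)
print(f"mtDNA lineages still distinct:     {p_mtdna:.3f}")
print(f"autosomal lineages still distinct: {p_autosomal:.3f}")
```

The point of the comparison is that deep lineages are retained more readily when the effective size is large relative to the elapsed time, which is why the plausibility of incomplete lineage sorting hinges on the assumed population sizes.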

I think there are still some reasonable questions about the relative dates of divergence, but those can probably be answered by considering the full pattern of variation of genealogies across the genome. Additionally, there may be uncertainty about the mutation rates used in both the mtDNA and nuclear comparisons. That's one reason why I consider the population models here to be a first draft of the real history.

What are the archaeological associations?

The current paper is clearer about the site's dating and stratigraphy than the earlier, shorter paper by Krause and colleagues (Krause et al. 2010). In the spring, it appeared that the pinky bone was associated with the Upper Paleolithic at the site. In the current paper, the authors explain the complexity of layer 11, which contains both Upper Paleolithic industry and these skeletal and dental remains:

The small size of both the phalanx and the tooth precludes direct radiocarbon dating. We instead dated seven bone fragments found close to the hominin remains in layer 11 in the east and south galleries. To ensure that they were associated with human occupation of the cave we chose bones that have evidence of human modification, including a rib with regular incisions and a bone projectile point blank generally associated with Upper Palaeolithic cultural assemblages. In the south gallery, where modified bones were not available, we used herbivore bones (Supplementary Information section 12).
Four of the seven dates are infinite dates older than 50,000 years BP (uncalibrated), whereas three are finite dates between 16,000 and 30,000 years BP (Supplementary Table 12.1). The rib with incisions and the projectile point blank are about 30,000 and 23,000 years BP, respectively. Together with three previous dates [23] this shows that layer 11 contains cultural remains from at least two different time periods, one period older than 50,000 years BP and one more recent period. However, the stratigraphy is complicated by the discovery of a wedge-shaped area close to the area where the phalanx was found that is likely to be disturbed (Supplementary Information section 12). Hominin remains large enough to allow direct radiocarbon dates may eventually be discovered in the cave, but a reasonable hypothesis is that the phalanx and molar belong to the older occupation.

So, no direct dates. By inference (from their weird-looking genetic sequences), the two skeletal individuals are likely to be older than the Upper Paleolithic, but the stratigraphy does not require this. There is a mixing of older and younger materials.

Adding to the problem, the finger bone has anomalously good preservation of DNA -- the authors point out in the first paragraph of the discussion:

The molecular preservation of the Denisova phalanx is exceptional in that the fraction of endogenous relative to microbial DNA is about 70%. By contrast, in all Neanderthal remains studied so far the relative abundance of endogenous DNA is below 5%, and typically below 1%. Furthermore, the average length of hominin DNA fragments in the Denisova phalanx is 58 base pairs (bp) (SL3003) and 74 bp (SL3004) in spite of the enzymatic treatment that removes uracil residues and decreases the average fragment size, whereas in most well-preserved Neanderthal samples it is 50 bp or smaller without this treatment. Thus, although many Neanderthals are preserved under conditions apparently similar to those in Denisova Cave, the Denisova phalanx is one of few bones found in temperate conditions that are as well preserved as many permafrost remains. It is not clear why this is.

They can rule out some explanations because the molar does not have the same exceptional preservation. At the moment, we can probably just chalk it up to good luck. But I think the issue is not irrelevant to the problem of dating. What is going on with this site? Very unusual.

Why Melanesians?

Denisova Cave is in southern Siberia. The hominin occupation of the cave appears to have been within the last 50,000 years. People reached Sahul sometime before 40,000 years ago. How in the world did these people come into contact?

The most plausible hypothesis is that the Denisovans represent a much larger and more widespread population across South and Southeast Asia. A population dispersing in the direction of island Southeast Asia would have encountered and mixed with this population. The dispersing population would have absorbed some adaptive genes, which would have increased in frequency thereby increasing the apparent genetic contribution of the indigenous Pleistocene population.

This leaves some unanswered questions.

  1. Who were these ancient people? Were they "Homo erectus"?

This would be my null hypothesis -- that we are looking at one site representing a widespread population across the eastern extent of Eurasia, including Sundaland, during the Middle Pleistocene. However, this scenario is not fully consistent with the population model presented by Reich and colleagues. In particular, they derive Denisovans and Neandertals from a single ancestral population that diverged from humans sometime during the last 500,000 years. That means that the type specimen of Homo erectus (roughly a million years old) cannot possibly have been part of the Denisovan population. Most of the fossil record of Homo erectus in Asia is too old to have been part of a Denisovan population.

  2. Why do the other populations of East and Southeast Asia not show clear signs of mixture with the Denisovans?

The statistics in the paper show a clear (and large) component of Denisovan ancestry in the PNG and Bougainville genomes, but no large component elsewhere in Asia. Reich and colleagues address this question briefly.

An interesting question is how widespread Denisovans were. A possibility is that they lived in large parts of East Asia at the time when Neanderthals were present in Europe and western Asia. One observation compatible with this possibility is that Denisovan relatives seem to have contributed genes to present-day Melanesians but not to present-day populations which currently live much closer to the Altai region such as Han Chinese or Mongolians (Table 1). Thus, they have at least at some point been present in an area where they interacted with the ancestors of Melanesians and this was presumably not in southern Siberia.

Probably the best explanation for the disproportionate impact of the gene flow into the ancestors of Melanesians is a kind of peninsula effect -- they encountered these people early, moved along through their population the furthest, and acquired a substantial signature by a combination of selection and "surfing" neutral alleles along with population expansion. We can assume, I think, that Melanesians are not unique. We do not have a substantial genetic representation of island Indonesia or Australia in these comparisons; I would expect they trend in the same direction. Also, Melanesian-derived genes make up a large component (upwards of 20 percent) of the nuclear genome of Polynesians today. This is a large population of people with Denisovan genes, in other words.

But why not China? Why not South Asia? These are extremely interesting questions. Were the Denisovans not present in China -- was there possibly yet another Pleistocene population there?

Why not call them "Homo erectus"?

Formally, we don't know whether the individuals represented by these genetic samples would have had the diagnostic features of Homo erectus. They don't live especially near the main samples of Homo erectus, and they lived long after the main samples of Homo erectus appear to have existed.

But worse, as I indicated above, there are serious inconsistencies between the fossil record and the population model presented by Reich and colleagues.

  1. "Homo erectus", as usually understood, occurred widely in Asia, including China and Java, and Africa during the span from 1.95 million to 750,000 years ago. In China and Java, fossils attributed to Homo erectus persisted until 200,000 years ago. There is no unequivocal fossil of Homo erectus after 200,000 years ago (including some not-yet-published redating). I'm obviously glossing many complexities in that description, but trying to pose the species in the broadest possible geographic and temporal range.

  2. Green and colleagues (Green et al. 2010) derived Neandertals from a common ancestor with living Africans only 250,000-400,000 years ago. A model including the Denisova data is provided in the current paper. It has wider confidence limits and reports the answers in generations. If we assume 20-year generations, the current paper puts the emergence of a Neandertal-Denisova clade at between 190,000 and 520,000 years ago, and the divergence of the Neandertal and Denisova branches around 50,000-100,000 years later.

In other words, possibly sometime after the time of the last unequivocal H. erectus fossils, the Denisovan population was diverging from Neandertals. These events occurred more than a half million years after the Trinil individual -- type specimen of Homo erectus -- lived.

  3. Millions of living people have their ancestry in these Pleistocene populations. That tends to make their identification as different species somewhat problematic. Even if we could identify the Denisovan population with the fossil evidence of Homo erectus, maybe they don't merit that species-level distinction. Or maybe we should recognize two or more distinct populations within what we now call Homo erectus.

And before you splitters out there get excited -- these would not be the same two populations (H. ergaster and H. erectus) currently promoted by some paleoanthropologists. That issue is way too early to be consequential in the current context.

Some of these issues can be solved by altering the population model. For example, if we assume a slower mutation rate (consistent with comparisons between parents and offspring in living people), the estimated divergence times will be much higher, possibly consistent with a widespread population at the time of Zhoukoudian or Sangiran. It's not obvious that this would fully bring the genetics into accord with the fossil record, but it would eliminate many inconsistencies.
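Here is a small sketch of how sensitive the dates are to these assumptions. The 190,000 to 520,000 year range and the 20-year generation come from the discussion above. The halved mutation rate is a hypothetical factor I chose purely for illustration, and the simple inverse scaling assumes the genetic distances themselves stay fixed.

```python
def years_to_generations(years: float, generation_years: float) -> float:
    """Convert a date in years back into generations."""
    return years / generation_years

def rescale_time(time_years: float, old_rate: float, new_rate: float) -> float:
    """Rescale a divergence time when the assumed mutation rate changes.
    For a fixed genetic distance, time is inversely proportional to the rate."""
    return time_years * old_rate / new_rate

# Neandertal-Denisova clade emergence, in years, assuming 20-year generations.
clade_years = (190_000, 520_000)
clade_generations = tuple(years_to_generations(y, 20) for y in clade_years)
print(clade_generations)

# Hypothetical: a mutation rate half as fast as the one assumed in the model.
rescaled_years = tuple(rescale_time(y, old_rate=1.0, new_rate=0.5) for y in clade_years)
print(rescaled_years)
```

Under that hypothetical halving, the upper end of the range moves past a million years, which shows how a slower rate could bring the model closer to the older fossil samples.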

What drives you crazy about this?

Well, it's obviously very exciting, but I find it very difficult to talk about these Pleistocene populations without falling into bad habits.

Our common ancestry as humans goes back to the Early and Middle Pleistocene. The (now multiple) Neandertal genomes and the Denisova genome share genes with some people and not others because of this common ancestry.

In addition, some living people carry even more genes from Neandertals because they have an appreciable fraction of Neandertal ancestry. That makes it nonsensical to talk about "Neandertals and the ancestors of modern humans". Neandertals are among the ancestors of modern humans.

Just so with Denisova. It's nonsensical to talk about a three-way split between Neandertals, Denisova and modern humans. We can talk about a population model with a clade separating an ancestral Neandertal-Denisova population from contemporary Africans.

I have to remind myself again and again when I talk to people about these issues that "modern human ancestors" is not a group that excludes these Pleistocene people.

Once we put ourselves into the mode where we are referring to a population model, it is important to recognize the limitations of those models. For example, we cannot presently exclude many kinds of gene flow among these Pleistocene populations. We can understand some limits to the level of gene flow -- these populations were highly structured, it wasn't Pleistocene panmixia. But it is premature to talk about isolation without recognizing the limits of our ability to test these population models.

The difficulty with terminology tells us something very important. A large-scale reorganization of the science of human origins is upon us. The terms we are used to using will, many of them, become obsolete. Some now-obscure terms will become very important.

We might think the new terms are likely to be technological -- but I think that the technology is changing too fast for that. Most people won't need to learn the ins and outs of a particular sequencing platform, because in two years it will be obsolete.

No, much more important is our way of talking about the relations of biological and cultural evidence. What does an archaeological pattern mean, and how does it relate to biological connections between populations? How can we identify the genetic causes of skeletal and dental phenotypes? What is the importance of a morphological or phylogenetic species in the context of these clear signs of genetic intermixture?

Many of these are old questions. They are about to get new answers, addressed in a new way using new evidence.