The mtDNA sequence of Paglicci 23

Is there anything surprising about finding the Cambridge Reference Sequence in Paglicci 23?

UPDATE follows at the bottom. Original post:

I don’t think it’s surprising in the least. No European who has yet been sampled carries a mitochondrial DNA sequence that looks like any known Neandertal mtDNA sequence. The Neandertal sequences had to disappear from the European population sometime. The sequence diversity of modern mtDNA haplogroups in Europe suggests that several of them entered Europe during Upper Paleolithic times, although some entered later. Some Upper Paleolithic Europeans must have had sequences like some living Europeans, and the CRS is one of the most common.

Does that mean that all Upper Paleolithic Europeans carried sequences that are present in living Europeans? Now, this question cannot really be answered without an exhaustive sampling of Upper Paleolithic remains. The sampling of living Europeans has been pretty exhaustive, but not so complete as to rule out the presence of Neandertal mtDNA entirely.

But David Serre and colleagues (2004) did a fairly thorough sampling of early Upper Paleolithic European remains. Of the 40 specimens they sampled, five preserved recoverable mtDNA sequences, and none of the five had a Neandertal-like sequence. If we add in the two Paglicci specimens, that makes seven. That is enough to show that it’s unlikely that more than a quarter of Upper Paleolithic Europeans had Neandertal-like mtDNA sequences. Still, the mtDNA distribution may have changed under the influence of selection, so it’s not possible to say much more about the ancestry of Upper Paleolithic Europeans for other genetic loci. To know that, we’ll want to sample more genes.
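To make the arithmetic behind a statement like that concrete, here is a rough sketch. It treats the seven specimens as independent random draws from a single population, which is a simplifying assumption of this sketch and not necessarily the model behind any published figure. The function names are invented for illustration.

```python
def prob_zero_hits(freq: float, n_samples: int) -> float:
    """Chance that none of n_samples independent draws carries a variant
    present at population frequency `freq` (simple binomial assumption)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return (1.0 - freq) ** n_samples


def max_freq_consistent(n_samples: int, alpha: float = 0.05) -> float:
    """Largest frequency for which seeing zero carriers in n_samples
    independent draws still has probability of at least alpha."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 - alpha ** (1.0 / n_samples)


if __name__ == "__main__":
    # Seven specimens, none Neandertal-like, hypothetical true frequency 25%.
    print(prob_zero_hits(0.25, 7))
    # Frequency that zero-in-seven cannot exclude at the 5% level.
    print(max_freq_consistent(7))
```

Under these assumptions, seven Neandertal-free specimens would turn up about 13 percent of the time even if a quarter of the population carried Neandertal-like mtDNA. An analysis that models the genealogy of the sequences can give a different bound than this naive calculation.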

Therein lies the problem. We have enough trouble telling whether Neandertal genomic sequence has been affected by contamination (see my earlier entry on that topic). Neandertals must share substantial sequence similarity with living humans, and most randomly chosen 100-bp fragments will be identical to the corresponding sequence in any randomly chosen living person. The problem is even worse when we consider Upper Paleolithic and more recent people. It is inevitable that their sequences will be identical to those of some living people, and the most common alleles at most loci then will often be the same as the most common alleles today.
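A back-of-the-envelope sketch shows why short fragments are so often identical. It assumes that sites differ independently at a fixed per-site rate. The rate of 0.001 used here is a hypothetical value picked for illustration, not a measured figure.

```python
def prob_fragment_identical(per_site_diff: float, length_bp: int) -> float:
    """Probability that a fragment of length_bp shows no differences,
    assuming each site differs independently with rate per_site_diff."""
    if not 0.0 <= per_site_diff <= 1.0:
        raise ValueError("per_site_diff must be between 0 and 1")
    if length_bp < 0:
        raise ValueError("length_bp must be non-negative")
    return (1.0 - per_site_diff) ** length_bp


if __name__ == "__main__":
    # Hypothetical rate: one difference per 1000 bp, fragment of 100 bp.
    print(prob_fragment_identical(0.001, 100))
```

With that assumed rate, roughly nine of every ten 100-bp fragments would carry no difference at all, so sequence identity by itself says very little about where a fragment came from.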

The real debate is about how skeptical we should be when sequences from ancient specimens look like sequences found in living people. This debate is important not because Upper Paleolithic people shouldn’t look like living people, but because Neandertals sometimes should. Sequence identity between Neandertal specimens and living people is expected for almost all the nuclear genome sequence. That means that it is impossible to authenticate Neandertal genomic fragments based on sequence alone, and we must resort to other characteristics of the fragments, such as length, proportion of base misincorporations or deamination, or intrasample polymorphism (more than two alleles is usually bad).
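The intrasample polymorphism check can be sketched in a few lines. This is only an illustration under stated assumptions: the reads are already aligned to equal length, and a base must be seen at least `min_count` times to count as an allele, so that a single damaged read does not trigger a flag. The function name and the thresholds are my own choices for the sketch.

```python
from collections import Counter


def sites_with_excess_alleles(reads, max_alleles=2, min_count=2):
    """Return 0-based positions where more than max_alleles distinct bases
    are each supported by at least min_count aligned reads."""
    if not reads:
        return []
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    flagged = []
    for pos in range(length):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(r[pos] for r in reads if r[pos] in "ACGT")
        supported = [base for base, n in counts.items() if n >= min_count]
        if len(supported) > max_alleles:
            flagged.append(pos)
    return flagged


if __name__ == "__main__":
    # Toy alignment: position 1 shows C, T and G, each in two reads.
    toy_reads = ["ACGT", "ACGT", "ATGT", "ATGT", "AGGT", "AGGT"]
    print(sites_with_excess_alleles(toy_reads))
```

A diploid individual can carry at most two alleles at a nuclear site, so a well-supported third allele points to more than one source of DNA in the extract.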

In this regard, I think that the current study is a helpful example. When we deal with any kind of evidence, we must establish its provenience. Forensic scientists call this “chain of custody” – that is, how do we know that the evidence is really the same thing that was originally found, instead of something caused by events along the way during analysis? If you’re doing a radiocarbon sample, you want to make sure that stray radioactivity hasn’t affected your sample. If you’re extracting DNA, you want to find the sequence of everyone who could be a possible source of contamination. So the fact that Caramelli and colleagues have done this is a great thing.
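As a rough picture of what that comparison involves, the sketch below represents each mtDNA sequence as the set of positions where it differs from the CRS, so the CRS itself is the empty set. The handler names and variant labels are invented for illustration, and exact matching is a simplifying assumption of the sketch.

```python
def matching_handlers(sample_variants, handler_variants):
    """Return the sorted names of handlers whose set of differences from
    the reference is exactly the same as the sample's."""
    sample = frozenset(sample_variants)
    return sorted(
        name
        for name, variants in handler_variants.items()
        if frozenset(variants) == sample
    )


if __name__ == "__main__":
    # Hypothetical handlers; the variant labels are made up.
    handlers = {
        "excavator_1": {"16126C", "16294T"},
        "lab_tech_1": {"16311C"},
        "anthropologist_1": set(),  # carries the reference sequence
    }
    ancient_sample = set()  # no differences from the reference
    print(matching_handlers(ancient_sample, handlers))
```

An empty result only means that none of the people who were typed can explain the sequence. It says nothing about handlers who were never typed, or about sources other than people.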

In fact, a 2006 paper authored by Maria Lourdes Sampietro and colleagues, including Caramelli, covered a much larger sample of Neolithic teeth and attempted to track down all cases of contamination by the excavation and laboratory teams. Here’s the abstract:

DNA contamination arising from the manipulation of ancient calcified tissue samples is a poorly understood, yet fundamental, problem that affects the reliability of ancient DNA (aDNA) studies. We have typed the mitochondrial DNA hypervariable region I of the only 6 people involved in the excavation, washing, and subsequent anthropological and genetic study of 23 Neolithic remains excavated from Granollers (Barcelona, Spain) and searched for their presence among the 572 clones generated during the aDNA analyses of teeth from these samples. Of the cloned sequences, 17.13% could be unambiguously identified as contaminants, with those derived from the people involved in the retrieval and washing of the remains present in higher frequencies than those of the anthropologist and genetic researchers. This finding confirms, for the first time, previous hypotheses that teeth samples are most susceptible to contamination at their initial excavation. More worrying, the cloned contaminant sequences exhibit substitutions that can be attributed to DNA damage after the contamination event, and we demonstrate that the level of such damage increases with time: contaminants that are >10 years old have approximately 5 times more damage than those that are recent. Furthermore, we demonstrate that in this data set, the damage rate of the old contaminant sequences is indistinguishable from that of the endogenous DNA sequences. As such, the commonly used argument that miscoding lesions observed among cloned aDNA sequences can be used to support data authenticity is misleading in scenarios where the presence of old contaminant sequences is possible. We argue therefore that the typing of those involved in the manipulation of the ancient human specimens is critical in order to ensure that generated results are accurate.

The second half of that is kind of scary. Post-excavation contamination rapidly converges on the damage pattern of genuine endogenous DNA. That means that patterns of damaged DNA will not authenticate a sequence from an ancient skeleton.

Yuck.

We have to find criteria that will let us assign confidence to ancient sequences. To some extent, a lab that has a relatively recent skeleton or mummy can drown out contamination by grinding up more material. But some preservation contexts won’t allow that kind of sampling, and remains that are more and more ancient preserve less and less DNA.

Obviously for some samples it will be impossible to provide equivalent information. Bones from important sites like Krapina have been touched by hundreds of people over the years, many of whom are no longer available for DNA extraction. For newer sites, it might be possible to provide a list of all people who have contacted the bones, but it still might not be desirable.

Imagine a policy requiring credible researchers to put their DNA sequences on file before they are allowed to touch a specimen. That would greatly restrict access to new finds, at least until DNA testing becomes much more routine than today. More access restrictions are the last thing we need.

And I still worry about other sources of contamination. When I was familiar with laboratories, the biggest contamination problems were not the people working on the DNA, but instead were the other samples being studied. Yes, yes, I know. Clean rooms, positive air pressure, the whole lot, all those things keep out this kind of laboratory contamination. Well, except that for one reason or another sometimes they don’t. Sometimes the contaminant is from the next room; sometimes it’s from another floor of the same building. It’s very hard to detect the source of this kind of contamination unless it’s some kind of exotic sequence. And when your result is the CRS, it’s hard to get less exotic than that.

I am especially interested in this question because of my work on recent evolutionary changes. Our findings show that people have many new alleles that should have been very rare or absent even 5000 years ago. Sampling ancient skeletons is an obvious way to test this hypothesis of recent evolutionary change. Already, results from ancient skeletons have confirmed some cases of recent selection – for example, the absence of lactase persistence in Neolithic Germans (Burger et al. 2007).

With Neandertals we have the problems that stem from the use of difference as a means of authentication. If you throw out sequences that look similar to modern humans, then you have a biased estimate of population variation. With more recent people, difference may not exist, and changes in the pattern of variation may be subtle. Positively selected alleles are a rare case where we can bend the biases in our favor, since they are prominent cases where the common alleles today may have been rare or absent in the past.
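The bias from discarding human-like sequences is easy to see with toy numbers. In the sketch below, each fragment is summarized by how many sites differ from a modern reference. The counts and the 100-bp fragment length are invented for illustration, and the function names are my own.

```python
def per_site_divergence(diff_counts, fragment_length):
    """Average number of differences per site across all fragments."""
    if not diff_counts:
        raise ValueError("need at least one fragment")
    return sum(diff_counts) / (len(diff_counts) * fragment_length)


def drop_identical(diff_counts):
    """Mimic an authentication rule that discards every fragment that is
    identical to the modern reference."""
    return [d for d in diff_counts if d > 0]


if __name__ == "__main__":
    # Hypothetical data: ten 100-bp fragments, nine identical to the
    # reference and one with a single difference.
    toy_counts = [0] * 9 + [1]
    print(per_site_divergence(toy_counts, 100))
    print(per_site_divergence(drop_identical(toy_counts), 100))
```

In this toy case the filtered estimate is ten times the unfiltered one, simply because the filter removed every fragment that happened to match.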

UPDATE (2008/07/19): A reader writes:

In the post you commend the authors of the study for genotyping all people who have come into contact with the bone and its DNA extract. Their main line of evidence against contamination is that the sequence they recover is not any of the people who contacted the sample. However, in our experience this is a control that is nearly useless.
First, it is nearly impossible to know for sure that one has DNA samples for all people who really have come into contact with the samples. Similarly, it's not possible to know how much or how close contact must be in order to introduce a risk for contamination.
Second, in our experience, contamination introduced from laboratory reagents is also a great danger. Every enzyme, oligo, etc. can potentially introduce contamination. Extraction and PCR blanks, properly done, can control for contamination by lab reagents. However, the details of these controls are quite important to know (at least as important as knowing the sequences of the bone handlers) to evaluate the data presented.
In any case, I worry that the field of ancient DNA is sliding towards a criteria of authenticity that is not helpful. To imagine that one can round up, genotype, and then rule out all possible contamination suspects seems unreasonable.

This jibes with my experience, and the point about laboratory reagents is very well taken. Here’s what I wrote back:

You know, I almost wrote that post more critically, because my contact with labs has always included stories about these weird contamination sources. With one case, it was DNA in the next building that had come in due to a faulty ventilation system. In another lab, an early UP specimen looked like Polynesians. Well, they were working on Polynesians down the hall, but they were pretty sure that all their precautions would keep the ancient stuff completely separate.
There's no chance you can rule out everything, so why try to get all the people who touched it? Especially since that's going to be useless for most of what you can sample, which has been touched by dozens of people.
On the other hand, it may be worthwhile to get people who excavate and handle things to be more careful, and taking their sequences adds some seriousness. It may be more for show than genuine contamination control, no doubt. I suppose that's the main purpose of all these proposals about protocols -- mainly ways to prevent arguments between labs, about "you can trust my sequence, but who knows about those guys?"
Looking from the outside, I think the contamination problem may be no worse for ancient DNA than for other kinds of sequencing, and errors are creeping into databases all the time. The metagenomics scheme is a tremendous improvement, since the computers actually can keep track of this contamination and correct for it. So working on Neandertals is really more a question of getting the algorithm right. Probably, we should try to get people to stop doing extractions for single-gene probing, it's a waste of material, and the question of contamination is never going away. It's really a matter of time before we have all these sequences from specimens that preserve them, and we want to keep as much as possible against the chance of future improvements.

References:

Burger J, Kirchner M, Bramanti B, Haak W, Thomas MG. 2007. Absence of the lactase-persistence-associated allele in early Neolithic Europeans. Proc Natl Acad Sci USA 104:3736-3741. doi:10.1073/pnas.0607187104

Caramelli D and 13 others. 2008. A 28,000 years old Cro-Magnon mtDNA sequence differs from all potentially contaminating modern sequences. PLoS ONE 3:e2700. doi:10.1371/journal.pone.0002700

Sampietro ML, Gilbert MTP, Lao O, Caramelli D, Lari M, Bertranpetit J, Lalueza-Fox C. 2006. Tracking down human contamination in ancient human teeth. Mol Biol Evol 23:1801-1807. doi:10.1093/molbev/msl047

Serre D and 8 others. 2004. No evidence of Neandertal mtDNA contribution to early modern humans. PLoS Biol 2:313-317. doi:10.1371/journal.pbio.0020057