A new Neandertal (and mammoth) genetics paper from Adrian Briggs and colleagues is bigger news than it might appear at first glance:
DNA sequences determined from ancient organisms have high error rates, primarily due to uracil bases created by cytosine deamination. We use synthetic oligonucleotides, as well as DNA extracted from mammoth and Neandertal remains, to show that treatment with uracil–DNA–glycosylase and endonuclease VIII removes uracil residues from ancient DNA and repairs most of the resulting abasic sites, leaving undamaged parts of the DNA fragments intact. Neandertal DNA sequences determined with this protocol have greatly increased accuracy. In addition, our results demonstrate that Neandertal DNA retains in vivo patterns of CpG methylation, potentially allowing future studies of gene inactivation and imprinting in ancient organisms.
That last sentence is the one that made my mouth drop open when I read it. Traces of epigenetic signals are still there in the degraded DNA of ancient Neandertals.
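I haven't seen anybody else notice this paper yet. It's largely technical, describing how well particular treatments improve sequencing accuracy. And one consequence of the chemical diagenesis of ancient DNA is that CpG sites and methylated bases complicate matters. Ordinary cytosine deaminates to uracil, which the UDG treatment removes. Methylated cytosine, which is common at CpG sites, deaminates directly to thymine, which the enzyme leaves in place. Those leftover C-to-T changes at CpG sites are what reveal the original methylation.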
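A toy sketch of that logic, assuming reads are already aligned base-for-base to a reference on the same strand after UDG treatment: count C-to-T mismatches separately at CpG and non-CpG cytosines, and an excess at CpG sites is the methylation signal. The function name `cpg_ct_rates` and the example sequences are hypothetical, and this is not the authors' pipeline; a real analysis would also handle the opposite strand and damage near fragment ends.

```python
def cpg_ct_rates(reference: str, read: str) -> tuple[float, float]:
    """Return (C->T rate at CpG cytosines, C->T rate at other cytosines).

    Assumes `read` is aligned to `reference` base-for-base on the same
    strand (no indels), as a toy model of UDG-treated data.
    """
    counts = {"cpg": [0, 0], "other": [0, 0]}  # [C->T mismatches, cytosines seen]
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        # A cytosine followed by guanine in the reference is a CpG site.
        context = "cpg" if reference[i + 1:i + 2] == "G" else "other"
        counts[context][1] += 1
        if read_base == "T":
            counts[context][0] += 1

    def rate(mismatches: int, total: int) -> float:
        return mismatches / total if total else 0.0

    return rate(*counts["cpg"]), rate(*counts["other"])


# Hypothetical example: the only C->T change sits at a CpG site.
reference = "ACGTACGTCCAT"
read      = "ATGTACGTCCAT"
print(cpg_ct_rates(reference, read))  # (0.5, 0.0)
```

In UDG-treated data, C-to-T changes from ordinary deamination are largely gone, so a CpG rate well above the non-CpG rate is the kind of pattern that points to methylation.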
We still don't know how to interpret methylation in the DNA of living people. So there's some limit to the utility of this observation. But it's still really cool.
Some excerpts from the paper follow. Here's a passage reminding us of the high quality of libraries coming from some specimens:
When analyzing Neandertal DNA sequences, contamination of experiments with contemporary human DNA is a potential problem (10,34). However, the level of such contamination in a Neandertal DNA library can be assessed by counting the ratio of Neandertal versus contaminant fragments at nucleotide positions where Neandertals differ from all or almost all present-day humans (33). The mtDNA of this Neandertal carries 133 such diagnostic positions (23). The ‘no repair’ dataset yielded 139 mtDNA fragments that overlapped such positions; 138 carried the Neandertal base while one matched modern human mtDNA. The UDG/endoVIII treated dataset yielded 128 informative fragments, of which all were the Neandertal type. Thus, the mtDNA in all libraries was almost completely free of contamination by modern human mtDNA, even after treatment with UDG and endoVIII. Since the ratio of mitochondrial to nuclear DNA may differ between the contaminating and the Neandertal DNA, this estimate is strictly applicable only to the mtDNA (33). However, the estimate of mtDNA contamination in these libraries is low enough that within even a few-fold variation in mtDNA:nuclear DNA ratios between the Neandertal and contaminating DNA, sequences aligning to the human nuclear genome will be predominantly of Neandertal origin (Briggs et al. 2009: 9).
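The contamination arithmetic in that passage can be sketched in a few lines. This is a minimal illustration, not the authors' method: it treats each diagnostic fragment as an independent draw, puts a one-sided 95% Clopper-Pearson upper bound on the contaminant fraction, and then converts that into a nuclear figure under a hypothetical `fold` difference in mtDNA:nuclear ratios. All function names here are made up for the example.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def contamination_upper_bound(contaminant: int, total: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson upper bound on the contaminant fraction.

    Finds p with P(X <= contaminant | total, p) = alpha by bisection;
    the CDF falls as p rises, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(contaminant, total, mid) > alpha:
            lo = mid  # the bound lies above mid
        else:
            hi = mid
    return hi


def nuclear_contamination(mt_fraction: float, fold: float) -> float:
    """Convert an mtDNA contaminant fraction into a nuclear one.

    `fold` is a hypothetical ratio of (Neandertal mt:nuclear) to
    (contaminant mt:nuclear); values above 1 make nuclear contamination worse.
    """
    odds = mt_fraction / (1 - mt_fraction) * fold
    return odds / (1 + odds)


# Diagnostic-position counts quoted above: (contaminant fragments, total fragments).
for label, contaminant, total in [("no repair", 1, 139), ("UDG/endoVIII", 0, 128)]:
    point = contaminant / total
    upper = contamination_upper_bound(contaminant, total)
    worst_nuclear = nuclear_contamination(upper, fold=5)
    print(f"{label}: point {point:.2%}, 95% upper {upper:.2%}, "
          f"nuclear at 5-fold {worst_nuclear:.1%}")
```

With 138 of 139 fragments carrying the Neandertal base, the point estimate is about 0.7 percent. Even taking the upper bound and a 5-fold ratio difference, the implied nuclear contamination stays well below one half, which is the sense of "predominantly of Neandertal origin."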
A recurring theme in the paper is that very deep resequencing will be necessary to improve the accuracy of ancient DNA sequences. Probing for a particular genetic variant -- such as the "Neandertal diagnostic" mtDNA sites -- is less of a problem, because the probability of a sequencing error at any particular site is low. But summed across thousands or millions of base pairs, the number of sequencing errors can easily exceed the number of genuine differences between Neandertal and human genomes. For example:
Figure 6 shows that for mitochondrial sequences, overall error rate per base (Figure 6) was 2.20% for ‘no repair’ sequences; 0.40% for UDG/endoVIII sequences and 0.09% for multi-pass UDG/endoVIII-treated sequences. Thus, UDG/endoVIII treatment alone results in a 5.5-fold reduction in error rates while deep sequencing results in an additional 4.4-fold reduction. In combination this results in a 22-fold reduction in errors. In nuclear DNA, for which we removed CpG sites from the analysis due to the effect of methylation described above, a similar pattern is observed (Figure 6) although the true error rate at these low levels cannot be accurately calculated due to genuine sequence differences between this Neandertal and the human reference.
An impressively low error rate in the best case -- which here involves sequencing many copies of the same sites, generating "deep coverage". But consider: the average sequence divergence between two humans is around 0.1 percent. With a total error rate of 0.09 percent, there would be nearly as many erroneous differences as real ones between an ancient genome and the modern reference. And comparing two ancient genomes would double the errors (0.1 + 2 × 0.09 = 0.28 percent), so the apparent genetic difference would be roughly three times the actual value.
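The back-of-the-envelope arithmetic in that paragraph, as a tiny sketch. It assumes errors are rare and independent and almost never land on a site that genuinely differs, so each sequenced genome simply adds its error rate on top of the true divergence. The function name `apparent_divergence` is hypothetical.

```python
def apparent_divergence(true_divergence: float, *error_rates: float) -> float:
    """Approximate apparent per-base divergence between sequences.

    Each sequenced genome adds roughly its own error rate to the true
    divergence; overlaps between errors and real differences are ignored.
    """
    return true_divergence + sum(error_rates)


TRUE = 0.001    # ~0.1 percent divergence between two humans
ERROR = 0.0009  # 0.09 percent best-case error rate quoted above

vs_reference = apparent_divergence(TRUE, ERROR)        # ancient genome vs modern reference
two_ancient = apparent_divergence(TRUE, ERROR, ERROR)  # two ancient genomes

print(f"ancient vs reference: {vs_reference / TRUE:.1f}x the true value")  # 1.9x
print(f"two ancient genomes:  {two_ancient / TRUE:.1f}x the true value")   # 2.8x
```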
Even deeper coverage will be necessary, hopefully reducing error rates further. This may lead to a hypothesis-testing approach for sequence differences found in Neandertal genomes, as the false positives are sifted out of the data. It's going to take some creative population genetics to deal with this problem, as we try to analyze the data from these ancient specimens.
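One simple way to frame that kind of hypothesis test, offered only as a sketch: at a candidate site covered by n reads, ask how likely it is that k or more of them show the same non-reference base through sequencing error alone. The model here is an assumption, not the paper's: reads are independent, and errors are spread evenly over the three alternative bases. The function name `error_tail` and the example site are hypothetical.

```python
from math import comb


def error_tail(k: int, n: int, error_rate: float, alt_bases: int = 3) -> float:
    """P(at least k of n reads show one specific non-reference base by error alone).

    Assumes independent reads and errors spread evenly over `alt_bases`
    alternatives, so a given wrong base appears with probability error_rate / alt_bases.
    """
    p = error_rate / alt_bases
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical site: 10 reads, 6 carry the same non-reference base,
# using the 0.40 percent UDG/endoVIII per-base error rate quoted above.
p_value = error_tail(6, 10, 0.004)
print(f"chance of this by sequencing error alone: {p_value:.1e}")
```

Across millions of tested positions, even small per-site probabilities add up, so a real analysis would still have to correct for multiple comparisons when sifting out false positives.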
Briggs AW, Stenzel U, Meyer M, Krause J, Kircher M, Pääbo S. 2009. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res (advance) doi:10.1093/nar/gkp1163