Ancient penguin mtDNA and substitution rates

Here’s an example of a really incomprehensible press release:

Ancient penguin DNA raises doubts about accuracy of genetic dating techniques
Penguins that died 44,000 years ago in Antarctica have provided extraordinary frozen DNA samples that challenge the accuracy of traditional genetic aging measurements, and suggest those approaches have been routinely underestimating the age of many specimens by 200 to 600 percent.
In other words, a biological specimen determined by traditional DNA testing to be 100,000 years old may actually be 200,000 to 600,000 years old, researchers suggest in a new report in Trends in Genetics, a professional journal.

You can see why I’m interested – the Neandertal genetic samples are in the neighborhood of 44,000 years old, so if ancient DNA is saying something unusual about penguins, it might say something unusual about them, right? But what are they talking about here? Racemization? I mean, there are no “genetic dating techniques” for specimens! The rest of the release doesn’t clarify matters very much, although it does say that the findings

may force a widespread re-examination of determinations about when one species split off from another, if that determination was based largely on genetic evidence

That sounds like an argument that penguin sequences didn’t evolve at the rate one might estimate from a molecular clock based on penguin systematics. The quotes from the researchers involved do include the words “molecular clock”, which is a good sign.

Well, enough of this, let’s go straight to the research.

High mitogenomic evolutionary rates and time dependency
Using entire modern and ancient mitochondrial genomes of Adélie penguins (Pygoscelis adeliae) that are up to 44000 years old, we show that the rates of evolution of the mitochondrial genome are two to six times greater than those estimated from phylogenetic comparisons. Although the rate of evolution at constrained sites, including nonsynonymous positions and RNAs, varies more than twofold with time (between shallow and deep nodes), the rate of evolution at synonymous sites remains the same. The time-independent neutral evolutionary rates reported here would be useful for the study of recent evolutionary events.

Their sample includes 12 modern Adélie penguins and 8 ancient ones, two of which are from the maximum time interval, although some are only around 250 years old. Now, the age distribution of the rest is fairly important to their analysis, but I can’t see it because it’s hidden in a data supplement, and I’m reading this in a laundromat in Vienna with no internet access.

You see why I don’t like these freaking online supplements? I’m in the middle of Europe and inconvenienced. Imagine if some penguin enthusiast in an underdeveloped country, with no subscription to the journal, got this paper in an e-mail attachment. They’d never be able to get a copy of the methods.

There are several problems with estimating substitution rates from data like these penguin mitochondria. You depend very strongly on a neutral demographic history – if there were big population movements or partial replacements among the penguins, the rate estimate is totally confounded by them. The paper refers to prior work on mammoth ancient mtDNA:

A previous study on the mitochondrial genomes of the extinct mammoth also suggests that the rate based on internal calibrations (within mammoths) is ~1.6 times higher than that obtained using the external (i.e. mammoth–elephant) calibration.

…which raises a similar issue – since the mammoths apparently did undergo a partial population replacement (or at least, an mtDNA replacement) across part of their range.

Also, you depend very strongly on the few most ancient specimens, because they sample the longest time interval. That means you need to know the dates of these specimens with great accuracy, and you need to place them accurately on the genealogy that connects the more recent specimens.
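To make the leverage point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample ages, the rate `TRUE_RATE`, and the use of a plain least-squares regression of genetic distance on sample age are all made up, and none of it is the paper's data or method. It only shows how much more the estimated rate moves when the oldest specimen is misdated than when a young one is.

```python
def ols_slope(ages, distances):
    """Least-squares slope of distance on age (substitutions/site per year)."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_dist = sum(distances) / n
    sxy = sum((a - mean_age) * (d - mean_dist) for a, d in zip(ages, distances))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    return sxy / sxx

# HYPOTHETICAL values, invented for illustration only (not from the paper).
TRUE_RATE = 5e-8  # substitutions per site per year
ages = [0, 0, 0, 250, 250, 1000, 6000, 44000]  # sample ages in years
distances = [TRUE_RATE * a for a in ages]      # noise-free distances

baseline = ols_slope(ages, distances)

# Misdate the single oldest specimen by 10 percent (44,000 -> 48,400 years).
misdated_old = ages[:-1] + [ages[-1] * 1.1]
shifted_old = ols_slope(misdated_old, distances)

# Misdate one of the 250-year-old specimens by the same 10 percent.
misdated_young = ages[:3] + [ages[3] * 1.1] + ages[4:]
shifted_young = ols_slope(misdated_young, distances)

print(f"baseline rate:           {baseline:.3e}")
print(f"oldest sample misdated:  {shifted_old:.3e}")
print(f"young sample misdated:   {shifted_young:.3e}")
```

With these invented numbers, the 10 percent dating error on the oldest specimen pulls the estimated rate down by roughly 9 percent, while the same proportional error on a 250-year-old specimen barely registers. A real analysis would place the samples on a genealogy rather than regress raw distances, so treat this strictly as a cartoon of the problem.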

I think the biggest hangup is the genealogy. You can’t assume that a 44,000-year-old penguin is a direct ancestor of any living mtDNA sequences. It’s a relative, at some distance, possibly a member of an extant clade, possibly not. When we’re talking about fossils that are tens of thousands of years old, it becomes very likely that most of the branches connecting with living sequences will have coalesced into very few ancient branches, and it becomes progressively less likely that you will discover a representative of one of those actual ancestral branches. So there’s an error intrinsic to the coalescent process that really can’t be corrected by sampling more extant lineages.

In other words, you can’t just convert sequence differences into substitution rates without a model involving some pretty strong assumptions.

The paper mentions two very well-known issues concerning the relationship of substitution rate, purifying selection, and saturation. Basically, deleterious mutations can hang around within a population for a while, so that a genetic sample from a living population will tend to overestimate the substitution rate. And long-term comparisons of distinct taxa may include so much time that multiple substitutions may have happened at the same site – leading to an underestimate of the substitution rate. These are the reasons, for example, why the number of mitochondrial mutations between mothers and their daughters is much higher than you would estimate from the number of differences between humans and chimpanzees.
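The saturation half of that argument can be sketched with the Jukes-Cantor (JC69) model, which I am swapping in here as the simplest textbook case. It is not necessarily the model the paper used, and it says nothing about the purifying selection half. Under JC69, a true distance of d substitutions per site is expected to show up as a fraction p = 3/4 (1 - exp(-4d/3)) of differing sites, so the raw count falls further and further behind as d grows. The function names below are my own.

```python
import math

def jc69_observed_fraction(true_distance):
    """Expected fraction of differing sites for a true distance
    (substitutions per site) under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * true_distance / 3.0))

def jc69_corrected_distance(observed_fraction):
    """Invert the relationship: estimate substitutions per site
    from the observed fraction of differing sites."""
    if not 0.0 <= observed_fraction < 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * observed_fraction / 3.0)

# Illustrative true distances, from shallow to deep comparisons.
for d in (0.01, 0.1, 0.5, 1.0):
    p = jc69_observed_fraction(d)
    print(f"true d = {d:4.2f}  observed p = {p:.4f}  "
          f"naive rate is {p / d:.0%} of the true rate")
```

At shallow depths the observed fraction is almost the same as the true distance, but at one substitution per site only a little over half of the changes are still visible. Dividing uncorrected differences by a long divergence time therefore gives a rate that is too low.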

What does this mean for the penguins? Or, more to the point, the Neandertals? Here’s a short passage where the paper discusses the comparison:

By contrast, the synonymous substitution rate (0.054–0.073 s/s/My) estimated here is five to seven times higher than previous phylogenetic rate estimates [14] and significantly higher than those based on intra-specific comparisons within human (0.048–0.052 s/s/My) [14] and Neanderthal (0.036–0.042 s/s/My) [24] populations. These results clearly argue against the use of the classical 1% rate per lineage (or the 2% rule as it is commonly known) to study the evolution or genetics of individual species.

Well, the penguin rate may be significantly higher than the within-species human rate estimate, but it’s not very much higher – a minimum of 0.054 compared to a maximum of 0.052. So I don’t think there is anything to get very exercised about with respect to ancient human DNA or Neandertal DNA.
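For anyone who wants the arithmetic spelled out, here is a small Python sketch using the three rate ranges quoted from the paper. Treating each range as a simple (low, high) interval and taking ratios of the endpoints is my own simplification, and `fold_range` is a name I made up. The paper's intervals presumably come from a statistical model, which this ignores.

```python
# Synonymous rate ranges quoted from the paper, in substitutions/site/My.
penguin = (0.054, 0.073)
human = (0.048, 0.052)
neandertal = (0.036, 0.042)

def fold_range(a, b):
    """Smallest and largest ratio a/b when a and b are (low, high) ranges."""
    return a[0] / b[1], a[1] / b[0]

penguin_vs_human = fold_range(penguin, human)
penguin_vs_neandertal = fold_range(penguin, neandertal)

print("penguin / human:      %.2f to %.2f fold" % penguin_vs_human)
print("penguin / Neandertal: %.2f to %.2f fold" % penguin_vs_neandertal)
```

On these endpoints the penguin rate is somewhere between about 1.04 and 1.52 times the human rate, and between about 1.29 and 2.03 times the Neandertal rate. That is well short of the five- to seven-fold gap the paper reports against the phylogenetic estimates.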

Unless you really are trying to use DNA like some sort of radiocarbon method. But that would be silly.


Subramanian S, Denver DR, Millar CD, Heupink T, Aschrafi A, Emslie SD, Baroni C, Lambert DM. 2009. High mitogenomic evolutionary rates and time dependency. Trends Genet 25:482-486. doi:10.1016/j.tig.2009.09.005