
A low human mutation rate may throw everything out of whack

Last week, a paper looking for the genetic causes of Miller syndrome reported the whole genomes of four members of a single family: two siblings with the disorder and their two unaffected parents. The idea was simple: compare the affected and unaffected genomes to find candidate loci that might account for Miller syndrome in the affected siblings. By exploiting some other sources of information, they found what they were looking for. Daniel MacArthur covered the story in his post, "Disease hunting with whole genome sequences: the good news, and the bad news".

I got interested in another aspect of the story. With whole-genome sequences of parents and offspring, it becomes possible to directly determine the rate of mutations in each generation. The paper by Roach and colleagues did just that: they counted 28 in the 2.3 billion bases of sequence they included in their comparison. From that count, they estimated a per-site mutation rate of 1.1 x 10^-8 per generation.

Which is a pretty interesting number. You see, it's less than half what it ought to be:

[O]ur estimated human mutation rate is lower than previous estimates, the most widely cited of which is 2.5 x 10^-8 per generation (10) based on three parameters: a human-chimpanzee nucleotide divergence per site (Kt) of 0.013, a species divergence time of five million years ago, and an ancestral effective population size of 10,000. More recent estimates indicate a nucleotide divergence of 0.012 (9), species divergence time between six and seven million years ago (11–15), and ancestral effective population size between 40,000 and 148,000 (16–19). With these parameter ranges and a generation length of 15 to 25 years, the mutation rate estimate is between 7.6 x 10^-9 and 2.2 x 10^-8 per generation, which is consistent with our intergenerational estimate of 1.1 x 10^-8. Our estimate is within one standard deviation (SD) of an earlier estimate of 1.7 x 10^-8 (SD: 9 x 10^-9) based on 20 disease-causing loci (20). The rate we report is for autosomes, and should be several-fold lower than that of the Y chromosome, as in the male germline more cell divisions occur per generation. Though our rate differs approximately as expected from the recently reported estimate of 3.0 x 10^-8 (95% CI: 8.9 x 10^-9 – 7.0 x 10^-8) for the Y chromosome, the error rates make this difference not significant (21).

You can see the obvious implication: If this mutation rate is accurate, then the average human-chimpanzee gene divergence has to be up around 11 million years ago. That can be accommodated with a 7-million-year-old species divergence only if we assume a very large ancestral population -- on the order of 50,000 or higher. Or, the ancestral effective size could be lower -- but that would make the species divergence substantially older -- 9 million years or more.
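A back-of-the-envelope sketch of that arithmetic, assuming the standard neutral relation K = 2μ(T/g + 2N): human-chimpanzee divergence K accumulates along two lineages over the species split time T (converted to generations by dividing by generation length g), plus an average coalescence time of 2N generations in an ancestral population of N diploid individuals. K = 0.012 and μ = 1.1 x 10^-8 come from the quoted passage; the generation lengths just span the paper's 15 to 25 year range and are otherwise an assumption.

```python
# Sketch: dates implied by a per-generation mutation rate, assuming the
# neutral relation K = 2 * mu * (T_split / g + 2 * N_anc).

K = 0.012    # human-chimpanzee divergence per site (from the quoted passage)
MU = 1.1e-8  # per-site, per-generation mutation rate (Roach et al.)

def total_generations(k, mu):
    """Mean generations back to the common ancestor of a human and a chimp gene."""
    return k / (2 * mu)

def gene_divergence_years(k, mu, gen_years):
    return total_generations(k, mu) * gen_years

def ancestral_ne(k, mu, gen_years, split_years):
    """Ancestral effective size needed to fit a given species split time."""
    return (total_generations(k, mu) - split_years / gen_years) / 2

def split_time_years(k, mu, gen_years, ne):
    """Species split time implied by a given ancestral effective size."""
    return (total_generations(k, mu) - 2 * ne) * gen_years

for g in (15, 20, 25):  # generation lengths in years (assumed range)
    print(f"g={g}: gene divergence {gene_divergence_years(K, MU, g) / 1e6:.1f} Myr; "
          f"Ne for a 7 Myr split {ancestral_ne(K, MU, g, 7e6):,.0f}; "
          f"split if Ne = 10,000: {split_time_years(K, MU, g, 10_000) / 1e6:.1f} Myr")
```

With a 20-year generation the gene divergence comes out near 11 million years. The ancestral size needed for a 7-million-year split runs from roughly 40,000 to over 130,000 across the generation range, so generation length matters almost as much as the rate itself.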

There is a second implication. Most studies of human genetic variation have assumed a 5-million-year-old human-chimpanzee divergence and the correspondingly high mutation rate. If the true rate is less than half that, then the coalescence times of human genes are more than double most estimates. That would include our estimates of human-Neandertal genetic differences.
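The rescaling works because, for a fixed number of observed differences, an estimated coalescence time is proportional to 1/μ. A minimal sketch; the 500,000-year starting date is purely hypothetical and is not an estimate from any study.

```python
# Sketch: re-dating a coalescence time under a different mutation rate.
OLD_MU = 2.5e-8  # widely cited per-generation rate
NEW_MU = 1.1e-8  # family-based estimate from Roach et al.

def rescale_time(t_old, old_mu=OLD_MU, new_mu=NEW_MU):
    """Expected differences ~ 2 * mu * t, so for fixed differences t scales as 1 / mu."""
    return t_old * old_mu / new_mu

factor = OLD_MU / NEW_MU
print(f"scale factor: {factor:.2f}")  # prints 2.27
# Hypothetical date estimated under the old rate (not from any study)
print(f"500,000 years becomes {rescale_time(500_000):,.0f} years")
```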

Well, that's a fine pickle.

I'm not quite ready to believe the very low rate estimate. The analysis in this paper uncovered tens of thousands of false positives, and had to filter through those to arrive at 28 true mutations. The filtering involved resequencing all the positives to determine which were true and which were false, but maybe there's room in there for a substantial number of false negatives, too.
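How many false negatives would it take to close the gap? A minimal sketch, assuming the true rate is simply the observed rate divided by the fraction of real mutations the pipeline detects (its sensitivity). The 2.5 x 10^-8 target is the widely cited rate from the quoted passage; the sensitivities are illustrative, since this sketch knows nothing about the paper's actual false-negative rate.

```python
OBSERVED_MU = 1.1e-8  # Roach et al. estimate
CITED_MU = 2.5e-8     # widely cited phylogenetic estimate

def corrected_rate(observed_mu, sensitivity):
    """True rate if only a fraction `sensitivity` of real mutations were detected."""
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    return observed_mu / sensitivity

def false_negative_rate_needed(observed_mu, target_mu):
    """Fraction of real mutations that would have to be missed to reach target_mu."""
    return 1 - observed_mu / target_mu

print(f"{false_negative_rate_needed(OBSERVED_MU, CITED_MU):.0%}")  # prints 56%
for s in (0.9, 0.7, 0.5):  # illustrative sensitivities
    print(s, f"{corrected_rate(OBSERVED_MU, s):.2e}")
```

In other words, uncorrected false negatives would have to account for more than half of all real mutations to bring the family-based estimate back up to the older rate.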

If this low estimate were true of the human-chimpanzee divergence, it would imply vastly higher ages for other primate divergences, or a much lower rate on the human lineage specifically. So that allows another check on the process.

But generally, I'll be looking at whole-genome family comparisons with great interest, because they will give us a much more precise understanding of the rate of mutations and recombinations across the genome.

References:

Roach JC and 14 others. 2010. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science (early online) doi:10.1126/science.1186802

Ancient penguin mtDNA and substitution rates

Here's an example of a really incomprehensible press release:

Ancient penguin DNA raises doubts about accuracy of genetic dating techniques

Penguins that died 44,000 years ago in Antarctica have provided extraordinary frozen DNA samples that challenge the accuracy of traditional genetic aging measurements, and suggest those approaches have been routinely underestimating the age of many specimens by 200 to 600 percent.

In other words, a biological specimen determined by traditional DNA testing to be 100,000 years old may actually be 200,000 to 600,000 years old, researchers suggest in a new report in Trends in Genetics, a professional journal.

You can see why I'm interested -- the Neandertal genetic samples are in the neighborhood of 44,000 years old, so if ancient DNA is saying something unusual about penguins, it might say something unusual about them, right? But what are they talking about here? Racemization? I mean, there are no "genetic dating techniques" for specimens! The rest of the release doesn't clarify matters very much, although it does say that the findings

may force a widespread re-examination of determinations about when one species split off from another, if that determination was based largely on genetic evidence

That sounds like an argument that penguin sequences didn't evolve at the rate one might estimate from a molecular clock based on penguin systematics. The quotes from the researchers involved do include the words "molecular clock", which is a good sign.

Well, enough of this, let's go straight to the research.

High mitogenomic evolutionary rates and time dependency

Using entire modern and ancient mitochondrial genomes of Adélie penguins (Pygoscelis adeliae) that are up to 44000 years old, we show that the rates of evolution of the mitochondrial genome are two to six times greater than those estimated from phylogenetic comparisons. Although the rate of evolution at constrained sites, including nonsynonymous positions and RNAs, varies more than twofold with time (between shallow and deep nodes), the rate of evolution at synonymous sites remains the same. The time-independent neutral evolutionary rates reported here would be useful for the study of recent evolutionary events.

Their sample includes 12 modern Adélie penguins and 8 ancient ones. Two of the ancient samples date to the maximum of around 44,000 years, while some are only around 250 years old. Now, the age distribution of the rest is fairly important to their analysis, but I can't see it because it's hidden in a data supplement, and I'm reading this in a laundromat in Vienna with no internet access.

You see why I don't like these freaking online supplements? I'm in the middle of Europe and inconvenienced. Imagine if some penguin enthusiast in an underdeveloped country, with no subscription to the journal, got this paper in an e-mail attachment. They'd never be able to get a copy of the methods.

There are several problems with estimating substitution rates from data like these penguin mitochondria. The estimate depends very strongly on the neutral demographic history: if there were big population movements or partial replacements among the penguins, they would thoroughly confound the rate estimate. The paper refers to prior work on mammoth ancient mtDNA:

A previous study on the mitochondrial genomes of the extinct mammoth also suggests that the rate based on internal calibrations (within mammoths) is ~1.6 times higher than that obtained using the external (i.e. mammoth–elephant) calibration.

...which raises a similar issue -- since the mammoths apparently did undergo a partial population replacement (or at least, an mtDNA replacement) across part of their range.

Also, you depend very strongly on the few most ancient specimens, because they sample the longest time interval. Which means, you need to know the date of these specimens with great accuracy and you need to place them accurately on the genealogy that connects the more recent specimens.
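Why the oldest specimens dominate can be sketched with an ordinary least-squares fit of genetic distance against sample age. This is a simple stand-in, not necessarily the paper's method: in such a fit the variance of the estimated rate (the slope) is σ²/Sxx, where Sxx is the summed squared deviation of sample ages from their mean. The ages below are hypothetical: 12 modern samples, 6 ancient ones of assorted ages, and 2 at about 44,000 years.

```python
def sxx_shares(ages):
    """Each sample's share of Sxx = sum((age - mean_age)**2).

    For a least-squares slope (the substitution rate), Var(slope) = sigma**2 / Sxx,
    so samples far from the mean age carry most of the information about the rate.
    """
    mean_age = sum(ages) / len(ages)
    devs = [(a - mean_age) ** 2 for a in ages]
    sxx = sum(devs)
    return [d / sxx for d in devs]

# Hypothetical ages in years (not the paper's data)
ages = [0] * 12 + [250, 500, 1_000, 3_000, 6_000, 10_000] + [44_000, 44_000]
shares = sxx_shares(ages)
print(f"two oldest samples: {sum(shares[-2:]):.0%} of Sxx")
print(f"twelve modern samples: {sum(shares[:12]):.0%} of Sxx")
```

In this made-up set the two oldest samples supply most of Sxx, so an error in their dates, or in where they attach to the tree, moves the rate estimate far more than any modern sample can.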

I think the biggest hangup is the genealogy. You can't assume that a 44,000-year-old penguin is a direct ancestor of any living mtDNA sequences. It's a relative, at some distance, possibly a member of an extant clade, possibly not. When we're talking about fossils that are tens of thousands of years old, it becomes very likely that most of the branches connecting with living sequences will have coalesced into very few ancient branches, and it becomes progressively less likely that you will discover a representative of one of those actual ancestral branches. So there's an error intrinsic to the coalescent process that really can't be corrected by sampling more extant lineages.

In other words, you can't just convert sequence differences into substitution rates without a model involving some pretty strong assumptions.
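The coalescent point can be made concrete with a small Kingman-coalescent simulation. It tracks how many ancestral lineages of n living mtDNA samples remain at a time t in the past, with t measured in units of N_f generations, where N_f is a hypothetical female effective population size. Under the simplifying assumption of a constant-size population, a random individual living at time t lies on one of those k ancestral lineages with probability of roughly k/N_f. The sample size, times, and N_f are illustrative, not taken from the paper.

```python
import random

def lineages_at(n, t, rng):
    """Ancestral lineages of n sampled mtDNAs still distinct at time t in the past.

    Kingman coalescent for a haploid locus, with time in units of N_f generations:
    while k lineages remain, the wait to the next merger is exponential with
    rate k * (k - 1) / 2.
    """
    k, elapsed = n, 0.0
    while k > 1:
        elapsed += rng.expovariate(k * (k - 1) / 2)
        if elapsed > t:
            break
        k -= 1
    return k

def mean_lineages(n, t, reps=2000, seed=1):
    rng = random.Random(seed)
    return sum(lineages_at(n, t, rng) for _ in range(reps)) / reps

N_F = 10_000  # hypothetical female effective population size
for t in (0.01, 0.1, 0.5, 1.0):
    k = mean_lineages(100, t)
    # A random individual living at time t carries one of the k ancestral
    # lineages with probability of roughly k / N_F.
    print(f"t = {t}: ~{k:.1f} ancestral lineages, "
          f"P(direct maternal ancestor) ~ {k / N_F:.4f}")
```

The direct-ancestor probability falls quickly because k shrinks toward a handful of lineages while the population stays large, so an ancient sample is far more likely to be a collateral relative than an ancestor.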

The paper mentions two very well-known issues concerning the relationship of substitution rate, purifying selection, and saturation. Basically, deleterious mutations can hang around within a population for a while, so a genetic sample from a living population will tend to overestimate the substitution rate. And long-term comparisons of distinct taxa may span so much time that multiple substitutions have happened at the same site, leading to an underestimate of the substitution rate. These are the reasons, for example, why the number of mitochondrial mutations between mothers and their daughters is much higher than you would estimate from the number of differences between humans and chimpanzees.
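The saturation half of the argument can be illustrated with the Jukes-Cantor correction, a standard textbook model chosen here for simplicity (the paper's own substitution model may differ). It converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The gap between p and d widens as divergence grows, which is why raw differences between distant taxa understate the number of changes.

```python
import math

def jukes_cantor(p):
    """Estimated substitutions per site from an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.01, 0.1, 0.3, 0.5):
    d = jukes_cantor(p)
    print(f"observed {p:.2f} -> corrected {d:.3f} ({d / p:.2f}x)")
```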

What does this mean for the penguins? Or, more to the point, the Neandertals? Here's a short passage where the paper discusses the comparison:

By contrast, the synonymous substitution rate (0.054–0.073 s/s/My) estimated here is five to seven times higher than previous phylogenetic rate estimates [1–4] and significantly higher than those based on intra-specific comparisons within human (0.048–0.052 s/s/My) [14] and Neanderthal (0.036–0.042 s/s/My) [24] populations. These results clearly argue against the use of the classical 1% rate per lineage (or the ‘2% rule’ as it is commonly known) to study the evolution or genetics of individual species.

Well, the penguin rate may be significantly higher than the within-species human rate estimate, but it's not very much higher: a minimum of 0.054 compared to a maximum of 0.052. So I don't think there is anything to get very exercised about with respect to ancient human DNA or Neandertal DNA.

Unless you really are trying to use DNA like some sort of radiocarbon method. But that would be silly.
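The comparison in the quoted passage comes down to the ends of the reported intervals. A minimal sketch using only the numbers quoted above, in substitutions per site per million years (s/s/My), with the classical 1% per lineage rule expressed as 0.01 s/s/My.

```python
# Rates in substitutions per site per million years (s/s/My), from the quoted passage
PENGUIN = (0.054, 0.073)
HUMAN = (0.048, 0.052)
NEANDERTHAL = (0.036, 0.042)
ONE_PERCENT_RULE = 0.01  # classical 1% per lineage per million years

def min_ratio(higher, lower):
    """Smallest ratio between two (low, high) rate intervals: low end over high end."""
    return higher[0] / lower[1]

print(f"penguin vs human, at least {min_ratio(PENGUIN, HUMAN):.2f}x")
print(f"penguin vs Neanderthal, at least {min_ratio(PENGUIN, NEANDERTHAL):.2f}x")
print(f"penguin vs 1% rule: {PENGUIN[0] / ONE_PERCENT_RULE:.1f}x to "
      f"{PENGUIN[1] / ONE_PERCENT_RULE:.1f}x")
```

At their closest, the penguin and human intervals differ by only about 4 percent, while the distance from the 1% rule is roughly five- to sevenfold.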

References:

Subramanian S, Denver DR, Millar CD, Heupink T, Aschrafi A, Emslie SD, Baroni C, Lambert DM. 2009. High mitogenomic evolutionary rates and time dependency. Trends Genet 25:482-486. doi:10.1016/j.tig.2009.09.005

More on the X variation conundrum

Last winter I noted the contradiction between two papers that each attempted to explain variation on the X chromosome compared to the autosomes. They had come to opposite conclusions, based on discrepancies in their data. I noticed that they had used different methods of determining mutation rates for X chromosome loci:

So, for their current paper, Keinan and colleagues (2008) try to correct for the recent divergence of human and chimpanzee X chromosomes. Simple enough: rescale all X chromosome mutation events by some ratio proportional to the human-chimp divergence discrepancies. In this case, they attempt to rescale to the human-macaque divergence. Since that divergence happened in the Oligocene, the discrepancies among chromosomes should be slight compared to the overall divergence. I'd feel better if they actually tested this idea.

Meanwhile, Mike Hammer and colleagues scaled X chromosome diversity to the human-orangutan divergence. They claimed that this gave the same results as the human-chimpanzee divergence. Which, if true, would obviously give a different outcome than the procedure followed by Keinan and colleagues, which was predicated on the idea that the human-chimpanzee X divergence is the wrong number to use.
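The normalization step can be sketched as dividing diversity by divergence to an outgroup, so that mutation-rate differences between the X and the autosomes cancel. All numbers below are hypothetical, chosen only to show how the X/autosome ratio moves when the outgroup changes; they are not from either paper.

```python
def x_to_autosome_ratio(pi_x, pi_a, div_x, div_a):
    """X/autosome diversity ratio after dividing each by divergence to an outgroup.

    Scaling by divergence is meant to cancel mutation-rate differences between
    the X and the autosomes, which only works if divergence tracks mutation rate.
    """
    return (pi_x / div_x) / (pi_a / div_a)

# Hypothetical per-site diversity (not from either paper)
PI_X, PI_A = 0.0006, 0.0009

# Hypothetical (X divergence, autosomal divergence) for two outgroups
OUTGROUPS = {"near outgroup": (0.010, 0.012), "distant outgroup": (0.060, 0.066)}

print("unscaled:", round(PI_X / PI_A, 3))
for name, (div_x, div_a) in OUTGROUPS.items():
    print(f"{name}:", round(x_to_autosome_ratio(PI_X, PI_A, div_x, div_a), 3))
```

With equal numbers of breeding males and females, the expected X/autosome ratio is 0.75, because there are three X chromosomes for every four copies of each autosome. In this toy example the same diversity data land on opposite sides of 0.75 depending only on which divergence is used for scaling.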

I had sort of forgotten about this (which drove me crazy at the time), but another question led me to revisit it late this week. In the intervening time, I see that Carlos Bustamante and Sohini Ramachandran (2009) happened across the same explanation that I had offered:

It appears that the rest of the discrepancy is explained by different normalizations for background mutation rate differences between the X chromosome and autosomes (Hammer et al. [10] used human-orangutan divergence and Keinan et al. [9] used human-macaque divergence).

So you read it here first. Which I suppose means that I should submit letters to journals more often. I don't because it seems to me that all I'm doing is reading and trying to understand papers, which sometimes takes more work than it should. On the other hand, I wonder how many people are really putting much effort into their reading...

Meanwhile, Bustamante and Ramachandran add an additional explanation -- the different means of ascertainment, since Mike Hammer's group used resequencing to find variation, while Keinan and colleagues (2008) had used HapMap SNPs under a specific ascertainment model. They end their short piece by pointing out the value of further resequencing data:

In order to address continuing questions on the nature of sex-biased processes, full genome sequencing of large numbers of individuals sampled from diverse populations will be needed. The upcoming 1,000 Genomes Project (http://www.1000genomes.org/), for example, will provide orders of magnitude more data for these types of analyses. We share the enthusiasm of the population genetics community that this will bring the potential for resolving continuing questions regarding how human history and cultural practices have shaped global patterns of genomic diversity.

Ascertainment is a serious issue with the existing SNP data, because different SNPs were ascertained in different, non-commensurable ways. That's how I was led to reconsider this issue this week: another set of data seems to have features that are partially explained by ascertainment, but only partially. It's hard to use existing data for some kinds of population genetics analysis, although others are less affected by ascertainment biases.
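A toy model shows how ascertainment distorts the allele-frequency spectrum. It assumes the neutral expectation that the number of sites with derived-allele count i in a sample of n chromosomes is proportional to 1/i, and that SNPs were discovered by requiring polymorphism in a random discovery panel of m of those n chromosomes. Both are simplifying assumptions; real SNP panels were ascertained in messier, mixed ways.

```python
from math import comb

def neutral_sfs(n):
    """Expected neutral site frequency spectrum: weight 1/i for derived count i."""
    return {i: 1 / i for i in range(1, n)}

def ascertained_sfs(n, m):
    """Keep only sites polymorphic in a random discovery panel of m of the n chromosomes."""
    total = comb(n, m)
    sfs = {}
    for i, w in neutral_sfs(n).items():
        # Panel is monomorphic if all m chromosomes carry the derived or all the ancestral allele
        p_mono = (comb(i, m) + comb(n - i, m)) / total
        sfs[i] = w * (1 - p_mono)
    return sfs

def fraction_rare(sfs, n, cutoff=0.05):
    """Share of sites whose derived-allele frequency is below cutoff."""
    rare = sum(w for i, w in sfs.items() if i / n < cutoff)
    return rare / sum(sfs.values())

n, m = 100, 4
print(f"rare SNPs, full resequencing: {fraction_rare(neutral_sfs(n), n):.0%}")
print(f"rare SNPs, discovery panel of {m}: {fraction_rare(ascertained_sfs(n, m), n):.0%}")
```

Resequencing keeps the rare variants that a small discovery panel mostly misses, which is one reason resequencing and SNP-chip studies can disagree about the same chromosomes.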

So the 1000 Genomes effort will make some kinds of analyses simpler to accomplish. I suppose if ascertainment becomes less of a problem, we may see people put more effort into understanding non-genetic sources of information, too!

References:

Bustamante CD, Ramachandran S. 2009. Evaluating signatures of sex-specific processes in the human genome. Nat Genet 41:8-10. doi:10.1038/ng0109-8
