More on the mutation rate

I've received several questions over the last few weeks about human genome-wide mutation rates. Some people are noticing heterogeneity in mutation rate estimates among family trios (spurred by a recent paper from the 1000 Genomes Project) while others are asking about apparent contradictions between estimates from pedigree-based methods and those based on phylogenetic comparisons with other primates (see, for instance, Dienekes' discussion of the recent paper by Li and Durbin (2011)).

I wrote an extensive, well-referenced post about this issue last fall, and I just want to bring it to people's attention: "What is the human mutation rate?"

The 1000 Genomes Project has adopted the low per-generation mutation rate that has been coming out of the family trio comparisons. This low rate is around 1.2e-8 per site per generation, as opposed to the estimate of around 2.4e-8 per site per generation that was in common use prior to last year. Several new or upcoming papers apply this lower rate to comparisons among humans and other hominoids.
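To see what is at stake in halving the rate, here is a back-of-the-envelope sketch. The generation time (25 years) and the human-chimpanzee pairwise sequence divergence (~1.2%) are illustrative assumptions of mine, not figures from this post; real analyses use more careful values.

```python
# Back-of-the-envelope: how the per-generation rate choice scales
# inferred divergence times. Assumed values (not from the post):
# a 25-year generation time and ~1.2% human-chimp sequence divergence.

def per_year_rate(per_gen_rate, generation_time=25.0):
    """Convert a per-site, per-generation rate to a per-year rate."""
    return per_gen_rate / generation_time

def divergence_time_years(seq_divergence, per_gen_rate, generation_time=25.0):
    """Naive divergence time: pairwise differences accumulate along
    two lineages, so T = d / (2 * mu_per_year)."""
    mu_year = per_year_rate(per_gen_rate, generation_time)
    return seq_divergence / (2.0 * mu_year)

for rate in (1.2e-8, 2.4e-8):
    t = divergence_time_years(0.012, rate)
    print(f"rate {rate:.1e}/site/gen -> ~{t / 1e6:.2f} million years")
```

Under these assumed inputs, the old rate gives a sequence divergence time of about 6 million years and the new low rate about 12 million years: halving the rate doubles every divergence estimate downstream.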

I'll just point out two conclusions I arrived at last fall:

  1. The 1000 Genomes comparisons are not very strong evidence in favor of a low rate. There is too much error in the sequences, and the means of filtering errors may bias the rate estimate. Much stronger evidence comes from pedigree-based comparisons of de novo Mendelian diseases, which encompass tens of thousands of mutational events instead of a few dozen. These also suggest a low rate -- in particular Michael Lynch's work from early 2010 (Lynch 2010). This work also demonstrates that different sequence contexts give rise to different effective rates of mutation.

  2. The higher rate based on phylogenetic comparisons was always based on circular reasoning. People applied whatever rate would make the observed sequence differences fit some paleontological event. Logically, the fossil appearance of an extant lineage puts a minimum time on the divergence of that lineage from others; but geneticists typically treated this minimum possible time of species divergence as if it were the expected time of sequence divergence. Given the quality of the hominoid fossil record, these two dates may easily differ by a factor of two. Sequence divergence must always precede speciation, and speciation must always precede the earliest fossil occurrence of a lineage. Worse, the paleontological dates themselves were often bootstrapped from estimated mutation rates. The famous "6 million year human-chimpanzee divergence" always rested on these faulty assumptions -- that we knew the human-orang or human-macaque sequence divergence time with exactitude, and that the sequence divergence time between humans and chimpanzees was identical to the speciation time of the two lineages.
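The circularity in point 2 can be made concrete with a toy calculation. All numbers here are hypothetical illustrations of mine, not estimates from the post: suppose a lineage's earliest fossil is 6 million years old but its actual sequence divergence from a comparison lineage is 12 million years, with ~1.2% pairwise divergence.

```python
# Toy illustration (hypothetical numbers): calibrating a mutation rate
# by equating sequence divergence with a fossil *minimum* date inflates
# the rate, because the true sequence divergence time must be older.

seq_divergence = 0.012       # assumed pairwise sequence divergence
fossil_minimum_my = 6.0      # earliest fossil occurrence (a minimum bound)
true_sequence_my = 12.0      # hypothetical actual sequence divergence time

def implied_rate_per_year(d, t_years):
    """Rate implied by divergence d accumulated on two lineages over t."""
    return d / (2.0 * t_years)

calibrated = implied_rate_per_year(seq_divergence, fossil_minimum_my * 1e6)
actual = implied_rate_per_year(seq_divergence, true_sequence_my * 1e6)
print(f"fossil-calibrated rate: {calibrated:.2e} per site per year")
print(f"actual rate:            {actual:.2e} per site per year")
print(f"overestimate factor:    {calibrated / actual:.1f}")
```

With a 25-year generation time, these two per-year rates correspond to roughly 2.5e-8 and 1.25e-8 per generation: about the same factor-of-two gap that separates the old phylogenetic estimates from the new pedigree-based ones.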

I've had several conversations with people about this issue during the past year. Some of them take it very seriously, others don't. Myself, I see that the lower rate simplifies many problems with the fossil record and comparisons of archaic genomes, but creates some others. For this reason, I'm cautious about it.