john hawks weblog

paleoanthropology, genetics and evolution

mutation rate

  • A longer timescale for human evolution

    Fri, 2012-08-10 16:36 -- John Hawks
    Publication information: 

    In press in Proceedings of the National Academy of Sciences, USA


    During the last few years, the best estimates of the human single nucleotide mutation rate have been cut in half. Until recently, estimates of mutation rate have relied on counting substitutions between primate species and assuming that fossil relatives of living species can accurately pin dates onto phylogenetic branches. This procedure allows very precise estimates, but introduces systematic bias toward higher substitution rates and shorter branch lengths because a new lineage can leave a fossil record only after its origin, never beforehand [1]. Now, widespread resequencing, initially of de novo Mendelian genetic disorders [2],[3] and later of whole genomes in parent-offspring trios [4], has allowed direct comparisons of parent and offspring genomes. The most commonly used but now outdated estimate of the mutation rate was 2.4 × 10^-8 changes per nucleotide per generation [5]. Current estimates of the same value based on resequencing data are much lower, around 1.1-1.28 × 10^-8 [4], [3].

    Langergraber et al. (this issue) [6] seal one of the remaining holes in this emerging understanding, by providing the most accurate estimates of generation length yet possible for wild chimpanzees and gorillas. They determined the parentage of chimpanzees and gorillas in wild study populations, which in concert with field data on births allows an accurate measure of the mean generation length. Chimpanzees average more than 24 years per generation; gorillas more than 19, substantially longer than indicated by earlier life history assessments [7]. Long generations, with few genetic mutations in each, mean that the clock of genetic substitutions has ticked very slowly during the evolution of humans and apes.
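
    The link between generation length and the pace of the clock comes down to simple division. The sketch below is illustrative only: it assumes that the human per-generation rates quoted above can be applied unchanged to apes, which is a simplification, and the function name `yearly_rate` is made up for this example.

```python
def yearly_rate(per_generation_rate, generation_years):
    """Convert a per-generation mutation rate into a per-year rate."""
    if generation_years <= 0:
        raise ValueError("generation_years must be positive")
    return per_generation_rate / generation_years


# Generation lengths from the text: chimpanzees about 24 years, gorillas about 19.
# Per-generation rates from the text: 1.1e-8 to 1.28e-8 (direct), 2.4e-8 (older).
# Applying human rates to apes is an assumption made only for illustration.
for species, gen_years in [("chimpanzee", 24), ("gorilla", 19)]:
    for mu in (1.1e-8, 1.28e-8, 2.4e-8):
        per_year = yearly_rate(mu, gen_years)
        print(f"{species}: {mu:.2e} per generation -> {per_year:.2e} per year")
```

    A longer generation and a lower per-generation rate both push the per-year rate down, so the two revisions compound rather than offset each other.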

    Breathing easier

    Some paleoanthropologists will welcome the new, slower mutation rate. For twenty years, they have been unearthing Late Miocene fossils that purport to represent the lineage leading to recent hominins. Candidates including the 7-million-year-old Sahelanthropus tchadensis, 6-million-year-old Orrorin tugenensis, and 5.5-million-year-old Ardipithecus kadabba vie for a place in our ancestry. Genetic comparisons once pegged the human-chimpanzee common ancestor as recently as 4 million years ago, pruning these fossil limbs out of our family tree [8]. As Langergraber et al. report, a slower rate places the human-chimpanzee common ancestor more than 7 and possibly as early as 13 million years ago, reopening the case for these and other fossils.

    A longer timescale has many other consequences. The 10.5-million-year-old Chororapithecus abyssinicus may really be an early member of the gorilla lineage, as its dental anatomy suggests [9]. For the orangutan lineage, the prospect of a much deeper genetic estimate of divergence illuminates the relation between phylogenetics and population genetics. Genetic divergence between two species is a function not only of the time that the species became isolated, but also of the genetic variation within their ancient common ancestral population. Whole-genome analysis of apes and humans has uncovered abundant evidence of complex population structure in the common ancestors of living species [10]. Hobolth et al. [11] assessed incomplete lineage sorting of orangutan similarity in human and chimpanzee genomes, showing that the ancestral population of the orangutans and African apes must have been large and diverse. A fast mutation rate and this complex ancient structure made the origin of the orangutan branch uncomfortably recent, only 9-13 million years ago, barely old enough to accommodate the earliest known orangutan-like fossil evidence, the 12.5-million-year-old Sivapithecus indicus. A slower mutation rate appears to be a better fit to both fossil evidence and the genetic structure of this ancient population.

    Beyond branches

    The genomes of the African apes and humans have opened a new way of studying population history. In addition to the cladistic relations among species plodding along phyletic branches, we can now test hypotheses about the diversity and structure of dynamic populations. We depend on accurate estimates of mutation and recombination to examine introgression, partial population replacement, continuing gene flow and changes in population size. A slower mutation rate demands that we revisit the population histories of humans and our close relatives. The histories of the present subspecies of chimpanzees may go back to nearly a million years ago. As Langergraber et al. show, the genetic differences between western and eastern gorillas may be 1.5 million years or older. A longer timescale shows that the present subspecies of primates have survived multiple episodes of climate change in tropical Africa, events that should also have shaped human evolution in complementary ways. More interesting, the depth of gene genealogies in these primates may reflect ancient episodes of partial population replacement and introgression.

    Along these lines, genomes from Neandertals and from Denisova Cave [12],[13] demonstrate the complexity of human population history. Ten years ago, many scientists argued that the population history of living humans converged to a recent strong bottleneck in a single African population. Today we work to refine a richer and more complex model with multiple episodes of dispersal, genetic differentiation and introgression. So much is left for us to discover as we are far from achieving the full potential of billions of base pairs of new data.

    A mere two years ago, genomic evidence from Neandertals suggested that they had originated within the last 270,000-440,000 years [12]. This troublesome date excludes specimens that have appeared to be strong candidates for Neandertal ancestors, including the large sample of skeletal remains from Sima de los Huesos, Atapuerca, Spain, possibly more than 530,000 years old. Now the maximum value for Neandertal-human common ancestry from 2010 seems instead closer to a minimum date. Langergraber et al. suggest a range from 420,000-780,000 years, bringing much of the Middle Pleistocene record of Europe into the scope of Neandertal ancestry.

    Moving out

    Across this same timescale, the archaic ancestors of today's Africans had already developed an intricate population structure. Genomic investigation of African hunter-gatherers has opened new windows onto this deep genetic history of differentiation and introgression [14], [15], bringing the origin of modern African diversity into the population structure of the early Middle Pleistocene. A simple hypothesis of modern human origins in a bottlenecked population cannot account for this diverse genetic history.

    The mitochondrial DNA timescale now poses a hanging question. Mitochondrial mutations occur much more often than nuclear DNA mutations, with greater heterogeneity among sites [16]. Still, our estimate of mtDNA substitution rates depends on our estimates of branch lengths of the primate phylogeny. Up to now, mitochondrial comparisons have been the strongest evidence in favor of a short timescale for the dispersal and differentiation of non-African peoples, within the last 70,000 years [17]. Some recent attempts to examine the relationships of non-African populations using nuclear genome data have led to timescales in excess of 100,000 years [18]; others favor more recent estimates [19]. Despite the recency of this work, most authors have continued to use outdated fast molecular clock and short generation time estimates. As we move forward, such results will need to be corrected or adjusted to enable comparisons with current work.

    A common language

    It may seem surprising that such a basic parameter as the mutation rate could have been inaccurately estimated for so long. An accurate per-genome estimate of mutation rate depends on large amounts of sequence data, observed for a large number of parent-offspring pairs. Whole-genome sequencing has become very widespread during the last two years, but low-coverage genomes have a high rate of false positive changes, which has delayed acceptance of the lower rate estimates. Stronger evidence about mutation rate comes from the even broader sample of parent-offspring trios from surveillance of de novo Mendelian diseases [3]. These values will be subject to continuing refinement, as geneticists add more and more primate and human genomes and examine their biology more closely.

    Sampling DNA from other primates effectively collates thousands of generations of time into a single comparison, allowing the substitution rate to be estimated from relatively short DNA sequences. For mutations not under selection, the substitution rate estimates the mutation rate very precisely.

    But precision is not accuracy. Radiometric ages are often very precise, and paleontologists can constrain the provenience of some Miocene primate fossils to ranges less than a hundred thousand years. Accuracy about the time of speciation would require evidence the fossil record can never provide. We cannot say how many orangutan ancestors may have lived before the 12.5-million-year-old Sivapithecus indicus; we can only hope to discover more of them. Fossils have limited value even as minimum estimators of speciation time. Steiper and Young [20] estimated a relatively slow rate of mutations in primates, by assuming that a series of fossils represent minimum ages for various phylogenetic branches of primates. Their slow rate estimate depended upon placing the 7-million-year-old Sahelanthropus tchadensis as a member of the hominin lineage, an assumption that has been challenged on morphological grounds [21]. This challenge could not necessitate a higher mutation rate, but could delay acceptance of a slower rate. A slow mutation rate does not settle the phylogenetic position of Sahelanthropus or other fossil specimens; it merely refocuses study upon anatomical and ecological evidence.

    Mutation rates estimated from pedigree and phylogenetic data may still prove to be significantly different, as they are for mitochondrial DNA [16]. The nuclear mutation rate varies among sites and regions (e.g., CpG nucleotides) [22], and discovery of functional elements will bring to light some amount of previously unrecognized purifying selection. The average mutation rate across the genome is only a starting point. Still, as genomes have begun to reveal the kind of complexity long evidenced by the fossil record, we can begin to seek a new anthropological synthesis that ties together genomes, morphology, and life history data.


    References

    1. Steiper ME, Young NM. Timing primate evolution: Lessons from the discordance between molecular and paleontological estimates. Evol. Anthropol. [Internet]. 2008;17:179–188. Available from: http://dx.doi.org/10.1002/evan.20177
    2. Kondrashov AS. Direct estimates of human per nucleotide mutation rates at 20 loci causing mendelian diseases. Hum. Mutat. [Internet]. 2003;21:12–27. Available from: http://dx.doi.org/10.1002/humu.10147
    3. Lynch M. Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences [Internet]. 2010;107:961–968. Available from: http://dx.doi.org/10.1073/pnas.0912629107
    4. Roach JC, Glusman G, Smit AFA, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science [Internet]. 2010;328:636–639. Available from: http://dx.doi.org/10.1126/science.1186802
    5. Nachman MW, Crowell SL. Estimate of the Mutation Rate per Nucleotide in Humans. Genetics [Internet]. 2000;156:297–304. Available from: http://www.genetics.org/cgi/content/abstract/156/1/297
    6. Langergraber KE, Prüfer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Murayama M, Mitani JC, Muller MN, et al. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proceedings of the National Academy of Sciences of the United States of America. 2012.
    7. Teleki G, Hunt EE, Pfifferling JH. Demographic observations (1963–1973) on the chimpanzees of Gombe National Park, Tanzania. Journal of Human Evolution. 1976;5:559–598.
    8. Wildman DE, Uddin M, Liu G, Grossman LI, Goodman M. Implications of Natural Selection in Shaping 99.4% Nonsynonymous DNA Identity Between Humans and Chimpanzees: Enlarging Genus Homo. Proceedings of the National Academy of Sciences, U. S. A. [Internet]. 2003;100:7181–7188. Available from: http://dx.doi.org/10.1073/pnas.1232172100
    9. Suwa G, Kono RT, Katoh S, Asfaw B, Beyene Y. A New Species of Great Ape from the Late Miocene Epoch in Ethiopia. Nature [Internet]. 2007;448:921–924. Available from: http://dx.doi.org/10.1038/nature06113
    10. Siepel A. Phylogenomics of primates and their ancestral populations. Genome research. 2009;19(11):1929-41.
    11. Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome research. 2011;21(3):349-56.
    12. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. A Draft Sequence of the Neandertal Genome. Science [Internet]. 2010;328:710–722. Available from: http://dx.doi.org/10.1126/science.1188021
    13. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PLF, et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature [Internet]. 2010;468:1053–1060. Available from: http://dx.doi.org/10.1038/nature09710
    14. Lachance J, Vernot B, Elbers C, Ferwerda B, Froment A, Bodo J-M, Lema G, Fu W, Nyambo B, Rebbeck R, et al. Evolutionary History and Adaptation from High-Coverage Whole-Genome Sequences of Diverse African Hunter-Gatherers. Cell. 2012.
    15. Hammer MF, Woerner AE, Mendez FL, Watkins JC, Wall JD. Genetic evidence for archaic admixture in Africa. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(37):15123-15128.
    16. Soares P, Ermini L, Thomson N, Mormina M, Rito T, Röhl A, Salas A, Oppenheimer S, Macaulay V, Richards MB. Correcting for purifying selection: an improved human mitochondrial molecular clock. American journal of human genetics [Internet]. 2009;84:740–759. Available from: http://dx.doi.org/10.1016/j.ajhg.2009.05.001
    17. Endicott P, Ho SYW, Metspalu M, Stringer C. Evaluating the mitochondrial timescale of human evolution. Trends in Ecology & Evolution [Internet]. 2009;24:515–521. Available from: http://dx.doi.org/10.1016/j.tree.2009.04.006
    18. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the Joint Demographic History of Multiple Populations from Multidimensional SNP Frequency Data. PLoS Genet [Internet]. 2009;5:e1000695+. Available from: http://dx.doi.org/10.1371/journal.pgen.1000695
    19. Lukic S, Hey J. Demographic Inference Using Spectral Methods on SNP Data, With an Analysis of the Human out-of-Africa Expansion. Genetics. 2012.
    20. Steiper ME, Young NM. Primate molecular divergence dates. Molecular Phylogenetics and Evolution [Internet]. 2006;41:384–394. Available from: http://dx.doi.org/10.1016/j.ympev.2006.05.021
    21. Wolpoff MH, Hawks J, Senut B, Pickford M, Ahern J. An Ape or the Ape: Is the Toumaï Cranium TM 266 a Hominid? PaleoAnthropology. 2006;2006:36–50.
    22. Subramanian S, Kumar S. Neutral Substitutions Occur at a Faster Rate in Exons Than in Noncoding DNA in Primate Genomes. Genome Research [Internet]. 2003;13:838–844. Available from: http://dx.doi.org/10.1101/gr.1152803
  • Will a Jurassic placental mammal make the molecular clock make sense?

    Wed, 2011-08-24 17:19 -- John Hawks

    A new paper in Nature by Zhe-Xi Luo and colleagues [1] reports the discovery of a 160-million-year-old early mammal, Juramaia, which they attribute to the placental mammal lineage. The news aspect is that this extends the chronology of fossil placental and marsupial mammals (the sister clade of placentals) by some 40 million years. That's a big chunk of time, but it's a really nice fossil which seems pretty clear in its morphology.

    I'm reading this closely because of the effect on the interpretation of mutation rates and the molecular clock. Obviously, if the earliest evidence for placental mammals used to be 120 million years ago, and now it's 160, that should affect the way we approach the genetic divergence of mammal lineages. In particular, when it comes to primates, some modern lineages are represented by fossils relatively early in the Cenozoic, suggesting that the common ancestor of all the primates may have been much earlier, deep in the Cretaceous period. But there is no fossil evidence of that ancestor, and until recently molecular comparisons seemed to suggest a recent chronology with a common ancestor just before the Cretaceous-Tertiary (K-T) boundary. That is, until direct estimates of the human mutation rate started to suggest a much lower rate of mutations per generation than had previously been assumed.

    I've written about these issues several times, both with respect to hominins and other primates. For example my (unfinished) series from 2010:

    "Were there Cretaceous anthropoids? Part 1. The problem in a nutshell"

    "Were there Cretaceous anthropoids? Part 2: What is an anthropoid?"

    "Were there Cretaceous anthropoids? Part 3: Ghost lineages"

    And last month's "More on the mutation rate", pointing to my review from late last year, "What is the human mutation rate?" It's a key scientific problem right now, and genetic evidence may be approaching the point of a solution. Finding older and older fossils tends to confirm a lower rate of mutations, and a long chronology for the extant lineages.

    The current paper by Luo and colleagues addresses the molecular clock and suggests how a 160-million-year-old placental mammal may affect things:

    Timing of the divergence of marsupials and placentals is critical for calibrating the rates of evolution in therian mammals, especially for molecular evolutionary studies and comparative genomics (refs. 2, 10, 13). Previously, some molecular time estimates for marsupial and placental divergence postulated significantly older windows for this divergence than the then-oldest fossil records (refs. 3, 7). However, these and other previous molecular estimates differed widely. Several were compatible with relatively young placental intraordinal divergences (for example, ref. 10), and just about all showed wide error margins (reviewed by ref. 13). Regarding the marsupial–placental split, recent molecular rate studies provided estimates of 147.7 ± 5.5 Myr (ref. 11), or 160 Myr (median) with a 95% highest posterior distribution of 143–178 Myr (ref. 12), or a window of 193–186 Myr (ref. 9). This new eutherian fossil age is now similar to the age of placentals at 160 Myr with 95% posterior distribution from 143 to 178 Myr by the latest molecular estimate (ref. 12). The age of Juramaia has now set the minimal divergence time by the fossil to coincide with the range of molecular time estimates, serving as a corroboration of the newest fossil record with the molecular clock of evolution. The 160-Myr-old Juramaia also has important implications for mammalian evolution as a whole. Eutherian mammals are nested in the more inclusive Mesozoic boreosphenidan clade (Fig. 3, node 1), for which the previously earliest record had been entirely Early Cretaceous (refs. 1, 27). The eutherian Juramaia requires that the ghost-lineages of boreosphenid and cladotherian mammals would also extend to the Middle Jurassic. Therefore the magnitude of the mammalian faunal turnover from the Early to Middle Jurassic is greater than previously known, and the Early–Middle Jurassic is a critical transition for the appearance of more of the derived mammalian clades (refs. 1, 2).

    Reference 10 from that quote is a paper by Kitazoe and colleagues (2007) [2]. In that paper, the divergence of New World and Old World monkeys is up over 55 million years ago, and the divergence of anthropoids and strepsirrhines was around 85 million years. In other words, that "fast" chronology paper predicted anthropoids at or around the K-T boundary. Reference 11 is by Bininda-Emonds and colleagues (2007) [3], in which primates were estimated to have originated 91 million years ago, with a haplorhine-strepsirrhine divergence 87 million years ago. The other references here don't discuss within-primate divergences together with the more ancient mammalian representatives. I discussed more focused comparisons of primate divergences last year ("Were there Cretaceous anthropoids? Part 1. The problem in a nutshell").

    It looks to me like an earlier origin of placental mammals will elevate the likely divergence dates for primates to some degree, which will make a difference to interpretation of fossils like Altiatlasius or Algeripithecus. I think it's consistent with a lower mutation rate within the hominoids also, but it's unclear whether we need the within-family rate of change to be consistent with the longer term rate of change among orders of mammals.


    References

    Synopsis: 
    The discovery of the 160-million-year-old Juramaia suggests a lower mutation rate and longer chronology for primates.
  • More on the mutation rate

    Mon, 2011-07-18 04:43 -- John Hawks

    I've received several questions over the last few weeks about human genome-wide mutation rates. Some people are noticing heterogeneity in mutation rate estimates among family trios (spurred by a recent paper from the 1000 Genomes Project) while others are asking about apparent contradictions between estimates from pedigree-based methods and those based on phylogenetic comparisons with other primates (see, for instance, Dienekes' discussion of the recent paper by Li and Durbin [1]).

    I wrote a very extensive and referenced post last fall about this issue, and I just want to bring it to people's attention: "What is the human mutation rate?"

    The 1000 Genomes Project has adopted the low per-generation mutation rate that has been coming out of the family trio comparisons. This low rate is around 1.2e-8 per site per generation as opposed to the estimate of around 2.4e-8 per site per generation that was often used prior to last year. Several new or upcoming papers will use the lower rate as applied to comparisons in humans or other hominoids.

    I'll just point out two conclusions I arrived at last fall:

    1. The 1000 Genomes comparisons are not very strong evidence in favor of a low rate. There is too much error in the sequences, and the means of filtering errors may affect the rate estimation. Much stronger evidence comes from pedigree-based comparisons of de novo Mendelian diseases, which encompass tens of thousands of mutational events instead of a few dozen. These also suggest a low rate -- in particular Michael Lynch's work from early 2010 [2]. This work also demonstrates that different sequence contexts give rise to different effective rates of mutation.

    2. The higher rate based on phylogenetic comparisons was always based on circular reasoning. People applied a rate that would fit the observed sequence differences to some paleontological event. Logically, the fossil appearance of an extant lineage puts a minimum time on the divergence of that lineage from others; but geneticists typically assumed that this was the expected time of sequence divergence, not the minimum possible time of species divergence. These two dates may easily differ by a factor of two, given the quality of the hominoid fossil record. Sequence divergence must always precede speciation, and speciation must always precede the earliest fossil occurrence of a lineage. The paleontological dates were then often bootstrapped from estimated mutation rates. The famous "6 million year human-chimpanzee divergence" was always based on these faulty assumptions -- that we knew with exactitude the human-orang or human-macaque sequence divergence time, and that the sequence divergence time between humans and chimpanzees was identical to the speciation time of the two lineages.

    I've had several conversations with people about this issue during the past year. Some of them take it very seriously, others don't. Myself, I see that the lower rate simplifies many problems with the fossil record and comparisons of archaic genomes, but creates some others. For this reason, I'm cautious about it.


    References

    1. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature [Internet]. 2011;475:493–496. Available from: http://dx.doi.org/10.1038/nature10231
    2. Lynch M. Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences [Internet]. 2010;107:961–968. Available from: http://dx.doi.org/10.1073/pnas.0912629107
    Synopsis: 
    The 1000 Genomes Project is on the verge of demonstrating a lower mutation rate in humans. Should we believe it?
  • Mailbag: Mutations and perfect people

    Fri, 2011-01-21 08:14 -- John Hawks

    A friend of mine and I were discussing evolution, my friend is a Christian and told me that the human mutation rate is 1 in ten billion cell duplications, and with this rate you can track back to the 'genetically perfect human' around 6000 years ago, which is apparently when God created man. I'm quite sceptical of this, considering I have learnt about mutations in science and biology this year at school, so I was wondering whether you could tell me what the rate is and whether it is even possible for there to have been a 'genetically perfect human', after all when it comes to genetics what is classified as perfect?

    If you could answer this question it would be greatly appreciated.

    I appreciate your question. To really explore the topic, I suggest the free materials from Nature education. For example, this article about mutations has links to many other related topics.

    http://www.nature.com/scitable/topicpage/genetic-mutation-1127

    The short answer is that human DNA mutates at a rate of around 2 in 100,000,000 per base pair per generation. On average, two copies of a gene in people today differ at about 1 in 1,000 base pairs, making their common ancestor around 50,000 generations (1 million years) ago. No person is genetically perfect -- even discounting new mutations, some of our genes work only in combination with others. Natural selection has eliminated many deleterious mutations but thousands of them remain and have always existed.
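
    For readers who want to check the arithmetic, here is a rough sketch of the back-of-the-envelope calculation. The function name and the `count_both_lineages` switch are made up for this example, and the 20-year generation is simply what "50,000 generations (1 million years)" implies.

```python
def generations_to_common_ancestor(pairwise_difference, mutation_rate,
                                   count_both_lineages=False):
    """Rough number of generations back to the common ancestor of two gene copies.

    pairwise_difference: fraction of base pairs that differ (e.g. 1/1000)
    mutation_rate: mutations per base pair per generation (e.g. 2e-8)
    count_both_lineages: if True, assume differences accumulate along both
        copies' lineages, which halves the estimate (an optional convention).
    """
    divisor = 2 * mutation_rate if count_both_lineages else mutation_rate
    return pairwise_difference / divisor


YEARS_PER_GENERATION = 20  # implied by the figures in the answer above

generations = generations_to_common_ancestor(1 / 1000, 2e-8)
print(f"about {generations:,.0f} generations, "
      f"about {generations * YEARS_PER_GENERATION:,.0f} years")
```

    With the default setting this gives about 50,000 generations and about 1 million years, matching the figures quoted. Setting `count_both_lineages=True` halves both numbers; either way the answer is hundreds of thousands of years, not 6,000.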

  • What is the human mutation rate?

    Thu, 2010-11-04 01:33 -- John Hawks

    Last spring I wrote about a study that used whole-genome comparisons between parents and offspring to estimate the rate of per-genome mutation in humans ("A low human mutation rate may throw everything out of whack").

    The study was by Jared Roach and colleagues [1], and as you might guess from my post title, the result was surprising. Previous work had suggested a human mutation rate around 2.5 x 10^-8 per site per generation. The new study found less than half the expected number of mutations between these parents and offspring, an estimated rate of only 1.1 x 10^-8 per site.
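
    To get a feel for what "less than half the expected number" means in counts, here is a small sketch. The haploid genome length of 3 billion sites is a round-number assumption of mine, not a figure from the study, and the function name is hypothetical.

```python
ASSUMED_HAPLOID_SITES = 3.0e9  # round-number assumption, not taken from the study


def expected_new_mutations(rate_per_site, haploid_sites=ASSUMED_HAPLOID_SITES):
    """Expected new single nucleotide mutations in one child.

    A child inherits two haploid genomes (one from each parent), so the
    per-site rate is applied to twice the haploid length.
    """
    return rate_per_site * 2 * haploid_sites


for label, rate in [("older estimate", 2.5e-8), ("trio estimate", 1.1e-8)]:
    print(f"{label}: {rate:.1e} per site -> "
          f"about {expected_new_mutations(rate):.0f} new mutations per child")
```

    Under that assumed genome size the two rates correspond to roughly 150 versus 66 new mutations per child. The counts scale directly with whatever number of callable sites is actually used.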

    If this lower rate of mutation were to hold up, it would affect much of our understanding of the chronology of human evolution. Fossils and archaeological sites would not change in date, but some hypotheses about their relationships would be challenged. For example, the higher rate of 2.5 x 10^-8 per site suggests a chimpanzee-human population divergence around 4 million years ago. A new rate of 1.1 x 10^-8 would not have a linear effect on this divergence time -- the genes don't have genealogical roots at the same instant as the population divergence. But the human-chimpanzee divergence time would be radically higher than in many recent estimates.
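
    The non-linear effect can be illustrated with a standard neutral approximation in which sequence divergence d = 2 * mu * (t + 2 * Ne), where t is the population split in generations and 2 * Ne generations is the expected extra coalescence time in the ancestral population. This is only a sketch: the divergence value, the ancestral effective size and the 20-year generation below are placeholders I chose for illustration, not estimates from any study.

```python
def split_time_years(divergence, mutation_rate, ancestral_ne, generation_years):
    """Population split time implied by d = 2 * mu * (t + 2 * Ne).

    divergence: per-site sequence divergence between the two species
    mutation_rate: per-site, per-generation mutation rate
    ancestral_ne: effective size of the ancestral population
    generation_years: years per generation
    """
    coalescence_generations = divergence / (2 * mutation_rate)
    split_generations = coalescence_generations - 2 * ancestral_ne
    return split_generations * generation_years


# Placeholder inputs (assumptions for illustration only).
D, NE, GEN = 0.0133, 5e4, 20

fast = split_time_years(D, 2.5e-8, NE, GEN)
slow = split_time_years(D, 1.1e-8, NE, GEN)
print(f"fast rate: {fast / 1e6:.1f} million years")
print(f"slow rate: {slow / 1e6:.1f} million years")
print(f"split-time ratio {slow / fast:.2f} vs rate ratio {2.5e-8 / 1.1e-8:.2f}")
```

    Because the ancestral coalescence term does not shrink when the rate does, the split time grows by more than the ratio of the two rates whenever the ancestral population size is greater than zero.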

    The same might be true for other primate divergences, and for genealogical relations within human populations today. Basically any times that are estimated from genetic differences may be affected by our knowledge of the per-generation rate of mutations.

    What does this mean? Open below the fold to read more.

    What mutations are we counting?

    Human genomes differ from each other in many ways. There are single base-pair changes in sequences, insertions and deletions, repeat polymorphisms, and larger-scale rearrangements such as inversions and gene duplications. Recent work suggests that some of these larger-scale effects may be very important to phenotypic variation among people. So why should we be talking about only the first of these kinds of variation?

    Single nucleotide mutations have been the focus of most attention about mutation rates because they are relatively easy to find and quantify. In high-quality sequence data, a single nucleotide change is relatively unambiguous. Reversals are fairly unlikely, although at a small fraction of "hotspot" sites, recurrent mutations can make a big difference.

    It is somewhat misleading to refer to "a" rate of single nucleotide mutations, because some kinds of sites (e.g., CpG nucleotides) have had a much higher probability of mutations than others. This affects the apparent rate of mutations in noncoding versus synonymous sites [2]. Also, the germline in males has been estimated to be as much as 6 times more likely to suffer mutations than the germline in females (discussed by Crow [3]). The idea of a genome-wide rate assumes that when we bin all the single nucleotide mutations together, across large amounts of sequence, we do arrive at a relatively stable rate that can be applied to similarly broad extents of sequence data. Or at least that we can identify sequence regions with compatible rates (e.g., noncoding DNA or synonymous sites).

    At the moment, technical issues make it hard to find and quantify many other kinds of variation. The current generation of sequencing devices tends to generate short reads, which make it difficult to assess the presence of insertions or deletions of more than a few base pairs. Duplications and other rearrangements require special treatment such as higher coverage or longer sequence reads. By contrast, a single nucleotide mutation will typically align in the proper location and be quite evident in a read. In principle, we can just run down the genome and count them.

    Still, finding novel mutations is not without its problems. Recent sequencing projects have yielded a very high rate of false positives. The rate of false negatives is really not yet known. We have a good reason to suspect that the false negative rate will be high. In a low-coverage genome, many short segments of the genome will have very low read numbers, making it likely that the sequence reads represent only one of the two copies of the genome present at that location. Any novel mutations in that area have a 50-50 chance of being missed by our sequencing efforts. This false negative risk can be reduced by adding higher sequence coverage, but we're not yet at the point where we have a lot of genomes sequenced at the 10x or higher coverage that we would really want.
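
    The coverage argument can be sketched numerically. The toy model below is my own simplification, not how any real variant caller works: read depth at a site is assumed to be Poisson with the given mean, each read is assumed to come from either copy of the genome with equal probability, and sequencing errors are ignored. Under those assumptions the number of reads carrying a new heterozygous mutation is Poisson with half the mean coverage.

```python
import math


def miss_probability(mean_coverage, min_supporting_reads=1):
    """Chance that a new heterozygous mutation is missed at a site.

    The mutation counts as missed when fewer than `min_supporting_reads`
    reads carry it. Reads carrying the mutant copy are modeled as
    Poisson(mean_coverage / 2); both parameters are illustrative.
    """
    if mean_coverage < 0 or min_supporting_reads < 1:
        raise ValueError("coverage must be >= 0 and min_supporting_reads >= 1")
    lam = mean_coverage / 2.0
    # Poisson probability of seeing 0 .. min_supporting_reads - 1 mutant reads.
    return sum(math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(min_supporting_reads))


for coverage in (1, 2, 4, 10, 30):
    print(f"{coverage:>2}x: miss with >=1 read {miss_probability(coverage):.3f}, "
          f"with >=2 reads {miss_probability(coverage, 2):.3f}")
```

    In this toy model, a site at 10x mean coverage misses the mutation less than 1% of the time if a single supporting read is enough, while very low coverage misses a large share of sites. Requiring more supporting reads, as one would to control false positives, raises the miss rate at every coverage.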

    So while sequencing a parent and offspring genome is the most direct way to estimate the per-generation mutation rate, it is not yet ideal.

    Where did the high rate come from?

    That means we need to look very closely at other sources of data, to see if they may provide some independent confirmation of a lower per-generation mutation rate. In the process, we should ask, why did the higher rate, around 2.5 x 10^-8 per generation, become so widely accepted?

    The source cited by Roach and colleagues for the higher rate, 2.5 x 10^-8 per site, is a paper by Michael Nachman and Susan Crowell [4]. Nachman and Crowell examined processed pseudogenes in humans and chimpanzees, under the assumption that mutations in these pseudogenes would be neutral to selection in the human and chimpanzee lineages.

    The average mutation rate was calculated from the average autosomal rate of evolution assuming a generation time of 20 years (Table 3). Recent estimates of the time since humans and chimpanzees diverged (T) include 4.5 mya (TAKAHATA and SATTA 1997), 5.5 mya (KUMAR and HEDGES 1998), and 6.0 mya (GOODMAN et al. 1998). ARNASON et al. 1998 estimated the Homo-Pan divergence at 10–13 mya; however, their estimate is based on a calibration using distant, nonprimate species and is at odds with most other recent estimates. Mutation rates were calculated for a range of different human-chimpanzee divergence times and for two different ancestral population sizes. Mutation rate estimates vary from 1.3 x 10^-8 (assuming T = 6 mya and Ne = 10^5) to 2.7 x 10^-8 (assuming T = 4.5 mya and Ne = 10^4). If the average generation time is assumed to be 25 years (e.g., EYRE-WALKER and KEIGHTLEY 1999), then mutation rates are estimated to be between 1.6 x 10^-8 and 3.4 x 10^-8.

    Wait a minute. There's no independent estimate of mutation rate here at all!

    What they did was to assume values for the human-chimpanzee divergence and ancestral (chuman) effective size, and then provide an estimate of mutation rate consistent with those assumptions. That's perfectly reasonable as a way of quantifying the genetic divergence that they observed. If our goal is to predict the per-generation mutation rate from interspecific divergence, that's more or less the kind of estimate that we want.
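
    A rough sketch makes the dependence on those assumptions concrete. I'm assuming the usual neutral expectation for divergence between two species, K = 2*mu*t + 4*Ne*mu (with t in generations), and a per-site divergence of K = 0.013, the figure Roach and colleagues attribute to the older estimate. The function names are just illustrative, and this is not a reconstruction of Nachman and Crowell's exact procedure.

```python
def implied_mutation_rate(divergence, split_years, generation_years, ancestral_ne):
    """Per-generation rate mu that solves K = 2*mu*t + 4*Ne*mu, with t in generations."""
    t = split_years / generation_years
    return divergence / (2.0 * t + 4.0 * ancestral_ne)


def implied_split_years(divergence, mu, generation_years, ancestral_ne):
    """Invert the same relation: the split time (in years) implied by a given mu."""
    t = (divergence / mu - 4.0 * ancestral_ne) / 2.0
    return t * generation_years


K = 0.013  # assumed human-chimpanzee divergence per site

# The two corner cases quoted above, with 20-year generations.
high = implied_mutation_rate(K, 4.5e6, 20, 1e4)
low = implied_mutation_rate(K, 6.0e6, 20, 1e5)
print(f"implied rate: {low:.2e} to {high:.2e} per site per generation")

# Feeding the rate back in simply returns the divergence time that was assumed.
print(f"split time recovered from the high rate: "
      f"{implied_split_years(K, high, 20, 1e4) / 1e6:.2f} million years")
```

    Under these assumptions the two corner cases come out at 1.3 x 10^-8 and roughly 2.7 x 10^-8, and plugging the rate back in hands you the divergence time you started with.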

    But many, many other studies have instead used a citation to the Nachman and Crowell rate as a justification for their own estimates of the human-chimpanzee divergence time! That's not perfectly reasonable, in fact, it's perfectly circular. It's turtles all the way down!

    Worse, those citations tend to cite the midpoint of Nachman and Crowell's range of estimates (2.5 x 10^-8) as if it were a true value measured with little error. Reading the original reference, you can plainly see that Nachman and Crowell reported estimates that varied over a factor of three, corresponding to a wide range of chuman population histories. From their discussion:

    Mutation rates estimated for a range of divergence times and ancestral population sizes fall between 1.3 x 10^-8 and 2.7 x 10^-8 assuming a generation time of 20 years (Table 3) or between 1.6 x 10^-8 and 3.4 x 10^-8 assuming a generation time of 25 years. We suggest that 2.5 x 10^-8 is a reasonable estimate of the average mutation rate per nucleotide site (but caution that the actual rate may be between 1.3 x 10^-8 and 3.4 x 10^-8).

    That 2.5 x 10^-8 is simply the midpoint of their range of estimates with the 25-year generation time.

    What would be more reasonable? For hominins and chimpanzees, we probably want to apply a shorter generation length, a larger ancestral effective size, and an earlier time of divergence. All of these would have yielded a lower rate for the Nachman and Crowell data. But we don't want to just assume these values; we should try to test whether they are valid based on other data.

    Other mutation rates from phylogenetic comparisons

    Nachman and Crowell have not been alone in their ultimate reliance on fossil evidence as an assumption underlying the per-generation mutation rate. But several other studies came to a slower mutation rate. Mostly, these studies have assumed that the human-chimpanzee divergence happened significantly earlier than 5 million years ago. Necessarily, then, the human per-generation mutation rate would have to be lower, as long as the sequence divergence remained the same.

    These estimates are ultimately rooted in the date of one or more fossils, among which the generation time certainly varied. The resulting per-site mutation rates are often reported as per-year instead of per-generation. For example, Yi and colleagues [5] obtained a rate of 0.99 x 10^-9 per year for the human-chimpanzee comparison, which would multiply to 1.98 x 10^-8 per 20-year generation. They propose this as a maximal rate, assuming that Sahelanthropus at a minimum date of 6 million years ago is a hominin. With an older divergence date, they propose a correspondingly lower rate (e.g., 0.79 x 10^-9 per year, not accounting for ancestral population polymorphism).

    Similarly, Steiper and Young [6] considered a long (1.9 Mb) alignment of sequence from 19 primate species. In their model to estimate relative rates on different branches of the primate phylogeny, they incorporated the assumption that Sahelanthropus is on the hominin clade. A divergence date of 6 million years gave rise to a human per-site mutation rate of 0.65 x 10^-9 per year (1.3 x 10^-8 per 20-year generation). A divergence date of 7 million years lowered the mutation rate to 0.57 x 10^-9 per year.
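
    Since these studies report per-year rates, comparing them with the per-generation figures means multiplying by a generation length. Here is a small sketch of that conversion. The 20-year generation is the same assumption used in the text above, and the dictionary labels are only there for printing.

```python
def per_generation_rate(per_year_rate: float, generation_years: float = 20.0) -> float:
    """Convert a per-site, per-year rate into a per-site, per-generation rate."""
    return per_year_rate * generation_years


# Per-year rates quoted above.
per_year = {
    "Yi et al., 6 My divergence": 0.99e-9,
    "Yi et al., older divergence": 0.79e-9,
    "Steiper and Young, 6 My": 0.65e-9,
    "Steiper and Young, 7 My": 0.57e-9,
}

for label, rate in per_year.items():
    print(f"{label}: {per_generation_rate(rate):.2e} per site per 20-year generation")
```

    Changing `generation_years` rescales every one of these numbers in proportion, which is why the generation length matters so much here.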

    Low mutation rates do not always result from these studies. Several have arrived at either a high human mutation rate or a recent human-chimpanzee divergence time. Sometimes a recent human-chimpanzee divergence emerges simply by assuming the rate given by Nachman and Crowell. Yang [7] provides an example of this -- a paper that very thoroughly explores the relationship of divergence time and ancestral effective population size, but ultimately roots the estimates on a single value for mutation rate. This rate we have already seen was itself based on an assumption about divergence time.

    Kumar and colleagues [8] came to a much lower estimate for the human-chimpanzee divergence time, based on an Old World monkey-hominoid divergence at 23.8 million years ago. This estimate did not consider the effect of ancestral polymorphism on the mean genetic divergence time, and so should -- in the language of computer software -- be deprecated.

    I should reiterate that none of these estimates are suitable for testing the times of phylogenetic divergences, because they all assume that the date of some particular fossil (or set of fossils, by fitting a model) is the minimum divergence time for a clade.

    So much of the literature in this area is ultimately circular, I'm pulling out my sparse hair reading through it. By the time we get back to the mid-1990's, the sequence data are even sparser than my hair by today's standards -- only a few hundred base pairs, or a sampling of restriction sites. But the divergence time estimates have propagated forward from that time to today, recycled through the assumptions of papers in the intervening time. It's like the genetic equivalent of money laundering!

    Evidence from parent-offspring sequence differences

    There is another way besides phylogenetic comparison: Simply look at living people and see how many new mutations they have.

    But this is tricky because we are rarely in a position to know which mutations are new. Most variations that we see between two people have persisted in the population for hundreds of generations or more. It takes a special kind of mutation to make its newness evident.

    Up until the advent of large-scale sequencing, the most important source of information about the mutation rate came from the rates of spontaneous Mendelian diseases. When a person has a dominant genetic disorder not carried by either of his parents, you know that the mutation must be new. Disease rates have long been tracked as standard public health data.

    However, the per-genome or per-locus rate of Mendelian disorders can estimate the per-site rate of mutations only by adding well-resolved information about the target size of functional genes. For example, if we know the average gene length and the proportion of different amino acids in functional proteins we can make some estimate of the ratio of synonymous to nonsynonymous sites. But we would still lack information about the fraction of nonsynonymous mutations that cause deleterious effects on protein function. For this reason, it was possible for very early workers (e.g., Haldane) to come within the ballpark of per-locus mutation rates even before the genetic code was available. Yet such estimates are not strictly useful for understanding per-site rates of mutation.

    By 2000, widespread sequencing had begun to identify disease-causing mutations at the sequence level. When exons are known, it is possible to determine the "target size" -- the number of sites at which loss-of-function mutations may occur. These two values provide the numerator and denominator for an estimate of the per-site mutation rate.

    Kondrashov [9] applied this method to estimate the per-site mutation rate across 20 human genes. He surveyed the literature for genes where more than 100 patients had been sequenced completely for the causative locus, finding the causal mutations. Using this value and the disease incidence allowed an estimate of the per-site rate of mutation for different categories of transitions and transversions. There was some variation among loci, with an average rate of per-site mutation equal to 1.8 x 10^-8 per generation.

    Kondrashov observed a few hotspots in these genes, with substitution or deletion rates as much as a hundred times the average site. He also observed that the per-gene rate of mutation varies according to the number of CpG sites. The rate of short deletions was on the order of 5 x 10^-10; insertions were even less frequent.

    The rate estimate by Kondrashov is within the range considered by Nachman and Crowell, but only 3/4 of the value 2.4 x 10^-8 widely cited as the long-term estimate. If this rate were applied to Nachman and Crowell's pseudogene data, it would predict a human-chimpanzee divergence time around 6 million years.

    This year, Lynch [10] performed a more extensive comparison using methods similar to Kondrashov's. Including more genes, and considering a broader range of mutational effects (including missense as well as nonsense coding mutations), Lynch found an even lower estimate of mutation rate per generation -- only 1.28 x 10^-8 per site.

    These estimates are not precisely the same as comparing parent-offspring pairs, but they are exceedingly powerful because the data on disease rates encompass very large populations of people.

    We should keep in mind the result of Subramanian and Kumar [2], which showed that exons have a higher effective rate of substitution than do noncoding regions. That result implies that the genome-wide rate of change should be lower than estimated by Lynch, because his estimate encompasses only coding mutations. Also, any effect of purifying selection on these mutations will tend to decrease the long-term rate of substitutions per site to a lower value than the rate of mutations. The rate estimated by Lynch should then be an overestimate of the substitution rate that would be applicable to hominoid phylogenetic relationships.

    A slower rate

    These estimates of the per-generation mutation rate are all low compared to the commonly-cited 2.5 x 10^-8. They are not quite as low as the rate estimated by Roach and colleagues [1], but the Lynch estimate is very close: 1.28 x 10^-8 compared to 1.1 x 10^-8 per site.

    The lower estimate from Roach and colleagues is a direct comparison of parent and offspring. In my earlier discussion of that rate, I suggested that false negatives in the sequence comparisons might have lowered the apparent rate of mutations. I still think we can't rule out that possibility. But the rate is not alone, and so it is less surprising than it may have seemed.

    My post last week on the 1000 Genomes Project results ("Now for anthropological genomics") mentioned that the 1000 Genomes comparisons have arrived at essentially the same rate as Roach and colleagues. Comparison of one family trio led to a rate of 1.0 x 10^-8 per site per generation; the other family trio gave rise to an estimate of 1.2 x 10^-8 per site per generation. These bracket the estimate given by Roach and colleagues.

    My basic observation about the human-chimpanzee divergence time is still sound:

    If this mutation rate is accurate, then the average human-chimpanzee gene divergence has to be up around 11 million years ago. That can be accommodated with a 7-million-year-old species divergence only if we assume a very large ancestral population -- on the order of 50,000 or higher. Or, the ancestral effective size could be lower -- but that would make the species divergence substantially older -- 9 million years or more.

    As we go further back in time, this lower human mutation rate may be less and less relevant, because different primate lineages may have higher (or lower) rates. When some of the kinks have been worked out of whole-genome sequencing, it would be tremendously useful to sequence parent-offspring pairs in other primate species. With those data, rate heterogeneity could be tested directly.

    For events within the hominins, the parent-offspring rate of mutations ought to be better than a rate estimated from phylogenetic distance. Phylogenetic distances are estimated with even more error than mutations, increasingly so as our methods for comparing genomes improve. But some fraction of new mutations will ultimately be lost to purifying selection. That implies, again, that the longer term rate of substitutions will be lower than the rate of mutations measured from parent-offspring comparisons.

    A rate of 1.1 x 10^-8 would have no effect on the number of genetic differences observed between people, because these differences are just counted, not estimated by genealogical relationships that are known. It is the unknown genealogical relationships, which are estimated from genetic differences, that may change substantially.

    Let's consider an example. Harris and Hey [11] sequenced 4200 bp of the gene PDHA1, an X-linked gene whose product is part of a mitochondrial enzyme complex. At the time of their study (1999), their result was one of the oldest coalescence times estimated for non-African populations based on sequence data; they estimated the root of the PDHA1 genealogy was 1.8 million years old. This estimate was based on the assumption that human and chimpanzee copies, which differed by an average of 40.42 substitutions, had diverged at 5 million years ago. That would imply that the average genetic difference between humans across the deepest root of the genealogy, 15.05 mutational differences, corresponds to 1.86 million years of time. If we instead assert a per-generation rate of 1.1 x 10^-8 per site, these data would generate an estimate of 163,000 generations for the root of the genealogy, roughly 3.3 million years.

    In other words, a coalescence that appeared to have happened in early Homo now looks rooted at the age of A. afarensis. The chimpanzee-human genetic root would be around 8.7 million years for these data.
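
    The arithmetic behind the PDHA1 example can be sketched in a few lines. The 20-year generation length is my assumption (it is what turns 163,000 generations into roughly 3.3 million years), and the halving reflects that differences between two sequences accumulate along both lineages back to their common ancestor.

```python
SITES = 4200            # bp of PDHA1 sequenced by Harris and Hey
RATE = 1.1e-8           # per site per generation, from Roach and colleagues
GENERATION_YEARS = 20   # assumed generation length


def generations_to_root(pairwise_differences, sites=SITES, rate=RATE):
    """Generations back to the common ancestor of two sequences.

    Differences build up on both lineages, so each lineage carries half of them.
    """
    return (pairwise_differences / 2.0) / (sites * rate)


# Old approach: scale human diversity by an assumed 5-million-year chimpanzee split.
fossil_calibrated_years = 15.05 / 40.42 * 5e6

# New approach: apply the directly measured per-generation rate.
human_root = generations_to_root(15.05)
chimp_root = generations_to_root(40.42)

print(f"fossil-calibrated human root: {fossil_calibrated_years / 1e6:.2f} million years")
print(f"rate-based human root: {human_root:,.0f} generations, "
      f"{human_root * GENERATION_YEARS / 1e6:.2f} million years")
print(f"rate-based human-chimpanzee root: "
      f"{chimp_root * GENERATION_YEARS / 1e6:.2f} million years")
```

    The same counts of differences go in both times; only the calibration changes.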

    These estimates would likely be biased too low, because the X chromosome has a somewhat lower rate of mutation than the autosomes. Lynch [10] addressed that issue: X chromosomes spend only 1/3 of their time in males (with their higher rate of mutations), compared to 1/2 of the time for autosomes. Any purifying selection would also bias the estimate too low. If these 4200 bp have a higher-than-average CpG content, that is one factor that might require a higher per-generation rate.
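
    The size of that X chromosome effect can be sketched from the time each chromosome spends in each sex. Here `alpha` stands for the ratio of the male to the female germline mutation rate. The value of 6 is the upper figure mentioned in the discussion of Crow's review, and treating the rates as a simple time-weighted average is my simplification, not Lynch's actual calculation.

```python
def x_to_autosome_ratio(alpha: float) -> float:
    """Ratio of the X-linked to the autosomal mutation rate.

    The X spends 2/3 of its time in females and 1/3 in males; autosomes spend
    1/2 in each. `alpha` is the male-to-female mutation rate ratio.
    """
    x_rate = (2.0 + alpha) / 3.0
    autosome_rate = (1.0 + alpha) / 2.0
    return x_rate / autosome_rate


for alpha in (1, 2, 4, 6):
    print(f"alpha = {alpha}: X rate is {x_to_autosome_ratio(alpha):.3f} "
          f"of the autosomal rate")
```

    With no male bias the two rates are equal, and the ratio can never drop below 2/3 however strong the male bias, so the correction to an X-linked date is real but bounded.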

    Is any of this a problem? I don't think we know yet. A lower rate must readjust the apparent correspondence of some molecular time estimates with the archaeological record. But to be honest, most of the apparent correspondences of such dates have been illusory, because genealogical relationships among genes have such large expected variance under any realistic human population model. It is really the availability of whole-genome comparisons that has a chance of improving these population models.


    References

    1. Roach JC, Glusman G, Smit AFA, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science [Internet]. 2010;328:636–639. Available from: http://dx.doi.org/10.1126/science.1186802
    2. Subramanian S, Kumar S. Neutral Substitutions Occur at a Faster Rate in Exons Than in Noncoding DNA in Primate Genomes. Genome Research [Internet]. 2003;13:838–844. Available from: http://dx.doi.org/10.1101/gr.1152803
    3. Crow JF. The origins, patterns and implications of human spontaneous mutation. Nature Reviews Genetics [Internet]. 2000;1:40–47. Available from: http://dx.doi.org/10.1038/35049558
    4. Nachman MW, Crowell SL. Estimate of the Mutation Rate per Nucleotide in Humans. Genetics [Internet]. 2000;156:297–304. Available from: http://www.genetics.org/cgi/content/abstract/156/1/297
    5. Yi S, Ellsworth DL, Li W-H. Slow Molecular Clocks in Old World Monkeys, Apes, and Humans. Molecular Biology and Evolution. 2002;19:2191–2198.
    6. Steiper ME, Young NM. Primate molecular divergence dates. Molecular Phylogenetics and Evolution [Internet]. 2006;41:384–394. Available from: http://dx.doi.org/10.1016/j.ympev.2006.05.021
    7. Yang Z. Likelihood and Bayes Estimation of Ancestral Population Sizes in Hominoids Using Data From Multiple Loci. Genetics [Internet]. 2002;162:1811–1823. Available from: http://www.genetics.org/cgi/content/abstract/162/4/1811
    8. Kumar S, Filipski A, Swarna V, Walker A, Hedges BS. Placing Confidence Limits on the Molecular Age of the Human-Chimpanzee Divergence. Proceedings of the National Academy of Sciences, U. S. A. [Internet]. 2005;102:18842–18847. Available from: http://dx.doi.org/10.1073/pnas.0509585102
    9. Kondrashov AS. Direct estimates of human per nucleotide mutation rates at 20 loci causing mendelian diseases. Hum. Mutat. [Internet]. 2003;21:12–27. Available from: http://dx.doi.org/10.1002/humu.10147
    10. Lynch M. Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences [Internet]. 2010;107:961–968. Available from: http://dx.doi.org/10.1073/pnas.0912629107
    11. Harris EE, Hey J. X chromosome evidence for ancient human histories. Proceedings of the National Academy of Sciences, U. S. A. 1999;96:3320–3324.
    Synopsis: 
    The 1000 Genomes Project is finding that the mutation rate is half the value usually assumed.
  • mtDNA, purifying selection and "distorted" genealogies

    Sat, 2010-10-23 11:13 -- John Hawks

    I'm going to pass along this paper without much comment. It's by Jon Seger and colleagues and it came out earlier this year in Genetics [1]:

    Gene Genealogies Strongly Distorted by Weakly Interfering Mutations in Constant Environments

    Neutral nucleotide diversity does not scale with population size as expected, and this "paradox of variation" is especially severe for animal mitochondria. Adaptive selective sweeps are often proposed as a major cause, but a plausible alternative is selection against large numbers of weakly deleterious mutations subject to Hill–Robertson interference. The mitochondrial genealogies of several species of whale lice (Amphipoda: Cyamus) are consistently too short relative to neutral-theory expectations, and they are also distorted in shape (branch-length proportions) and topology (relative sister-clade sizes). This pattern is not easily explained by adaptive sweeps or demographic history, but it can be reproduced in models of interference among forward and back mutations at large numbers of sites on a nonrecombining chromosome. A coalescent simulation algorithm was used to study this model over a wide range of parameter values. The genealogical distortions are all maximized when the selection coefficients are of critical intermediate sizes, such that Muller's ratchet begins to turn. In this regime, linked neutral nucleotide diversity becomes nearly insensitive to N. Mutations of this size dominate the dynamics even if there are also large numbers of more strongly and more weakly selected sites in the genome. A genealogical perspective on Hill–Robertson interference leads directly to a generalized background-selection model in which the effective population size is progressively reduced going back in time from the present.

    The topic arises for me at the moment because of some inconsistencies between the apparent timing of events from mtDNA estimates compared to nuclear DNA estimates. Across the crucial "out of Africa" time interval between 200,000 and 50,000 years ago, the mtDNA is not really giving the same chronology as might be expected from nuclear DNA comparisons.

    The mutation rate of mtDNA genome-wide is very high, giving rise to the possibility of interaction between weakly deleterious mutations on the same sequence. It is widely known that the apparent rate of mtDNA mutation depends on the timescale of the comparison in humans. Mothers and their offspring differ by much more than would be predicted by longer pedigrees or by comparisons between populations. Recently diverged populations (such as those in island Polynesia) differ much more than would be predicted from the difference between humans and Neandertals or humans and chimpanzees.

    This apparent "speed-up" of rate as we get closer to the present is consistent with the action of strong purifying selection. So establishing the other genealogical effects of this selection should help us understand the patterns of mtDNA sequence differences found in humans.


    References

  • A low human mutation rate may throw everything out of whack

    Thu, 2010-03-18 16:30 -- John Hawks

    Last week, a paper looking for the genetic causes of Miller syndrome reported the whole genomes of four members of a single family: two siblings with the disorder and their two parents without. The idea was that they would simply compare the affected and unaffected genomes. They would then find candidate loci that might account for Miller syndrome in the affected siblings. By exploiting some other sources of information, they found what they were looking for. Daniel MacArthur covered the story in his post, "Disease hunting with whole genome sequences: the good news, and the bad news".

    I got interested in another aspect of the story. With whole-genome sequences of parents and offspring, it becomes possible to directly determine the rate of mutations in each generation. The paper by Roach and colleagues did just that -- they counted 28 mutations in the 2.3 billion bases of sequence they included in their comparison. That makes a per-site mutation rate of 1.1 x 10^-8 per generation.

    Which is a pretty interesting number. You see, it's less than half what it ought to be:

    [O]ur estimated human mutation rate is lower than previous estimates, the most widely cited of which is 2.5 x 10^-8 per generation (10) based on three parameters: a human-chimpanzee nucleotide divergence per site (Kt) of 0.013, a species divergence time of five million years ago, and an ancestral effective population size of 10,000. More recent estimates indicate a nucleotide divergence of 0.012 (9), species divergence time between six and seven million years ago (11–15), and ancestral effective population size between 40,000 and 148,000 (16–19). With these parameter ranges and a generation length of 15 to 25 years, the mutation rate estimate is between 7.6 x 10^-9 and 2.2 x 10^-8 per generation, which is consistent with our intergenerational estimate of 1.1 x 10^-8. Our estimate is within one standard deviation (SD) of an earlier estimate of 1.7 x 10^-8 (SD: 9 x 10^-9) based on 20 disease-causing loci (20). The rate we report is for autosomes, and should be several-fold lower than that of the Y chromosome, as in the male germline more cell divisions occur per generation. Though our rate differs approximately as expected from the recently reported estimate of 3.0 x 10^-8 (95% CI: 8.9 x 10^-9 – 7.0 x 10^-8) for the Y chromosome, the error rates make this difference not significant (21).

    You can see the obvious implication: If this mutation rate is accurate, then the average human-chimpanzee gene divergence has to be up around 11 million years ago. That can be accommodated with a 7-million-year-old species divergence only if we assume a very large ancestral population -- on the order of 50,000 or higher. Or, the ancestral effective size could be lower -- but that would make the species divergence substantially older -- 9 million years or more.
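
    Here is a back-of-the-envelope version of that implication. It assumes the divergence of 0.012 per site from the passage above, a 20-year generation (my assumption), and the standard expectation that gene lineages coalesce on average 2*Ne generations before the species split. The function names are only illustrative.

```python
def gene_divergence_years(divergence, rate, generation_years):
    """Average age of the gene divergence: K / (2 * mu) generations, in years."""
    return divergence / (2.0 * rate) * generation_years


def implied_ancestral_ne(gene_years, species_years, generation_years):
    """Ancestral effective size needed to fill the gap between the two dates.

    Assumes the average extra coalescence time before the split is 2 * Ne generations.
    """
    return (gene_years - species_years) / (2.0 * generation_years)


K = 0.012    # human-chimpanzee divergence per site, from the quoted passage
MU = 1.1e-8  # per site per generation, Roach and colleagues
GEN = 20     # assumed generation length in years

t_gene = gene_divergence_years(K, MU, GEN)
print(f"average gene divergence: {t_gene / 1e6:.1f} million years")

for split in (7e6, 9e6):
    ne = implied_ancestral_ne(t_gene, split, GEN)
    print(f"species split at {split / 1e6:.0f} My implies ancestral Ne of about {ne:,.0f}")
```

    The exact effective sizes shift with the assumed generation length, so treat them as orders of magnitude rather than estimates.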

    There is a second implication. Most studies of human genetic variation have assumed that 5-million-year-old human-chimpanzee divergence and the high associated rate of mutations. If the true rate is less than half that, then the coalescence times of human genes are more than double most estimates. That would include our estimates of human-Neandertal genetic differences.

    Well, that's a fine pickle.

    I'm not quite ready to believe the very low rate estimate. The analysis in this paper uncovered tens of thousands of false positives, and had to filter through those to arrive at 28 true mutations. The filtering involved resequencing all the positives to determine which were true and which were false, but maybe there's room in there for a substantial number of false negatives, too.

    If this low estimate were true of the human-chimpanzee divergence, it would imply vastly higher ages for other primate divergences, or a much lower rate on the human lineage specifically. So that allows another check on the process.

    But generally, I'll be looking at whole-genome family comparisons with great interest, because they will give us a much more precise understanding of the rate of mutations and recombinations across the genome.

    References:

    Roach JC and 14 others. 2010. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science (early online) doi:10.1126/science.1186802

    Synopsis: 
    Whole genome sequencing of a family finds a very low number of mutations, suggesting evolution doesn't have the timescale we thought.
  • Ancient penguin mtDNA and substitution rates

    Mon, 2009-11-16 17:32 -- John Hawks

    Here's an example of a really incomprehensible press release:

    Ancient penguin DNA raises doubts about accuracy of genetic dating techniques

    Penguins that died 44,000 years ago in Antarctica have provided extraordinary frozen DNA samples that challenge the accuracy of traditional genetic aging measurements, and suggest those approaches have been routinely underestimating the age of many specimens by 200 to 600 percent.

    In other words, a biological specimen determined by traditional DNA testing to be 100,000 years old may actually be 200,000 to 600,000 years old, researchers suggest in a new report in Trends in Genetics, a professional journal.

    You can see why I'm interested -- the Neandertal genetic samples are in the neighborhood of 44,000 years old, so if ancient DNA is saying something unusual about penguins, it might say something unusual about them, right? But what are they talking about here? Racemization? I mean, there are no "genetic dating techniques" for specimens! The rest of the release doesn't clarify matters very much, although it does say that the findings

    may force a widespread re-examination of determinations about when one species split off from another, if that determination was based largely on genetic evidence

    That sounds like an argument that penguin sequences didn't evolve at the rate one might estimate from a molecular clock based on penguin systematics. The quotes from the researchers involved do include the words "molecular clock", which is a good sign.

    Well, enough of this, let's go straight to the research.

    High mitogenomic evolutionary rates and time dependency

    Using entire modern and ancient mitochondrial genomes of Adélie penguins (Pygoscelis adeliae) that are up to 44000 years old, we show that the rates of evolution of the mitochondrial genome are two to six times greater than those estimated from phylogenetic comparisons. Although the rate of evolution at constrained sites, including nonsynonymous positions and RNAs, varies more than twofold with time (between shallow and deep nodes), the rate of evolution at synonymous sites remains the same. The time-independent neutral evolutionary rates reported here would be useful for the study of recent evolutionary events.

    Their sample includes 12 modern Adélie penguins and 8 ancient ones, two of which are from the maximum time interval, although some are only around 250 years old. Now, the age distribution of the rest is fairly important to their analysis, but I can't see it because it's hidden in a data supplement, and I'm reading this in a laundromat in Vienna with no internet access.

    You see why I don't like these freaking online supplements? I'm in the middle of Europe and inconvenienced. Imagine if some penguin enthusiast in an underdeveloped country, with no subscription to the journal, got this paper in an e-mail attachment. They'd never be able to get a copy of the methods.

    There are several problems estimating substitution rates with data like these penguin mitochondria. You really depend very strongly on neutral demographic history -- if there were big population movements or partial replacements among the penguins, the estimation of rate is totally confounded by these. The paper refers to prior work on mammoth ancient mtDNA:

    A previous study on the mitochondrial genomes of the extinct mammoth also suggests that the rate based on internal calibrations (within mammoths) is ~1.6 times higher than that obtained using the external (i.e. mammoth–elephant) calibration.

    ...which raises a similar issue -- since the mammoths apparently did undergo a partial population replacement (or at least, an mtDNA replacement) across part of their range.

    Also, you depend very strongly on the few most ancient specimens, because they sample the longest time interval. Which means, you need to know the date of these specimens with great accuracy and you need to place them accurately on the genealogy that connects the more recent specimens.

    I think the biggest hangup is the genealogy. You can't assume that a 44,000-year-old penguin is a direct ancestor of any living mtDNA sequences. It's a relative, at some distance, possibly a member of an extant clade, possibly not. When we're talking about fossils that are tens of thousands of years old, it becomes very likely that most of the branches connecting with living sequences will have coalesced into very few ancient branches, and it becomes progressively less likely that you will discover a representative of one of those actual ancestral branches. In other words there's an error intrinsic to the coalescent process that really can't be corrected by sampling more extant lineages.

    In other words, you can't just convert sequence differences into substitution rates without a model involving some pretty strong assumptions.

    The paper mentions two very well-known issues concerning the relationship of substitution rate, purifying selection, and saturation. Basically, deleterious mutations can hang around within a population for a while, so that a genetic sample from a living population will tend to overestimate the substitution rate. And long-term comparisons of distinct taxa may include so much time that multiple substitutions may have happened at the same site -- leading to an underestimate of substitution rate. These are the reasons, for example, why the number of mitochondrial mutations between mothers and their daughters is much higher than you would estimate from the number of differences between humans and chimpanzees.

    What does this mean for the penguins? Or, more to the point, the Neandertals? Here's a short passage where the paper discusses the comparison:

    By contrast, the synonymous substitution rate (0.054–0.073 s/s/My) estimated here is five to seven times higher than previous phylogenetic rate estimates [1–4] and significantly higher than those based on intra-specific comparisons within human (0.048–0.052 s/s/My) [14] and Neanderthal (0.036–0.042 s/s/My) [24] populations. These results clearly argue against the use of the classical 1% rate per lineage (or the ‘2% rule’ as it is commonly known) to study the evolution or genetics of individual species.

    Well, the penguin rate may be significantly higher than the within-species human rate estimate, but it's not very much higher -- a minimum of 0.054 compared to a maximum of 0.052. So I don't think there is anything to get very exercised about with respect to ancient human DNA or Neandertal DNA.
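
    For anyone who wants to check the arithmetic, here is a small sketch that takes the three ranges quoted in the passage above and reports the smallest and largest fold-difference consistent with them. Treating the published ranges as simple intervals is my simplification; the paper's own statistical comparison may be set up differently.

```python
# Synonymous rate ranges (s/s/My) as quoted in the passage above.
PENGUIN = (0.054, 0.073)
HUMAN = (0.048, 0.052)
NEANDERTAL = (0.036, 0.042)


def fold_difference(higher, lower):
    """Smallest and largest ratio consistent with two (min, max) ranges."""
    smallest = higher[0] / lower[1]  # low end of one vs. high end of the other
    largest = higher[1] / lower[0]   # high end of one vs. low end of the other
    return smallest, largest


for name, other in (("human", HUMAN), ("Neandertal", NEANDERTAL)):
    lo, hi = fold_difference(PENGUIN, other)
    print(f"penguin vs {name}: {lo:.2f}x to {hi:.2f}x")
```

    On these ranges the penguin rate is somewhere between about 1.04 and 1.52 times the human estimate. The ranges do not overlap, but the gap is nothing like the five- to sevenfold difference from the phylogenetic rates.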

    Unless you really are trying to use DNA like some sort of radiocarbon method. But that would be silly.

    References:

    Subramanian S, Denver DR, Millar CD, Heupink T, Aschrafi A, Emslie SD, Baroni C, Lambert DM. 2009. High mitogenomic evolutionary rates and time dependency. Trends Genet 25:482-486. doi:10.1016/j.tig.2009.09.005

  • More on the X variation conundrum

    Sun, 2009-05-17 13:30 -- John Hawks

    Last winter I noted the contradiction between two papers that each attempted to explain variation on the X chromosome compared to the autosomes. They had come to opposite conclusions, based on discrepancies in their data. I noticed that they had used different methods of determining mutation rates for X chromosome loci:

    So, for their current paper, Keinan and colleagues (2008) try to correct for the recent divergence of human and chimpanzee X chromosomes. Simple enough -- rescale all X chromosome mutation events by some ratio proportional to the human-chimp divergence discrepancies. In this case, they attempt to rescale to the human-macaque divergence. Since that divergence happened in the Oligocene, the discrepancies among chromosomes should be slight compared to the overall divergence. I'd feel better if they actually tested this idea.

    Meanwhile, Mike Hammer and colleagues scaled X chromosome diversity to the human-orangutan divergence. They claimed that this gave the same results as the human-chimpanzee divergence. Which, if true, would obviously give a different outcome than the procedure followed by Keinan and colleagues, which was predicated on the idea that the human-chimpanzee X divergence is the wrong number to use.
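
    To see why the choice of outgroup matters, here is a minimal sketch of the normalization itself: divide diversity by divergence on each chromosome class, then take the X-to-autosome ratio. Every number below is invented for illustration and none comes from Keinan et al. or Hammer et al.; the only point is that if the X-to-autosome divergence ratio differs between outgroups, the same diversity data yield different normalized ratios.

```python
def normalized_x_to_autosome(pi_x, pi_a, div_x, div_a):
    """X/autosome diversity ratio after scaling each class by its divergence
    to an outgroup, which is meant to cancel mutation rate differences."""
    return (pi_x / div_x) / (pi_a / div_a)


# Hypothetical diversity values (per-site heterozygosity).
PI_X, PI_A = 0.0006, 0.0010

# Hypothetical divergences to two outgroups: (X, autosome).
# The closer outgroup is given a relatively lower X divergence.
OUTGROUPS = {
    "closer outgroup": (0.0095, 0.0125),
    "distant outgroup": (0.0550, 0.0650),
}

print(f"raw X/A diversity ratio: {PI_X / PI_A:.3f}")
for name, (div_x, div_a) in OUTGROUPS.items():
    ratio = normalized_x_to_autosome(PI_X, PI_A, div_x, div_a)
    print(f"normalized by {name}: {ratio:.3f}")
```

    With these made-up inputs the normalized ratio lands above the neutral expectation of 0.75 with one outgroup and below it with the other, so two groups could reach opposite conclusions from similar diversity data.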

    I had sort of forgotten about this (which drove me crazy at the time), but another question led me to revisit it late this week. In the intervening time, I see that Carlos Bustamante and Sohini Ramachandran (2009) happened across the same explanation that I had offered:

    It appears that the rest of the discrepancy is explained by different normalizations for background mutation rate differences between the X chromosome and autosomes (Hammer et al. [10] used human-orangutan divergence and Keinan et al. [9] used human-macaque divergence).

    So you read it here first. Which I suppose means that I should submit letters to journals more often. I don't because it seems to me that all I'm doing is reading and trying to understand papers, which sometimes takes more work than it should. On the other hand, I wonder how many people are really putting much effort into their reading...

    Meanwhile, Bustamante and Ramachandran add an additional explanation -- the different means of ascertainment, since Mike Hammer's group used resequencing to find variation, while Keinan and colleagues (2008) had used HapMap SNPs under a specific ascertainment model. They end their short piece by pointing out the value of further resequencing data:

    In order to address continuing questions on the nature of sex-biased processes, full genome sequencing of large numbers of individuals sampled from diverse populations will be needed. The upcoming 1,000 Genomes Project (http://www.1000genomes.org/), for example, will provide orders of magnitude more data for these types of analyses. We share the enthusiasm of the population genetics community that this will bring the potential for resolving continuing questions regarding how human history and cultural practices have shaped global patterns of genomic diversity.

    Ascertainment is a serious issue with the existing SNP data, because different SNPs were ascertained in different, non-commensurable ways. That's how I was led into reconsidering this issue this week: another set of data seems to have features that are partially explained by ascertainment, but partially not. It's hard to use existing data for some kinds of population genetics analysis, although others are less affected by ascertainment biases.

    So the 1000 Genomes effort will make some kinds of analyses simpler to accomplish. I suppose if ascertainment becomes less of a problem, we may see people put more effort into understanding non-genetic sources of information, too!

    References:

    Bustamante CD, Ramachandran S. 2009. Evaluating signatures of sex-specific processes in the human genome. Nat Genet 41:8-10. doi:10.1038/ng0109-8
