Substitution rates and ancestral population sizes

The rate of neutral mutations varies across the genome. When studying a single gene, this variation in rates is not especially important -- it is generally possible to obtain an estimate of the neutral rate for a single locus by comparing just that locus among closely related species.

But some comparisons involve looking at the pattern of variation among different loci. For instance, testing hypotheses about the ancestral populations leading to living species (like the common ancestor of humans and chimpanzees) involves comparing the amount of divergence among many independent loci. The variance in divergence times among loci gives an estimate of inbreeding in the ancestral population: the more inbred (that is, the smaller) the ancestral population, the less its loci should differ in their divergence times.

I discussed this particular example two years ago this week, after the paper that proposed extended hybridization between ancestral hominids and chimpanzees. The conclusion of the paper was that the X chromosome displays much less divergence between humans and chimpanzees than the autosomes, and this might reflect a late introgression of the X chromosome into hominids from another population that (mostly) was ancestral to chimpanzees. The autosomes, by contrast, averaged very old genetic divergences, although there was substantial variance. As I concluded then, the data look consistent with a large population size in the human-chimpanzee ancestor species, coupled with greater selection on the X chromosome. The interpretation of large population size (or alternatively, the interpretation of long-term population structure) comes from the low inferred inbreeding in that ancestral population -- which caused the variance in divergence dates among loci.

But there is another reason for a large variance in divergence dates: variance in mutation rates. Whenever mutation rates vary among loci, this variance adds to the variance among loci in their between-species genetic differences -- that is, the substitution rate. So even when we exclude selected sites (as we always try to do for these kinds of comparisons), we will overestimate the genetic diversity in ancestral species whenever the mutation rate varies among loci.
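A small simulation can illustrate the size of this effect. The sketch below is only a toy model, not the method of any paper discussed here. It assumes each locus coalesces in the ancestral population after an exponentially distributed time with mean 2N generations, ignores sampling noise in counting substitutions, and draws per-locus mutation rates from a gamma distribution. The function names and every parameter value (split time, ancestral size, mutation rate, coefficient of variation) are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_divergences(n_loci, split_gens, anc_ne, mean_mu, mu_cv, rng):
    """Return expected per-site divergence for each simulated locus.

    Toy model (assumed, not from the source): divergence at a locus is
    2 * mu * (split time + ancestral coalescence time), in generations.
    """
    divergences = []
    for _ in range(n_loci):
        # Ancestral coalescence time: exponential with mean 2*Ne generations.
        t_coal = rng.expovariate(1.0 / (2.0 * anc_ne))
        if mu_cv > 0:
            # Gamma-distributed rate with the requested mean and
            # coefficient of variation (CV = 1 / sqrt(shape)).
            shape = 1.0 / mu_cv ** 2
            mu = rng.gammavariate(shape, mean_mu / shape)
        else:
            mu = mean_mu
        divergences.append(2.0 * mu * (split_gens + t_coal))
    return divergences


def naive_ne_estimate(divergences, mean_mu):
    """Moment estimate of ancestral Ne that assumes one shared mutation rate.

    With a constant rate, sd(divergence) = 2 * mu * 2 * Ne,
    so Ne = sd / (4 * mu).
    """
    return statistics.stdev(divergences) / (4.0 * mean_mu)


if __name__ == "__main__":
    rng = random.Random(2008)
    # Hypothetical values: 250,000 generations since the split, true Ne 50,000.
    for cv in (0.0, 0.1, 0.2):
        d = simulate_divergences(20000, 250_000, 50_000, 2e-8, cv, rng)
        print(f"mutation-rate CV {cv:.1f}: naive Ne estimate "
              f"{naive_ne_estimate(d, 2e-8):,.0f}")
```

Under these assumptions, the naive estimate recovers the true ancestral size only when the mutation rate is the same at every locus. Any rate variation among loci is read as extra coalescent variance and pushes the estimate upward.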

A new paper by Svitlana Tyekucheva and colleagues looks at human and macaque genomes to find patterns underlying the variance in mutation rates among regions of the genome. They find that a number of factors may cause such variation, including chemical factors like the CG content of the genome, functional causes such as male versus female rates of recombination, and large-scale structural causes such as telomeric proximity:

While a complete understanding of all biological mechanisms leading to variation in neutral substitution rates across the genome remains elusive, it is plausible that at least some of these mechanisms are conserved over relatively long evolutionary distances. For instance, both mouse-specific and rat-specific substitution rates are positively correlated with rodent-primate substitution rates [14], suggesting shared mechanisms persisting over ca. 90 million years [15]. Additionally, a positive correlation exists in substitution rates of homologous X- and Y-chromosomal introns that diverged from each other ca. 100 million years ago [16] (Tyekucheva et al. 2008: R76).

Their finding that male recombination is an important contributor to mutation rate heterogeneity puts the focus on the X chromosome -- which has little recombination in males -- as unusual. X versus autosomal position did not explain a large fraction of the variance in this study (only around 2 percent, controlling for other factors) but the deviation was in the right direction to help account for the low X chromosome divergence between humans and chimpanzees.

Altogether in this study, a large fraction of the variability in human-macaque substitution rates could be explained by phenomena that affect the rate of mutations, including the structural and functional factors listed above as well as the corresponding variability at homologous regions between mice and rats, and between dogs and cattle. If this variability instead reflected the level of inbreeding in the human-macaque ancestral species, it would be random with respect to the dog-cow or mouse-rat divergences, and with respect to structural causes. So current estimates of the effective sizes of human-chimpanzee and other ancestral populations are almost certainly inflated. The amount of inflation is not clear, but a good estimate will require correcting for a large number of factors -- a complicated analysis.
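The logic of that comparison can be sketched with another toy simulation. Two unrelated species pairs share the same per-locus mutation rate at homologous loci, but their ancestral coalescence times are drawn independently. This is an illustration under assumed values, not a reanalysis of the paper's data: the split times, ancestral sizes, mutation rate, and function names are all hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def cross_pair_correlation(n_loci, mu_cv, seed=1):
    """Correlate per-locus divergence between two independent species pairs.

    Assumption: homologous loci share a mutation rate across both pairs,
    while coalescence in each pair's ancestor is independent.
    """
    rng = random.Random(seed)
    mean_mu = 2e-8  # hypothetical mean rate per site per generation
    pair_a, pair_b = [], []
    for _ in range(n_loci):
        if mu_cv > 0:
            shape = 1.0 / mu_cv ** 2
            mu = rng.gammavariate(shape, mean_mu / shape)
        else:
            mu = mean_mu
        # Independent ancestral coalescence times (mean 2*Ne generations),
        # added to hypothetical split times for each pair.
        t_a = 1_000_000 + rng.expovariate(1.0 / 100_000)
        t_b = 800_000 + rng.expovariate(1.0 / 60_000)
        pair_a.append(2.0 * mu * t_a)
        pair_b.append(2.0 * mu * t_b)
    return pearson(pair_a, pair_b)


if __name__ == "__main__":
    print("no rate variation:", round(cross_pair_correlation(10000, 0.0), 3))
    print("rate CV of 0.2:   ", round(cross_pair_correlation(10000, 0.2), 3))
```

In this sketch, coalescent variance alone leaves the two pairs essentially uncorrelated across loci, while shared mutation-rate variation produces a clear positive correlation. That contrast is what makes the cross-species correlation informative about mutation rather than ancestral population size.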

Since the date of the human-chimpanzee divergence depends on our assessment of the diversity within the human-chimpanzee ancestral population, it may be a while before we can settle the issue of human-chimpanzee divergence time. That may or may not provide hope for Sahelanthropus, Orrorin, and Ardipithecus kadabba -- all supposed hominids that would predate 5 million years ago, the current best genetic estimate of the human-chimpanzee divergence time. To be sure, if the date is simply in error, that error might encompass older dates consistent with a 7-million-year divergence. But I'm not sure we should believe that the error is biased toward an older divergence -- "error" might lean in either direction, and a younger species divergence remains possible.


Tyekucheva S, Makova KD, Karro JE, Hardison RC, Miller W, Chiaromonte F. 2008. Human-macaque comparisons illuminate variation in neutral substitution rates. Genome Biol 9:R76. doi:10.1186/gb-2008-9-4-r76