john hawks weblog

paleoanthropology, genetics and evolution

fitness effects

  • Recent evolution of coding variants

    Wed, 2012-12-05 01:00 -- John Hawks

    How did I get myself quoted in a story as the skeptic about recent human evolution? ("Human Evolution Enters an Exciting New Phase"). After all, I've been a huge advocate of the idea that recent human evolution was a lot faster and more interesting than anthropologists used to think ("Why human evolution accelerated").

    The story, by Brandon Keim, is a good account of a new paper in Nature by Wenqing Fu and colleagues, "Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants" [1]. It's a pretty cool study, which has identified protein-coding alleles in large samples of European-American and African-American individuals.

    Fu and colleagues compared all the coding variants they found in large samples of European-Americans and African-Americans, and discovered that the European-ancestry people have a higher fraction of rare coding variants. They propose that the rate of new coding variants entering and persisting within the population actually accelerated in the ancestral European population. Why would this happen? In their view, demography is the most likely explanation. As European populations expanded during the Neolithic and later time periods, the rate at which new mutations are lost to genetic drift began to decline. These new mutations have pooled up within the European population, giving them a glut of new changes to protein-coding sequences. Many of these mutations may be deleterious, just not bad enough for natural selection to have weeded them out in the growing ancient population.

    I think in large part this explanation is correct. In some ways it is incomplete.

    The effect of population history on our evolution was the theme of our 2007 paper on positive selection in recent humans [2]. We relied on exactly the same mathematical relations used in this new paper: More people means more different mutations entering the population. In our paper, the increase in the total number of mutations meant that we could expect more potentially adaptive mutations to be selected within a growing population. In the new paper, the same increase means that more rare neutral or deleterious variants remain to be picked up by resequencing present-day samples.
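
    As a rough sketch of that relation: in a diploid population of N individuals, about 2 x N x mu x L new mutations enter each generation, where mu is the per-site mutation rate and L the number of sites considered. The mutation rate, target size and population sizes below are illustrative assumptions, not values taken from either paper.

```python
def new_mutations_per_generation(n_individuals, mu_per_site, n_sites):
    """Expected count of new mutations entering a diploid population per generation."""
    # Each individual carries two genome copies, hence the factor of 2.
    return 2 * n_individuals * mu_per_site * n_sites


# Hypothetical values, chosen only for illustration.
MU = 1e-8                  # assumed per-site, per-generation mutation rate
CODING_SITES = 30_000_000  # assumed size of the protein-coding target

small = new_mutations_per_generation(10_000, MU, CODING_SITES)
large = new_mutations_per_generation(1_000_000, MU, CODING_SITES)

print(f"N = 10,000:    about {small:,.0f} new coding mutations per generation")
print(f"N = 1,000,000: about {large:,.0f} new coding mutations per generation")
```

    The supply of new mutations scales linearly with the number of people, whether those mutations turn out to be adaptive, neutral or deleterious.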

    One of the senior authors of the study, Joshua Akey, commented:

    Most of the mutations that we found arose in the last 200 generations or so. There hasn’t been much time for random change or deterministic change through natural selection. We have a repository of all this new variation for humanity to use as a substrate. In a way, we’re more evolvable now than at any time in our history.

    (This is quoted by Punnett Square; I'm not sure about the original source.)

    That's a cool concept. These rare protein-coding variations may be mostly unimportant to fitness today, and many are slightly deleterious. Still they provide a store of variability that increases the potential range of responses to future adaptive challenges. Or, they give us room to examine the effects of small differences, which will help us to understand better how genes work. For the past few thousand years, a small proportion of those have come under positive selection, the part that we have been studying in my lab since 2007.

    The current study has some drawbacks. For one, it isn't evident from the results how these new coding mutations are distributed among individuals. Under population growth alone, we should expect that the number of these new coding variants carried by any one individual should be approximately the same as any other individual, regardless of the population size. Where a big population differs from a small population is in the variety of mutations carried by different individuals, with the average number per individual being equal. That may be true in this study, but it isn't possible to tell from the results presented.

    To the extent that some of these mutations are deleterious, their distribution matters. In Europeans, there may be a greater number of deleterious mutations that are on average more rare; all things being equal, this pattern should make it harder to find statistical evidence for association of these rare variants with complex disorders. By contrast, in Africans, the higher average frequencies of such variants should make them easier to tie to phenotypic variation. All this can be concluded from frequencies alone, without a need to relate frequency to age.

    Probably the biggest shortcoming of the paper is in its estimation of ages for these rare mutational variants. Estimating the ages of mutations in human populations has been a real problem for those of us working with genotyping or sequencing data from small samples. When we depend on the linkage between a rare allele and nearby genetic loci, we run into a sampling problem: Estimating the proportion of recombinants in a population fundamentally has a lot of error when you are working with a sample of 10 copies of the rare allele.
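
    A minimal sketch of that sampling problem, treating each sampled copy of the rare allele as an independent draw that either is or is not recombinant. This binomial simplification and the assumed true fraction of 0.3 are illustrative only; real haplotypes share ancestry and are not independent.

```python
import math


def recombinant_fraction_se(true_fraction, n_copies):
    """Binomial standard error of an estimated recombinant fraction."""
    return math.sqrt(true_fraction * (1.0 - true_fraction) / n_copies)


TRUE_FRACTION = 0.3  # hypothetical proportion of recombinant haplotypes

for n_copies in (10, 100, 1000):
    se = recombinant_fraction_se(TRUE_FRACTION, n_copies)
    print(f"{n_copies:>5} copies: estimate = {TRUE_FRACTION} +/- {se:.3f} (1 SE)")
```

    With only 10 copies the standard error is about half the size of the quantity being estimated, and any age calculated from that fraction inherits the uncertainty.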

    Estimating dates by LD is bad enough, but this paper doesn't even go that far. Instead, it estimates the ages of alleles from their frequency.

    Frequency estimation of age is OK if the genome sequences have come from a Wright-Fisher population (that is, a random-mating, constant-size population). More common alleles tend to be older; new alleles tend to be very rare. This isn't a very accurate means of dating any particular mutation, because the relationship of age and frequency under genetic drift has a tremendous variance. But when pooling large sets of alleles into frequency classes, the age-by-frequency approach gives a rough idea of whether mutations have accelerated or stayed at a constant rate over time.
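
    For the constant-size case, the classical diffusion result of Kimura and Ohta (1973) gives the mean age of a neutral allele at frequency x as -4N x ln(x) / (1 - x) generations. The sketch below evaluates that formula for an assumed effective size of 10,000. It is not the method used by Fu and colleagues, who derived the relation from simulations, and it returns only the mean of a distribution with very large variance.

```python
import math


def expected_age_generations(freq, n_effective):
    """Mean age of a neutral allele at frequency `freq` in a constant-size
    population (Kimura & Ohta 1973 diffusion approximation)."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must be strictly between 0 and 1")
    return -4.0 * n_effective * freq * math.log(freq) / (1.0 - freq)


N_E = 10_000  # hypothetical effective population size

for freq in (0.001, 0.01, 0.1, 0.5):
    age = expected_age_generations(freq, N_E)
    print(f"frequency {freq:>5}: mean age about {age:,.0f} generations")
```

    Rarer alleles come out younger on average, which is the whole basis of the approach. Any allele that is rare for another reason will be assigned a young age as well.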

    But there's one obvious thing missing from the model that may have a large effect on the frequencies of rare coding variants: Introgression from Neandertals! If we want to know why Europeans have a large store of rare coding variants relative to Africans, their ancient admixture with a very divergent human population, which contributed a small fraction of their ancestry, is one obvious reason. None of the Neandertal alleles in Europeans today are new; they are all old. But a method that estimates their ages by allele frequency alone will always conclude that these rare Neandertal alleles are very young.

    In the current paper, the relation of frequency and age is derived from simulations that are based on a model of human population history. Like all recent papers that apply a model of human population history, this one is both overcomplicated (lots of parameters for which we have no good estimates) and oversimplified (too few events to accommodate known historical phenomena). Here's the population model used to derive allele ages in the paper:

    Population model from Figure S5 in the supplementary information from Fu et al. 2012

    The parameters for population divergence times and ancient population sizes are estimated from genetic data, so any systematic error will propagate through to the estimation of allele ages. The exclusion of Neandertal introgression in the model really does bias the allele age estimates badly, as Neandertal genes today are mostly rare, and mostly very old. This year's shift in our assumptions about mutation rates (to a much slower rate than previously assumed) will also affect the estimates of the demographic parameters in the model. An older coalescence time for most genes means a larger ancestral effective size for these populations, and much older allele ages when frequency is the estimator.

    Our lab is working very hard on allele ages, and I hope to be able to share some of that work soon.

    This study is not alone in demonstrating the real importance of rare coding variation in human populations. This line of research has substantial value, as it helps to show why so much of the additive genetic variation underlying variation in human phenotypes has not yet been assigned to genes. We know that many traits are heritable by comparing genetic relatives with each other. Finding the genetic loci that explain similarity among relatives is relatively easy when the genes involved are common, because the same gene variants will be shared across many families. But pooling many families doesn't help us find very rare mutations, as these are likely carried only by a few pedigrees even in a very large sample. By showing the large store of rare coding variation, these studies help to establish that much of the genetic variation underlying disease may be there for us to discover, if we change our discovery approach.


    Synopsis: 
    Probing the pattern of noncoding rare variation in whole exome data.
  • When genes break: validating loss-of-function variants

    Fri, 2012-02-17 12:20 -- John Hawks

    Daniel MacArthur and colleagues have an important paper in Science, titled "A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes" [1]. They took 1000 Genomes Project pilot data and systematically looked at every allelic variant in the sample that appeared to cause the loss of function of a protein-coding gene. Mutations that de-activate genes in this way are not rare, but they are often eliminated from the population rapidly by purifying natural selection, because the normal function of a protein is necessary to survival or reproduction. However, not every protein is so important, and MacArthur and colleagues confirmed that more than 1200 alleles in this sample genuinely occur in one or more of the 1000 Genomes Project individuals.

    Some of these are common but most occur in fewer than 2% of individuals in the sample, as expected if purifying selection were affecting many of them.

    MacArthur is one of the authors of the Genomes Unzipped group blog, and has written a great summary and introduction to his research paper: "All genomes are dysfunctional: broken genes in healthy individuals". It's free and well-written, so it will probably work better for many readers than the original paper.

    Science is running a commentary to accompany the research article, by Lluis Quintana-Murci [2]. The following paragraph encompasses a lot of the numerical facts about these loss-of-function variations, and discusses the idea that some of them were positively selected -- that is, advantageous in recent human populations.

    MacArthur et al. estimated that, depending on ethnic background, each individual's genome carries 26 to 37 variants that introduce a stop codon (which signals the termination of translation of nucleic acids into protein), with up to 6 present in the homozygous state. When considering other types of LoF variants, including those that disrupt splice-sites, large deletions, or insertions or deletions of nucleotides that change the DNA reading frame, the total number per individual is extended to 103 to 121, with ∼20 present in homozygosity. A large proportion of LoF variants were enriched in low-frequency alleles, suggesting that the removal of deleterious alleles has prevented them from increasing to high frequencies. Furthermore, some have already been associated with severe human diseases, supporting the less-is-less hypothesis. Other LoF variants, which can reach higher population frequencies, fall into poorly evolutionarily conserved genes or belong to multigene families displaying high paralogous sequence identity. This suggests that the functions of the corresponding genes are highly redundant, explaining their greater tolerance for LoF variants and supporting a less-is-nothing scenario. Also, although no substantial enrichment in positive selection signals was observed among LoF variants at the genomewide level, 20 of them fell into regions displaying signatures of positive selection, as predicted by the less-is-more hypothesis, suggesting that they may have conferred a selective advantage in human evolution.

    Common loss-of-function variants that are evolutionarily recent are very interesting to us as we work to understand the changes that accompanied modern human origins and the later invention and spread of agriculture. I am really excited that these analyses were carried out using the 1000 Genomes samples because that means we can use the sequence data to estimate the ages of these functional losses. We can do quite a lot better than to say that they "fall into regions displaying signatures of positive selection": In fact, we can determine whether these variants themselves were selected, or hitchhiked to high frequency along with some other variant that was selected.

    Many loss-of-function variants are in genes that may not matter much to selection. Olfactory receptor genes, for example, comprise a very large family with recurrent duplications and pseudogenizations during primate evolution. We have scores of olfactory receptor pseudogenes, many of which are polymorphic in living human populations. Some may continue to make a noticeable difference to the phenotype, such as the asparagus-urine-smelling polymorphism. But many are probably invisible to us. Still, a few of these do look like they've been positively selected in recent human populations.

    Sometimes less really is more.


    Synopsis: 
    A "punishing" resequencing project validates mutations in the 1000 Genomes Project individuals that deactivate protein-coding genes.
  • Polygenic traits and directional selection

    Sat, 2010-09-18 13:41 -- John Hawks

    This has been an eventful week for those of us who study the dynamics of recent selection in humans. The most significant event was the publication of a paper describing genetic analysis of a long selection experiment in Drosophila. Although the experiment differs from most natural instances of selection in some important ways, the results give some very helpful corroboration that the recent human pattern of adaptive evolution was rapid and of an expected pattern for massive selection on many traits.

    Meanwhile, Jonathan Pritchard and Anna Di Rienzo have a short review in the current Nature Reviews Genetics [1], discussing the idea that a large fraction of adaptive evolution may be difficult to find with current genetic evidence.

    Their idea is that polygenic adaptations are unlikely to occur by successive "sweeps" of new adaptive mutations.

    It seems likely to us that, as in traditional quantitative genetic models, many — possibly even most — adaptive events in natural populations occur by polygenic adaptation. Polygenic adaptation could allow rapid adaptive shifts, yet would often go undetected using conventional methods for detecting selection. Moreover, polygenic adaptation is qualitatively different from the models of adaptive substitutions that dominate the population genetics literature.

    This is not a new idea, but Pritchard and Di Rienzo review it in a productive way, and the topic is worth some deeper thought...

    An adaptive genetic substitution is often modeled as an episode of logistic growth. A new mutation, initially in a single copy, increases exponentially in numbers until it is very common in the population. After this point, it continues to increase in frequency up to fixation, but progressively more slowly. The entire process takes hundreds or a few thousand generations, which sounds like a long time but is actually very rapid compared to the deep genealogical histories of most genetic loci. The initial rapid increase in numbers carries a region of linked sequence along with the selected variant. This "hitchhiking" region is highly visible because of the co-association of nearby allelic variants. Thus, if such a "sweep" is ongoing, we should have little trouble finding it. In humans we've found a lot of them, which is a big piece of evidence for the rapidity of human evolution during the past 40,000 years.
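
    A minimal sketch of that logistic model, assuming a deterministic trajectory in which the odds p/(1-p) of the favored allele grow by a factor of e^s each generation. The selection coefficient (0.02) and population size (10,000 diploids, so a starting frequency of 1/20,000) are hypothetical, and the model ignores the drift that dominates while the new allele is still in a handful of copies.

```python
import math


def logistic_frequency(t, p0, s):
    """Allele frequency after t generations of deterministic logistic growth."""
    growth = p0 * math.exp(s * t)
    return growth / (1.0 - p0 + growth)


def generations_to_reach(p_target, p0, s):
    """Generations needed to rise from frequency p0 to p_target."""
    return math.log(p_target * (1.0 - p0) / (p0 * (1.0 - p_target))) / s


N = 10_000            # hypothetical number of diploid individuals
P0 = 1.0 / (2 * N)    # a single new copy
S = 0.02              # hypothetical selective advantage

for target in (0.01, 0.5, 0.99):
    t = generations_to_reach(target, P0, S)
    print(f"reaches {target:>4} after about {t:,.0f} generations")
```

    Under these assumptions the allele needs a few hundred generations just to reach 1 percent, crosses the middle frequencies comparatively quickly, and then slows again on the approach to fixation.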

    But all that describes the dynamics of a single, strongly selected mutation. What if a trait comes under selection, but the variation in the trait is explained not by a single gene, but by dozens or hundreds of genes? Pritchard and Di Rienzo outline such a scenario:

    The key point is that we should expect such an adaptation to occur by small allele frequency shifts spread across many loci. As a hypothetical example, consider the adaptation of human height — a trait for which there are probably hundreds of SNPs that each affect height by a few millimeters. Strong selection for increased height could be very effective, as height is extremely heritable. But at the level of individual SNPs, the effect of selection would be rather weak, exerting just a small upward pressure in favour of each of hundreds of 'tall' alleles. Suppose that at 500 SNPs, the tall alleles each increase the expected height of a person by 2 mm. Then, an average shift of just 10% in the population allele frequency of each tall allele would increase average height in the population by 20 cm (assuming that SNPs contribute additively). Although these numbers are hypothetical, they illustrate that, for a highly polygenic trait, a dramatic adaptive response could result from modest allele frequency changes at many loci. This model is different from classical sweep models. Most importantly, adaptation could occur without dramatic allele frequency changes and without adaptive fixation events.
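
    The arithmetic in the quoted example can be checked directly. The sketch assumes, as the 20 cm total implies, that each tall allele adds 2 mm per copy and that each person carries two copies of every locus. The function name is mine, not Pritchard and Di Rienzo's.

```python
def mean_height_shift_mm(n_snps, effect_mm_per_copy, freq_shift):
    """Change in mean height from an equal frequency shift at every SNP,
    assuming purely additive effects in diploid individuals."""
    # Mean number of tall copies per person at one locus is 2 * p,
    # so a shift of freq_shift adds 2 * freq_shift copies on average.
    extra_copies_per_snp = 2 * freq_shift
    return n_snps * extra_copies_per_snp * effect_mm_per_copy


shift_mm = mean_height_shift_mm(n_snps=500, effect_mm_per_copy=2.0, freq_shift=0.10)
print(f"mean height increases by {shift_mm:.0f} mm ({shift_mm / 10:.0f} cm)")
```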

    But the description isn't precisely what would happen in the case of selection on stature. Consider:

    1. It is true that alleles that already exist in the population provide the most immediate opportunity for change under directional selection. Any short-term phenotypic evolution we see is likely to be caused by changes in the frequency of standing variants.

    2. Some of the alleles that affect stature are constrained by their effects on other phenotypes. They might not change, even under directional selection on stature.

    3. Stature may be affected by hundreds of loci, but these do not account for equal proportions of the additive variance. Loci are subject to selection roughly in proportion to the additive variance in fitness they explain. Directional selection on stature will change the allele frequencies for a few loci quite a bit more quickly than most.

    The distribution of effect sizes is fairly well known for stature in humans. For example, Park and colleagues [2] this spring plotted the distribution of effect sizes for variants discovered by GWAS in 63,000 Europeans:

    Effect size distribution of variants found to explain heritability of stature, Crohn's disease and BPC cancers in human genome-wide association studies

    In the figure, (a) is based on observed loci -- for stature, this includes 30 loci that reached significance in the GWAS without follow-up genotyping. There is a pretty severe ascertainment bias against small effect sizes, so curve (b) attempts to model the actual distribution correcting for ascertainment. Curve (c) is normalized to give the three conditions the same observed range.

    You can see that if we suddenly started selecting for height, most of the genetic response would come from a very small proportion of the loci that explain the current additive variance. These would be the subset of loci in the large-effect-size tail of the distribution, excluding those that are constrained by their role in other phenotypes under selection.

    4. As an allele becomes common enough (going up toward fixation), the locus will account for less and less of the additive variance in fitness. To maintain the same response to selection, other alleles must pick up the slack. Over time, groups of different alleles will come into focus of selection, sort of like the "cover flow" feature of an iPod. Some alleles increase in frequency across a transient in the mid-frequency range, only to be gradually replaced by others. Most of the phenotypic change occurs as alleles cross rapidly from 40 to 60 percent or so.

    5. A few loci will be special. These account for an appreciable fraction of additive variance even though the favored allele is very rare. As they become common, these favored alleles change in frequency more and more rapidly, and account for more and more of the additive variance. They suck up the oxygen of selection. These alleles will look like a classic sweep.

    6. Over many generations, new mutations may occur that also have strong effects on the trait. They will follow the "special" pattern described in 5.
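
    Points 4 and 5 rest on a standard quantitative-genetic relation: under a purely additive model, a locus with allele frequency p and per-copy effect a contributes 2p(1-p)a^2 to the additive variance. The sketch below tabulates that contribution, with effect sizes that are hypothetical and in arbitrary trait units.

```python
def additive_variance(freq, effect_per_copy):
    """Additive genetic variance contributed by one biallelic locus."""
    return 2.0 * freq * (1.0 - freq) * effect_per_copy ** 2


SMALL_EFFECT = 1.0  # hypothetical per-copy effect of a typical locus
LARGE_EFFECT = 5.0  # hypothetical per-copy effect of a "special" locus

print("freq   small-effect   large-effect")
for freq in (0.01, 0.1, 0.4, 0.5, 0.6, 0.9, 0.99):
    small = additive_variance(freq, SMALL_EFFECT)
    large = additive_variance(freq, LARGE_EFFECT)
    print(f"{freq:<6} {small:>12.4f} {large:>14.4f}")
```

    The contribution peaks at a frequency of one half and falls away toward fixation, which is why other alleles have to pick up the slack. With these numbers, a large-effect allele at 1 percent already matches a small-effect allele at 50 percent, and its share grows as it becomes more common.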

    The question is how many loci of this type can we expect to exist? We all know that there are two patterns that could account for the heritability of traits like stature, where no common variants have very strong effects. Either the additive variance is spread across many rare variants with large effects, or instead across many common variants with small effects. Pritchard and Di Rienzo's scenario accentuates the second of these -- a small frequency change in many common variants with small effects.

    But if even a small fraction of the additive variance is explained by a few rare variants with strong effects, these may cause most of the phenotypic change, and may look a lot like a standard selective sweep.

    Pritchard and Di Rienzo note that the two options -- a rapid sweep of one or a few loci, versus slight frequency changes in many loci -- are not mutually exclusive. Most cases of directional selection on phenotypes may involve both patterns. If so, that will be very helpful, because we can use the easy-to-find sweeps to target analysis of harder-to-find frequency changes.

    They sketch a strategy for examining the evolution of such traits.

    One type of approach will be to identify phenotypes that may have undergone adaptive changes in particular environments, such as adaptations to cold climate, high altitude or novel ecological conditions. To dissect the genetic basis of such adaptations, one might collect phenotyped samples from closely related populations that have and have not experienced the selective pressure of interest and use GWA mapping to identify relevant quantitative trait loci (QTLs). Additionally, one would want to measure the extent of phenotypic adaptation -- estimated as the difference in average phenotype between the adapted and non-adapted populations when they are living under matched conditions (exact matching of conditions may be difficult in human studies). Then one could ask: what fraction of the phenotypic difference can be explained by alleles with large versus small frequency differences? Are the phenotypic effect sizes of QTLs with large allele frequency differences greater than those with subtle frequency shifts? What fraction of the phenotypic difference cannot be explained by detected sweep signals or QTLs at all (and hence might result from the cumulative effect of many weak QTLs)?

    In another type of scenario, one might hypothesize that a particular aspect of the environment is an important selective factor (for example, climate or diet) but it is unclear what all the relevant phenotypes are. In this case, we might study adaptation by looking at sets of populations that have independently adapted to the same selective pressures. One type of signal would be alleles that show parallel frequency shifts in response to similar environmental pressures in distantly related populations (although this type of approach is unlikely to be powerful for alleles with very small effects).

    These are exactly the kind of tests that we are working on here at Wisconsin. We have some pretty promising ideas, I think. If you're on a dissertation grant panel, would you please give some money to my students who want to apply these approaches?

    I mean, really, this is the best application of anthropology to develop new genetic approaches, rich in theory and in empirical evidence. Humans are the ideal model organism, because we know the histories and ecologies of different populations. Since the development of agriculture, we've had several ongoing natural selection experiments in our species.

    Nor can we ignore the longer prehistory of human populations. I tend to think that a lot of recent selection has involved new genetic solutions in cases of strong stabilizing selection. A trait like brain size does not evolve under classic directional selection, but instead as a consequence of shifting patterns of stabilizing selection. With intense selection on multiple functions, such traits are constrained in their evolutionary response. Slight frequency changes are not likely to relax such constraints, but a new mutation of large effect might break a long-standing genetic logjam.

    So I think Pritchard and Di Rienzo have outlined many important issues in this review. They have the potential to be highly productive for people with a little talent for applying theory to the data.

