john hawks weblog

paleoanthropology, genetics and evolution

GWAS

  • Quote: Lederberg on the candidate gene approach

    Sun, 2012-09-30 00:46 -- John Hawks

    In my last post ("Quote: Lederberg on Haldane"), I pointed to a 1999 article by Joshua Lederberg [1]. Later in the article, he considers an interesting question:

    Outside the domain of malaria and the erythrocyte, the pickings for established polymorphisms in relation to human disease are rather thin. Why have they predominated for malaria? Its geographic, climatic, and altitudinal restrictions–related to the habitats of vector mosquitoes–lend themselves to epidemiological revelation. In addition, few diseases, barring mainly tuberculosis, have a prevalence and fitness-impairing morbidity so high that subject genes will have significant penetrance. Most other morbid infections will attack a small sector of the population, thus introducing high “environmental” variance into the heritability calculations. This is also compounded by maternally inherited immunity and, needless to say, elements of culture (including saluto-genic technology). Most of our successes have entailed the ascertainment of candidate genes, e.g., the blood group and MHC polymorphisms, and searches for disease correlations to them. These are abundant and can be partially explained by specializations in epitope presentation to the immune system or antigenic mimicry between parasites' surface antigens and self-antigens of the host.

    Today we have GWAS, which has potentially much greater power to detect associations between disease and previously unknown risk loci in large samples of people. Yet the limitations that Lederberg pointed out in 1999 still hold: environmental variance in parasite or pathogen incidence and load is very high, acquired immunity complicates the analysis of effects, and the present effects of pathogens are different from what they would have been in past populations with greater frailty.


    References

  • The risk gradient

    Wed, 2011-11-09 23:58 -- John Hawks

    Ann Gibbons reports [1] from the International Congress of Human Genetics on papers that examine GWAS risk alleles for type 2 diabetes: "Diabetes Genes Decline Out of Africa" (paywall).

    At the poster session, Stanford graduate student Erik Corona stood in front of a Google Earth map of the world that he finds surprising. On this map he had plotted the frequency of 12 gene variants known to be associated with type 2 diabetes in 51 populations from Australia to Zaire. It shows “a clear gradient of red to green from west to east, from Africa to Asia,” Corona says (see map). “Something strange is going on with type 2 diabetes.”

    This is of course a challenging problem, because risk alleles identified in one population may not replicate in other populations. The best-known example is ApoE4, strongly associated with Alzheimer's disease in Europeans but not in Africans. More generally, looking at a set of risk variants that were identified in one population introduces an ascertainment bias that constrains their likely frequencies in other populations. An allele is more likely to yield a statistically significant association with a trait if the allele is not too rare. If we take many alleles associated with a trait, we're likely to see some gradient across populations due to this bias alone.

    Hidden ascertainment bias is a problem we run up against quite a lot. It may not apply in this case, depending on where the risk alleles were identified, especially since many risk alleles for type 2 diabetes appear to be linked to recent positive selection (which is why I got interested).
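    To see how ascertainment alone can produce a gradient, here is a toy simulation in Python. Everything in it is a hypothetical assumption, not anything from Corona's analysis: ancestral frequencies skewed toward rare alleles, normally distributed drift, a detection probability of 2p(1 - p) in a single discovery population, and "distance" modeled as the fraction s of a comparison population's frequency that comes from independent drift rather than from the discovery population.

```python
import random
from statistics import mean


def clamp(p, lo=0.001, hi=0.999):
    """Keep a frequency strictly inside (0, 1)."""
    return min(max(p, lo), hi)


def ascertained_gradient(n_loci=20000, shares=(0.0, 0.25, 0.5, 0.75, 1.0),
                         drift_sd=0.08, seed=1):
    """Mean frequency of GWAS-ascertained alleles in each population.

    Toy model with hypothetical parameters. Each locus gets an ancestral
    frequency skewed toward rare alleles, and the discovery population drifts
    away from it. Comparison population k takes a fraction (1 - s) of its
    frequency from the discovery population and s from an independent drift
    of the ancestral frequency, so a larger s means less shared history.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_loci):
        p_anc = rng.uniform(0.02, 0.30)
        p_disc = clamp(p_anc + rng.gauss(0, drift_sd))
        # Detection power rises with 2p(1-p), so alleles that happen to be
        # common in the discovery sample are more likely to be "found".
        if rng.random() < 2 * p_disc * (1 - p_disc):
            row = []
            for s in shares:
                q = clamp(p_anc + rng.gauss(0, drift_sd))  # independent drift
                row.append((1 - s) * p_disc + s * q)
            rows.append(row)
    return [mean(row[k] for row in rows) for k in range(len(shares))]


print([round(m, 3) for m in ascertained_gradient()])
```

    With these settings the mean frequency of the "found" alleles should be highest in the discovery population (s = 0) and lower in the least related one (s = 1). The model contains no selection at all. It says nothing about which way the real type 2 diabetes gradient runs.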


    References

    1. Gibbons A. Diabetes Genes Decline Out of Africa. Science. 2011;334(6056):583.

  • Archaic genome snooping from GWAS

    Tue, 2011-10-18 22:08 -- John Hawks

    The 23andMe blog reports on a recent genome-wide association study of type 2 diabetes in South Asian people: "SNPWatch: Genetic Variants Associated with Type 2 Diabetes in South Asians and Europeans". The study, by Kooner and colleagues, was published in August in Nature Genetics [1]. As described in the post:

    The authors behind this study carried out one of the largest type 2 diabetes studies to date, scanning the genomes of nearly 19,000 people with the disease and 40,000 without it, all of South Asian descent. Their analysis identified six SNPs linked to this condition. When they combined their results with previously published findings in other ethnicities, they found suggestive evidence that five of the six SNPs were also associated with type 2 diabetes in European populations. Similarly, there was some evidence that the majority of the genetic risk factors in Europeans were also linked to disease in South Asians. Only three genetic factors were not shared at all between the two groups.

    Type 2 diabetes is presently a very interesting topic from an evolutionary viewpoint, and we're beginning to think about it very seriously now. Whenever I see a study like this, I quickly look at the Neandertal and Denisovan genomes to see if any interesting patterns emerge. Sharing GWAS SNP alleles is not necessarily very interesting, because the GWAS risk alleles are mostly not causative themselves; each may be linked to some causative allele that remains to be discovered. The linkage is a function of the evolutionary history of that chromosome region, and many of the key historical events that affect linkage happened within the last 10,000 years. So we really shouldn't expect GWAS alleles to be predictive of phenotypes in Neandertals or Denisovans.

    Still, these alleles are associated with disease in living people, and their genotypes in ancient humans may illuminate cases where evolutionary history links populations across the gene networks that influence disease. A closer examination of the genealogy around these loci will be more informative, but as a first look I often just genotype the archaic genomes for the SNPs in a study. The six SNPs reported here include two cases where the archaic genomes have the derived risk alleles, one of them present in Neandertals but not in the Denisova genome. Again, that doesn't tell us anything about the phenotype of the ancient people, but it is worth a closer look to see if one or both of these is an introgressed allele.
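    The first-look procedure described above can be sketched in a few lines of Python: check whether the risk allele is derived, then see which archaic genomes carry it. The record layout, the field names, and the two toy SNPs are hypothetical illustrations, not the study data or the GWAS Catalog format. The sketch also assumes single-base alleles reported on the same strand as the archaic genotype calls.

```python
from dataclasses import dataclass, field


@dataclass
class GwasSnp:
    """One GWAS hit. Field names are illustrative, not a real file format."""
    snp_id: str
    risk_allele: str       # allele reported as raising disease risk
    ancestral_allele: str  # e.g. the chimpanzee allele at this position
    archaic: dict = field(default_factory=dict)  # genome name -> genotype, e.g. "AG"


def derived_risk_carriers(snp):
    """Return the archaic genomes that carry the risk allele when it is derived.

    An ancestral risk allele is expected to be shared with archaic humans,
    so that case returns an empty list.
    """
    if snp.risk_allele == snp.ancestral_allele:
        return []
    return sorted(name for name, gt in snp.archaic.items() if snp.risk_allele in gt)


# Toy records, made up for illustration only.
toy = [
    GwasSnp("snp_1", risk_allele="T", ancestral_allele="C",
            archaic={"Neandertal": "TT", "Denisova": "CC"}),
    GwasSnp("snp_2", risk_allele="G", ancestral_allele="G",
            archaic={"Neandertal": "GG", "Denisova": "GG"}),
]
for snp in toy:
    print(snp.snp_id, derived_risk_carriers(snp))
# snp_1 ['Neandertal']
# snp_2 []
```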

    We have here the GWAS Catalog genotypes for all the archaic genomes. There is not much actionable information, but there are some interesting phenotypes in there. I'll share some more of those later this week.


    References

  • Steampunk genetics

    Thu, 2011-09-15 10:42 -- John Hawks

    An article in the European Journal of Human Genetics that came out a couple of years ago has always impressed me, and I just noticed that it has gone open access: "Predicting human height by Victorian and genomic methods" [1].

    The premise is that Galton's method of predicting stature from relatives still gives substantially better predictions than genotyping a collection of known variants that influence stature. It's basically a restatement of the "missing heritability" problem in more concrete (and colorful) terms. Here's a passage from the abstract:

    For highly heritable traits such as height, we conclude that in applications in which parental phenotypic information is available (eg, medicine), the Victorian Galton's method will long stay unsurpassed, in terms of both discriminative accuracy and costs. For less heritable traits, and in situations in which parental information is not available (eg, forensics), genomic methods may provide an alternative, given that the variants determining an essential proportion of the trait's variation can be identified.

    A great illustration. We know the trait is heritable, but the heritability is spread across many, many loci, most of which remain unidentified. Hence, we can't predict stature well from genotypes.
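    A toy simulation can make this contrast concrete. The Python sketch below is not the paper's method. It assumes a hypothetical trait with 500 unlinked, equal-effect loci at frequency 0.5 and narrow-sense heritability 0.8, of which only 25 loci have been "discovered". It compares the variance in offspring phenotype explained by the midparent phenotype (a simplified stand-in for Galton's approach) with the variance explained by a score built from the known loci. Under these assumptions the expected values are roughly h²/2 = 0.32 for the midparent and 0.8 × 25/500 = 0.04 for the genomic score.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_predictors(n_loci=500, n_known=25, herit=0.8, n_families=2000, seed=42):
    """Return (r^2 of midparent, r^2 of known-locus score) for offspring phenotype.

    Hypothetical additive model: alleles at frequency 0.5, one unit per
    "tall" allele, unlinked loci, and only the first n_known loci known.
    """
    rng = random.Random(seed)
    known = (1 << n_known) - 1               # bit mask for the discovered loci
    va = 0.5 * n_loci                        # additive variance: 2p(1-p) = 0.5 per locus
    env_sd = sqrt(va * (1 - herit) / herit)  # noise that sets the heritability

    def ones(bits):
        return bin(bits).count("1")

    def founder():
        # A person is two haplotypes, each stored as an n_loci-bit integer.
        return rng.getrandbits(n_loci), rng.getrandbits(n_loci)

    def gamete(person):
        # Free recombination: take each locus from either haplotype at random.
        pick = rng.getrandbits(n_loci)
        return (person[0] & pick) | (person[1] & ~pick)

    def phenotype(person):
        return ones(person[0]) + ones(person[1]) + rng.gauss(0, env_sd)

    midparent, score, child = [], [], []
    for _ in range(n_families):
        mom, dad = founder(), founder()
        kid = (gamete(mom), gamete(dad))
        midparent.append((phenotype(mom) + phenotype(dad)) / 2)   # "Victorian" predictor
        score.append(ones(kid[0] & known) + ones(kid[1] & known))  # "genomic" predictor
        child.append(phenotype(kid))
    return pearson(midparent, child) ** 2, pearson(score, child) ** 2


print(compare_predictors())
```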


    References

  • Mailbag: Genetics of schizophrenia

    Sat, 2011-09-03 14:49 -- John Hawks

    Re: Schizophrenia

    I am watching/listening to your Teaching Co. DVD lecture series on Human Evolution and very much enjoying it. I graduated from Beloit College in '68 with a BA in Anthro, and while I have tried to keep up with new discoveries, it has been haphazard. Your lecture series really helps me appreciate what huge progress has been made in this field since 1968.

    I recently retired from a career in Mental Health. I have wondered why schizophrenia is so common amongst humans and have thought it might be like sickle cell anemia.
    A very small dose of the schizophrenia complex of genes might be connected to our use of symbolism and creativity. A large dose might create the dysfunction of psychosis.

    Thanks for your research and for being able to express the material with such clarity and energy.

    Thank you so much for your kind words! We put so much work into doing the best lectures possible, and I'm really proud of the result.

    Your question about schizophrenia strikes right at what evolutionary biologists are thinking about the subject. In our work on recent selection in human populations, we've been thinking that we might find some selected genes with side effects on cognition. Many human geneticists have been looking for genes that explain the risk of schizophrenia, and we know that there are a few common gene variants that affect risk. But it appears that most of the risk must be explained by gene variants that are found in only one or a few families. It seems to be a case of "every unhappy family is unhappy in its own way."

    That makes it hard to find and understand the genetic causes, but as we move toward whole-genome sequencing and more and more observations on different families, we will begin to understand more about the causes.

  • Genetics without the disclaimers

    Thu, 2011-03-17 15:48 -- John Hawks

    The NY Times covers a new genome-wide association study of SNP variants and response to exercise ("Is Fitness All in the Genes?").

    The phenotype is the improvement in maximal oxygen uptake (VO2 max) with exercise training. Some people improve rapidly with exercise and others don't. Straightforward enough, and there is one SNP that accounted for 6 percent of the phenotypic variation, which is quite strong as far as these associations go. Usually GWAS associations explain a much smaller fraction of the phenotypic variance.

    The final few paragraphs of the article irritated me. It's like these stories have to follow a form, with a long disclaimer at the end. They report the facts -- variant explains 6 percent of variation -- and then they proceed to preach about how the facts may not matter:

    “It will be years, if ever,” said Dr. Bouchard, before gene tests exist that can reliably separate high and low responders. Even if and when such tests become available, he continued, the results will not constitute an excuse for skipping workouts. “There are countless other benefits provided by exercise,” he said, apart from whether it raises your VO2 max. “Exercise can reduce blood pressure and improve lipid profiles,” he said. It can better your health, even if, by certain measures, it does not render you more aerobically fit.

    More fundamentally, Dr. Bouchard said, elements of the interplay of genetics, environment, the human body and resolve probably always will remain mysterious and stubbornly individualized, no matter how much science disentangles the genome. People who don’t have an ideal version of the ACLS1 gene to prompt aerobic improvements from exercise, for instance, might harbor a different, unidentified gene that just makes exercise feel enjoyable, regardless. So, too, might someone whose body is genetically predisposed not to respond aerobically to running blossom during weight training sessions.

    Hello? Six percent of the variance is six percent of the variance. The article ought to just say that 94 percent of the variance is not explained by this SNP. That answers the question! That unadorned fact tells you that the SNP isn't strongly predictive about exercise response for any single person. To do even better, the article ought to tell us how much of the total variance is explained by all the SNPs together.
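    The 6 percent figure can be put in prediction terms with a couple of lines of arithmetic. This Python sketch assumes a linear predictor and measures spread relative to the total phenotypic standard deviation. The combined figure for several SNPs simply adds their variances, which is valid only if the SNPs act additively and are not in linkage disequilibrium with each other.

```python
from math import sqrt


def what_r2_means(r2):
    """Translate 'fraction of variance explained' into prediction terms.

    Assumes a linear predictor; residual_sd is relative to the total phenotypic SD.
    """
    return {
        "unexplained": 1 - r2,
        "correlation": sqrt(r2),
        "residual_sd": sqrt(1 - r2),
    }


def combined_r2(r2_values):
    """Total variance explained by several SNPs, assuming additive effects
    and no linkage disequilibrium among them."""
    return sum(r2_values)


summary = what_r2_means(0.06)
print({k: round(v, 3) for k, v in summary.items()})
# {'unexplained': 0.94, 'correlation': 0.245, 'residual_sd': 0.97}
```

    A SNP that explains 6 percent of the variance narrows the spread of likely outcomes for a given person by only about 3 percent (a residual SD of 0.97 of the original).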

    I know, statistics can be difficult for NY Times readers, but honestly explaining the result would take a lot less space than what the article does do, which is to give us a long litany of "mysterious and stubbornly individualized." And the "different, unidentified gene that just makes exercise feel enjoyable, regardless."

    Come on, people! Why not just tell us about earth spirits and auras?

    The moralizing always goes the same direction. You'll never hear hand-wringing about how we can't trust exercise because it only predicts a small proportion of the overall variance in mortality risk. What about the "mysterious and stubbornly individualized" people who are healthy at 80 without ever lifting a barbell?

  • Polygenic traits and directional selection

    Sat, 2010-09-18 13:41 -- John Hawks

    This has been an eventful week for those of us who study the dynamics of recent selection in humans. The most significant event was the publication of a paper describing a genetic analysis of a long selection experiment in Drosophila. Although the experiment differs from most natural instances of selection in some important ways, the results give some very helpful corroboration that the recent human pattern of adaptive evolution was rapid and fits the pattern expected for massive selection on many traits.

    Meanwhile, Jonathan Pritchard and Anna Di Rienzo have a short review in the current Nature Reviews Genetics [1], discussing the idea that a large fraction of adaptive evolution may be difficult to find with current genetic evidence.

    Their idea is that polygenic adaptations are unlikely to occur by successive "sweeps" of new adaptive mutations.

    It seems likely to us that, as in traditional quantitative genetic models, many — possibly even most — adaptive events in natural populations occur by polygenic adaptation. Polygenic adaptation could allow rapid adaptive shifts, yet would often go undetected using conventional methods for detecting selection. Moreover, polygenic adaptation is qualitatively different from the models of adaptive substitutions that dominate the population genetics literature.

    This is not a new idea, but Pritchard and Di Rienzo review it in a productive way, and the topic is worth some deeper thought...

    An adaptive genetic substitution is often modeled as an episode of logistic growth. A new mutation, initially in a single copy, increases exponentially in numbers until it is very common in the population. After this point, it continues to increase in frequency up to fixation, but progressively more slowly. The entire process takes hundreds or a few thousand generations, which sounds like a long time but is actually very rapid compared to the deep genealogical histories of most genetic loci. The initial rapid increase in numbers carries a region of linked sequence along with the selected variant. This "hitchhiking" region is highly visible because of the co-association of nearby allelic variants. Thus, if such a "sweep" is ongoing, we should have little trouble finding it. In humans we've found a lot of them, which is a big piece of evidence for the rapidity of human evolution during the past 40,000 years.
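    The logistic shape is easy to see numerically. This Python sketch uses a deterministic model of genic selection and ignores drift. Each generation multiplies the allele's odds p/(1 - p) by 1 + s. The values s = 0.01 and N = 10,000 are illustrative, not estimates for any real sweep. The usual approximation for the time from one copy to near fixation is about 2 ln(2N)/s generations.

```python
from math import log


def sweep_trajectory(s, n_e):
    """Frequencies of a favored allele, generation by generation.

    Deterministic genic selection with drift ignored: each generation the
    odds p / (1 - p) are multiplied by (1 + s), the discrete form of
    logistic growth. Starts at one copy, 1 / (2N), and stops once the
    allele reaches 1 - 1 / (2N).
    """
    p = 1 / (2 * n_e)
    path = [p]
    while p < 1 - 1 / (2 * n_e):
        p = p * (1 + s) / (1 + p * s)
        path.append(p)
    return path


path = sweep_trajectory(s=0.01, n_e=10_000)
total = len(path) - 1                                 # generations to near fixation
middle = sum(1 for p in path[1:] if 0.2 <= p <= 0.8)  # generations spent at 20-80%
approx = 2 * log(2 * 10_000) / 0.01                   # textbook approximation
print(total, middle, round(approx))
```

    With these values the sweep takes roughly 2,000 generations, and only a few hundred of them fall between 20 and 80 percent frequency. Most of the time goes to the phase when the allele is still rare (growing exponentially in numbers, but from a tiny base) and the slow approach to fixation.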

    But all that describes the dynamics of a single, strongly selected, mutation. What if a trait comes under selection, but the variation in the trait is explained not by a single gene, but by dozens or hundreds of genes? Pritchard and Di Rienzo outline such a scenario:

    The key point is that we should expect such an adaptation to occur by small allele frequency shifts spread across many loci. As a hypothetical example, consider the adaptation of human height — a trait for which there are probably hundreds of SNPs that each affect height by a few millimeters. Strong selection for increased height could be very effective, as height is extremely heritable. But at the level of individual SNPs, the effect of selection would be rather weak, exerting just a small upward pressure in favour of each of hundreds of 'tall' alleles. Suppose that at 500 SNPs, the tall alleles each increase the expected height of a person by 2 mm. Then, an average shift of just 10% in the population allele frequency of each tall allele would increase average height in the population by 20 cm (assuming that SNPs contribute additively). Although these numbers are hypothetical, they illustrate that, for a highly polygenic trait, a dramatic adaptive response could result from modest allele frequency changes at many loci. This model is different from classical sweep models. Most importantly, adaptation could occur without dramatic allele frequency changes and without adaptive fixation events.
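    The arithmetic in this hypothetical example works out once both copies of each locus in a diploid genome are counted: 500 SNPs × 2 copies × 2 mm × 0.10 = 200 mm. Here is a minimal Python check, assuming purely additive effects as the quote does.

```python
def mean_shift(n_snps, effect_per_copy, delta_p, copies_per_locus=2):
    """Change in the population mean when every 'tall' allele rises by delta_p.

    Assumes additive effects; a diploid person carries two copies of each locus.
    """
    return n_snps * copies_per_locus * effect_per_copy * delta_p


shift_mm = mean_shift(n_snps=500, effect_per_copy=2, delta_p=0.10)
print(shift_mm / 10, "cm")  # 20.0 cm
```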

    But the description isn't precisely what would happen in the case of selection on stature. Consider:

    1. It is true that alleles that already exist in the population provide the most immediate opportunity for change under directional selection. Any short-term phenotypic evolution we see is likely to be caused by changes in the frequency of standing variants.

    2. Some of the alleles that affect stature are constrained by their effects on other phenotypes. They might not change, even under directional selection on stature.

    3. Stature may be affected by hundreds of loci, but these do not account for equal proportions of the additive variance. Loci are subject to selection roughly in proportion to the additive variance in fitness they explain. Directional selection on stature will change the allele frequencies for a few loci quite a bit more quickly than most.

    The distribution of effect sizes is fairly well known for stature in humans. For example, Park and colleagues [2] this spring plotted the distribution of effect sizes for variants discovered by GWAS in 63,000 Europeans:

    Effect-size distribution of variants found to explain the heritability of stature, Crohn's disease, and breast, prostate and colorectal (BPC) cancers in human genome-wide association studies

    In the figure, (a) is based on observed loci -- for stature, this includes 30 loci that reached significance in the GWAS without follow-up genotyping. There is a pretty severe ascertainment bias against small effect sizes, so curve (b) attempts to model the actual distribution correcting for ascertainment. Curve (c) is normalized to give the three conditions the same observed range.

    You can see that if we suddenly started selecting for height, most of the genetic response would come from a very small proportion of the loci that explain the current additive variance. These would be the subset of loci in the large-effect-size tail of the distribution, excluding those that are constrained by their role in other phenotypes under selection.

    4. As an allele becomes common enough (going up toward fixation), the locus will account for less and less of the additive variance in fitness. To maintain the same response to selection, other alleles must pick up the slack. Over time, groups of different alleles will come into the focus of selection, sort of like the "cover flow" feature of an iPod. Some alleles rise quickly through the mid-frequency range, only to be gradually replaced by others as the focus of selection. Most of the phenotypic change occurs as alleles cross rapidly from 40 to 60 percent or so.

    5. A few loci will be special. These account for an appreciable fraction of additive variance even though the favored allele is very rare. As they become common, these favored alleles change in frequency more and more rapidly, and account for more and more of the additive variance. They suck up the oxygen of selection. These alleles will look like a classic sweep.

    6. Over many generations, new mutations may occur that also have strong effects on the trait. They will follow the "special" pattern described in 5.
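    Points 3 to 5 can be illustrated with a small deterministic model in Python. It assumes a constant linear selection gradient β on a trait measured in phenotypic standard deviations. It uses the standard weak-selection approximation Δp ≈ β a p(1 - p) for an additive allele with per-copy effect a, so each locus contributes to the response in proportion to its additive variance 2a²p(1 - p). The parameter values (200 identical small-effect loci at 50 percent and one large-effect allele at 1 percent) are hypothetical, chosen only to make the pattern visible.

```python
def directional_response(beta=0.5, n_small=200, a_small=0.01, p_small=0.5,
                         a_large=0.3, p_large=0.01, generations=200):
    """Track allele frequencies under a constant selection gradient beta.

    Hypothetical parameters; trait in phenotypic SD units. Uses the
    weak-selection approximation delta_p = beta * a * p * (1 - p) for an
    additive allele with per-copy effect a.
    """
    small, large = p_small, p_large  # all small-effect loci are identical here
    large_share = []                 # large locus's share of the additive variance
    for _ in range(generations):
        va_small = n_small * 2 * a_small ** 2 * small * (1 - small)
        va_large = 2 * a_large ** 2 * large * (1 - large)
        large_share.append(va_large / (va_small + va_large))
        small += beta * a_small * small * (1 - small)
        large += beta * a_large * large * (1 - large)
    return small, large, large_share


small, large, share = directional_response()
print(round(small, 2), round(large, 3), round(share[0], 2), round(max(share), 2))
```

    Under these assumptions the rare large-effect allele starts with about 15 percent of the additive variance. It takes most of the variance while it passes through intermediate frequencies, then drops out as it approaches fixation, which is the sweep-like behavior described in point 5. Over the same 200 generations, each small-effect locus moves only part of the way.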

    The question is how many loci of this type can we expect to exist? We all know that there are two patterns that could account for the heritability of traits like stature, where no common variants have very strong effects. Either the additive variance is spread across many rare variants with large effects, or instead across many common variants with small effects. Pritchard and Di Rienzo's scenario accentuates the second of these -- a small frequency change in many common variants with small effects.

    But if even a small fraction of the additive variance is explained by a few rare variants with strong effects, these may cause most of the phenotypic change, and may look a lot like a standard selective sweep.

    Pritchard and Di Rienzo note that the two options -- a rapid sweep of one or a few loci, versus slight frequency changes in many loci -- are not mutually exclusive. Most cases of directional selection on phenotypes may involve both patterns. If so, that will be very helpful, because we can use the easy-to-find sweeps to target analysis of harder-to-find frequency changes.

    They sketch a strategy for examining the evolution of such traits.

    One type of approach will be to identify phenotypes that may have undergone adaptive changes in particular environments, such as adaptations to cold climate, high altitude or novel ecological conditions. To dissect the genetic basis of such adaptations, one might collect phenotyped samples from closely related populations that have and have not experienced the selective pressure of interest and use GWA mapping to identify relevant quantitative trait loci (QTLs). Additionally, one would want to measure the extent of phenotypic adaptation — estimated as the difference in average phenotype between the adapted and non-adapted populations when they are living under matched conditions (exact matching of conditions may be difficult in human studies). Then one could ask: what fraction of the phenotypic difference can be explained by alleles with large versus small frequency differences? Are the phenotypic effect sizes of QTLs with large allele frequency differences greater than those with subtle frequency shifts? What fraction of the phenotypic difference cannot be explained by detected sweep signals or QTLs at all (and hence might result from the cumulative effect of many weak QTLs)?

    In another type of scenario, one might hypothesize that a particular aspect of the environment is an important selective factor (for example, climate or diet) but it is unclear what all the relevant phenotypes are. In this case, we might study adaptation by looking at sets of populations that have independently adapted to the same selective pressures. One type of signal would be alleles that show parallel frequency shifts in response to similar environmental pressures in distantly related populations (although this type of approach is unlikely to be powerful for alleles with very small effects).
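    Once QTLs are mapped, the first question in the quoted strategy becomes an accounting exercise. This Python sketch assumes additive effects (each locus contributes 2 × effect × Δp to the mean difference), an observed difference measured under matched conditions, and an arbitrary cutoff of 0.2 for a "large" frequency difference. The record type and the toy numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Qtl:
    """A mapped locus; all values supplied by the analyst (hypothetical)."""
    effect: float     # additive effect per copy, in trait units
    p_adapted: float  # frequency of the + allele in the adapted population
    p_other: float    # frequency in the non-adapted population


def partition_difference(qtls, observed_diff, big_shift=0.2):
    """Split an observed mean difference into parts explained by QTLs with
    large vs. small frequency differences, assuming additive effects.
    The remainder is whatever the mapped loci do not explain."""
    big = small = 0.0
    for q in qtls:
        delta_p = q.p_adapted - q.p_other
        contribution = 2 * q.effect * delta_p  # two copies per diploid genome
        if abs(delta_p) >= big_shift:
            big += contribution
        else:
            small += contribution
    return {
        "large_shift": big / observed_diff,
        "small_shift": small / observed_diff,
        "unexplained": (observed_diff - big - small) / observed_diff,
    }


toy_qtls = [Qtl(1.0, 0.9, 0.4), Qtl(0.2, 0.55, 0.5), Qtl(0.2, 0.52, 0.5)]
print(partition_difference(toy_qtls, observed_diff=2.0))
```

    The "unexplained" fraction corresponds to the last question in the quote: the part of the difference that might come from many weak, undetected QTLs.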

    These are exactly the kind of tests that we are working on here at Wisconsin. We have some pretty promising ideas, I think. If you're on a dissertation grant panel, would you please give some money to my students who want to apply these approaches?

    I mean, really, this is the best application of anthropology to develop new genetic approaches, rich in theory and in empirical evidence. Humans are the ideal model organism, because we know the histories and ecologies of different populations. Since the development of agriculture, we've had several ongoing natural selection experiments in our species.

    Nor can we ignore the longer prehistory of human populations. I tend to think that a lot of recent selection has involved new genetic solutions in cases of strong stabilizing selection. A trait like brain size does not evolve under classic directional selection, but instead as a consequence of shifting patterns of stabilizing selection. With intense selection on multiple functions, such traits are constrained in their evolutionary response. Slight frequency changes are not likely to relax such constraints, but a new mutation of large effect might break a long-standing genetic logjam.

    So I think Pritchard and Di Rienzo have outlined many important issues in this review. They have the potential to be highly productive for people with a little talent for applying theory to the data.


    References

Neandertals

For years, I've worked on their bones. Now I'm working on their genes. Read more about the science studying these ancient people.

Denisova

From a finger bone of an ancient human came the record of a completely unexpected population. My lab is working on the science of the Denisova genome.

Acceleration

The advent of agriculture caused natural selection to speed up greatly in humans. We're uncovering some of the ways that populations have rapidly changed during the last 10,000 years.

Malapa

Just outside Johannesburg, the Malapa site is producing some of the most exciting finds in human evolution. This site is the headquarters of the Malapa Soft Tissue Project.