john hawks weblog

paleoanthropology, genetics and evolution

recent selection

  • Lactase persistence on the march

    Fri, 2009-08-28 12:55 -- John Hawks

    Everybody's noticing the new article in PLoS Computational Biology about lactase persistence, which several readers have emailed to me. Thanks for sending it, everyone -- it's always helpful even if I get it more than once!

    The short version is that the authors place the origin in Germany around 7500 years ago and, using a 2-d forward-time dispersal model, find that this origin fits well with the distribution of allele frequencies in Central Europe.

    There's only one little problem: It's hard to see how the same scenario gets the allele to India. Or, for that matter, Ireland. The authors posit that Indian lactase persistence will be found to be caused by a "diversity" of alleles. They seem to have missed this paper that found a greater diversity of lactase-associated haplotypes "north of the Caucasus" -- consistent with an initial steppe dispersal. OK, that's two problems, and they're not little.

    Their potentially interesting finding -- the dispersal of lactase persistence in their model didn't increase the diffusion of other central European genes -- should inspire more modeling. How independent can a strongly-selected allele be of its genomic background? Can selection cause demographic events without affecting unlinked neutral variation? I imagine we can explore this issue with differential equations.

    (see also, Dienekes, Yann Klimentidis, GNXP)

    References:

    Itan Y, Powell A, Beaumont MA, Burger J, Thomas MG. 2009. The Origins of Lactase Persistence in Europe. PLoS Comput Biol 5(8): e1000491. doi:10.1371/journal.pcbi.1000491

    Enattah NS and 26 others. 2007. Evidence of Still-Ongoing Convergence Evolution of the Lactase Persistence T-13910 Alleles in Humans. Am J Hum Genet 81:615-625. doi:10.1086/520705

  • Mailbag: Statistics and future evolution

    Mon, 2009-08-24 09:16 -- John Hawks

    I was trying to find out more
    about recent research predicting a relative convergence of racial features in
    future generations (but I don't know anything about "rapid evolution by drift"
    or things like that). I'm aware of debunked claims (inc. your debunking) from
    media reports, but I'm not aware of research that actually contains enough
    scientific merit to make a valid prediction. I decided to write to you after reading
    your review of a lecture by UCL geneticist Steve Jones.

    If there is any reference you can give to someone like me who has very little genetic
    training (past Mendel, anyway) I would greatly appreciate it.

    I'll be glad to help if I can. Population genetics shouldn't be too much of a challenge for you; it's basically statistics (e.g., evolution by genetic drift is modeled by repeated binomial sampling).
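
    To make the "repeated binomial sampling" remark concrete, here is a minimal sketch of a Wright-Fisher drift simulation. The function name, the population sizes, and the starting frequency are illustrative assumptions, not values from any study; the model has no selection, mutation, or migration.

```python
import random


def wright_fisher(p0, pop_size, generations, seed=None):
    """Simulate allele-frequency change under pure genetic drift.

    Each generation, the 2N gene copies of a diploid population are drawn
    by binomial sampling from the previous generation's allele frequency.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # One binomial draw: each copy carries the focal allele with probability p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    small = wright_fisher(0.5, pop_size=50, generations=100, seed=1)
    large = wright_fisher(0.5, pop_size=5000, generations=100, seed=1)
    print("N=50   final frequency:", small[-1])
    print("N=5000 final frequency:", large[-1])
```

    Running it with a small and a large population shows the general pattern: the larger the population, the less the frequency wanders per generation.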

    We have a very high rate of gene flow between "racial" or geographic groups today compared to the past, and so we can predict that gene frequencies should converge in the future. But there are two issues -- first, the rate of change by chance in very large populations is very slow; and second, some genes may be (or recently have been) subject to selection processes that maintain diversity. That second is a complicated problem because selection pressures may be different for every gene.
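
    The convergence prediction can be sketched with a deterministic two-population model. This is an assumption-laden toy: two equal-sized populations exchange a fraction m of their genes each generation, and drift and selection are ignored entirely. The function name and all numbers are hypothetical.

```python
def converge_by_gene_flow(p1, p2, m, generations):
    """Track two allele frequencies under symmetric gene flow.

    Each generation a fraction m of each population's gene pool is
    replaced by genes from the other population. Returns the final
    frequencies and the absolute difference at every generation.
    """
    diffs = [abs(p1 - p2)]
    for _ in range(generations):
        # Simultaneous update so both populations use last generation's values.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
        diffs.append(abs(p1 - p2))
    return p1, p2, diffs


if __name__ == "__main__":
    p1, p2, diffs = converge_by_gene_flow(0.9, 0.1, m=0.01, generations=100)
    print("difference at start:", diffs[0])
    print("difference after 100 generations:", diffs[-1])
```

    In this model the difference between the populations shrinks by a factor of (1 - 2m) per generation, so even modest gene flow erodes differences over enough generations unless selection maintains them.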

  • Spatial variation and near-fixed selected alleles

    Thu, 2009-06-11 14:39 -- John Hawks

    A couple of people have asked me about a new paper in PLoS Genetics by Graham Coop and colleagues, titled, "The role of geography in human adaptation." The paper is open access, and while the details of genetic measures and simulations can be hard to follow, I think it's a great example of the way recent work on selection and human diversity has been structured.

    I'll just expand on a few of the topics in the paper, and discuss how they relate to the previous findings about the number and age of selected variants in human populations.

    Here's the paper's abstract:

    Various observations argue for a role of adaptation in recent human evolution, including results from genome-wide studies and analyses of selection signals at candidate genes. Here, we use genome-wide SNP data from the HapMap and CEPH-Human Genome Diversity Panel samples to study the geographic distributions of putatively selected alleles at a range of geographic scales. We find that the average allele frequency divergence is highly predictive of the most extreme FST values across the whole genome. On a broad scale, the geographic distribution of putatively selected alleles almost invariably conforms to population clusters identified using randomly chosen genetic markers. Given this structure, there are surprisingly few fixed or nearly fixed differences between human populations. Among the nearly fixed differences that do exist, nearly all are due to fixation events that occurred outside of Africa, and most appear in East Asia. These patterns suggest that selection is often weak enough that neutral processes—especially population history, migration, and drift—exert powerful influences over the fate and geographic distribution of selected alleles.

    The paper looks for "nearly fixed" genetic differences between populations, and finds relatively few of them. That's relatively well-known; the FST-based test has been done on fewer populations with similar results (e.g., Williamson et al. 2007; Barreiro et al. 2008). This paper has the HGDP panel, which includes many more populations, and therefore is able to add geographic resolution to these older results. They find that the geographic distribution of near-fixed alleles is clinal; there aren't strong boundaries delimiting the geographic distributions of most apparently selected alleles. That means that the same demographic forces affecting neutral genetic variation have also affected recently selected alleles.
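
    For readers unfamiliar with FST, here is a minimal sketch of the textbook heterozygosity-based definition for a single biallelic SNP. It assumes equally weighted populations and known frequencies with no sampling correction; the estimator used in the paper itself may differ, and the function name is hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity
    of the pooled population and H_S the mean within-population value.
    """
    if not freqs:
        raise ValueError("need at least one population frequency")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The SNP is monomorphic everywhere, so there is no differentiation to measure.
        return 0.0
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))   # identical frequencies
    print(fst_biallelic([0.1, 0.9]))   # strongly differentiated
    print(fst_biallelic([0.0, 1.0]))   # fixed difference
```

    A "nearly fixed" difference in the paper's sense corresponds to the last kind of case, where one population is close to frequency 0 and another close to 1.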

    Is that surprising? As we pointed out in our 2007 paper, the recent demographic history of human populations has included a lot of population growth. This means that the number of adaptive mutations should have increased during the last 10,000--20,000 years. High-FST selected alleles can only reflect selected mutations that are older than this (old enough to reach near fixation in one population), or are extraordinarily strong. A few mutations are exceptionally strong in their selective advantages -- SLC24A5 and lactase persistence seem to be examples. But as long as adaptive mutations are intrinsically rare, very few of them could have occurred in the small populations of 20,000 years ago or earlier, even if many happened in the large populations of the Holocene. So I think the new paper actually reinforces the interpretation of acceleration. The pattern we're seeing today with new mutations just can't be a feature of human evolution before around 20,000 years ago.
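
    The population-size argument can be put as rough arithmetic. The sketch below assumes a diploid population of size N, a per-copy rate of adaptive mutation, and the standard approximation that a new advantageous mutation with advantage s escapes loss by drift with probability of about 2s. Every number in it is a placeholder chosen for illustration, not an estimate from our paper or anyone else's.

```python
def adaptive_mutation_supply(pop_size, adaptive_mu, s):
    """Expected adaptive mutations per generation in a diploid population.

    Returns (new mutations arising, mutations expected to escape loss by
    drift), using the approximation that a new advantageous allele
    survives with probability 2s.
    """
    new_mutations = 2 * pop_size * adaptive_mu
    surviving = new_mutations * 2 * s
    return new_mutations, surviving


if __name__ == "__main__":
    # Hypothetical population sizes and rates, for illustration only.
    for label, n in [("small population", 10_000), ("large population", 10_000_000)]:
        arising, surviving = adaptive_mutation_supply(n, adaptive_mu=1e-8, s=0.01)
        print(label, "arising per generation:", arising, "surviving:", surviving)
```

    Whatever the true rates are, the supply scales linearly with N under these assumptions, which is why a thousandfold larger population is expected to start a thousandfold more selective sweeps per generation.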

    If selection is affected by demographic processes, does that mean that it is "weak"? Clearly, "weak" is a matter of scale. Adaptive genes disperse through a spatially structured population very slowly, even if they confer very large fitness advantages. That means that their dispersal is highly dependent upon demographic conditions, such as the disproportionate growth of some populations or occasional long-distance gene flow. Locally, an allele may rapidly increase under selection, but that effect may have little influence on the evolution of distant populations.

    We see that pattern with genes known to be under strong selection in humans, like the ones that help some people resist malaria. Sickle cell, hemoglobin C and E, alpha- and beta-thalassemia, ovalocytosis, G6PD deficiency all have restricted geographic ranges that parallel the clinal pattern of neutral genes. There is an important difference: the patterns of these genes diverge in areas where malaria risk changes rapidly with geography (like coastal versus inland areas of Mediterranean Europe), and some of them have wide geographic distributions compared to their young haplotype ages (like sickle cell). But even in the latter cases, most are too rare to elevate the FST of surrounding SNP markers. Malaria adaptations are a tremendous example of the way that demographic conditions limit strong selection.

    Africa versus other populations

    Derived alleles are expected to have lower frequencies on average than ancestral alleles. So if a population has a bias toward higher-frequency derived alleles, that may be evidence against neutral evolution. The paper finds that this bias is greater in non-African populations than within Africa:

    The overall genic enrichment is present in all three population comparisons, and each tail seems to be similarly enriched for high- FST genic SNPs. However, the number of derived alleles in each tail does differ substantially and is biased towards derived alleles outside Africa and especially in east Asia. Thus, the statistical evidence for enrichment of events inside Africa is weaker than for the other two populations (we return to this point later).

    In general, populations outside Africa have a genome-wide bias toward higher frequencies of derived alleles. The causes of that bias aren't clear -- ascertainment may account for some of the bias but cannot account for all of it; it's possible that early demographic events may explain some of the bias but the pattern isn't obvious.

    The FST-based tests of neutrality are most powerful when a new allele has swept several rare mutations with it to near-fixation. Rare mutations tend to be derived ones. So the power of the test depends on how many rare mutations there are to start with, and what their frequencies are in other populations that didn't have the same selected allele.

    It's one of many issues that make finding selection in African populations slightly different from elsewhere. I think that Africans have undergone as much, and very possibly more, selection by new adaptive mutations as other populations. But our 2007 work suggested that the modal age of the selection we ascertain in Africa may be older than in other regions. That would be consistent with demographic history, since Late Pleistocene African populations were larger than others. But it's possible that genome-wide features like faster LD decay, higher heterozygosity, and more ancestral versus derived variants may also influence our estimates of the timing and number of selected alleles in Africa.

    Polygenic adaptation

    Toward the end of the paper, the authors discuss the pattern of local adaptation in a more general sense. Why should there be relatively few near-fixed genetic differences between populations, if human ecological changes suggest that local adaptation should have been a powerful force in our recent evolution? One possibility is acceleration -- most of the variants are too recent to have reached near-fixation in any single population.

    But the authors mention another possible influence that we've also been thinking about: epistatic interactions among new variants. For example, lots of skin pigmentation loci are known to have been under recent selection, but only a couple of them have reached near-fixation in any population. The rest are at lower frequencies. Since these alleles all affect the same phenotype, they're subject to diminishing returns. As one lighter-pigment allele becomes common, it reduces the strength of selection on the others. The population doesn't have to fix for any of them; in fact, selection probably cannot drive more than one or two up to fixation since the rest of them compete with each other.

    Over the very long term, this situation would be sorted out. A handful of loci that optimize skin pigmentation might ultimately go to high frequencies or fixation; for some alleles the costs may exceed the benefits and they will disappear. Others, relatively neutral to each other, may fix by drift. But the "very long term" is a span of hundreds of thousands of generations. Here we're talking about a few hundred generations at most. So human populations aren't anywhere near an optimum; they're in a transient where epistatic interactions may be quite important.

    Greg Cochran and I have been discussing this idea for some time. We call it the "Stooge effect". Think of the Three Stooges all trying to run through a door at the same time and getting stuck in the middle. That's what these genes are doing -- all of them are competing to respond to selection, but each is slowed by the presence of the others.
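
    Here is a toy version of that doorway jam, offered as a sketch rather than as the model we have in mind. It assumes two haploid loci in linkage equilibrium, where carrying the new allele at either locus gives the full fitness benefit 1 + s and carrying both gives nothing extra (an extreme form of diminishing returns). The function name and parameter values are hypothetical.

```python
def stooge_trajectory(p0, q0, s, generations):
    """Frequencies of two redundant advantageous alleles over time.

    p and q are the frequencies of the new alleles at locus A and locus B.
    An individual with at least one new allele has fitness 1 + s;
    an individual with neither has fitness 1.
    """
    p, q = p0, q0
    history = [(p, q)]
    for _ in range(generations):
        # Fraction of the population that already has the benefit.
        mean_w = 1 + s * (p + q - p * q)
        # Carriers of either new allele always have fitness 1 + s.
        p, q = p * (1 + s) / mean_w, q * (1 + s) / mean_w
        history.append((p, q))
    return history


if __name__ == "__main__":
    alone = stooge_trajectory(0.01, 0.0, s=0.05, generations=300)
    crowded = stooge_trajectory(0.01, 0.01, s=0.05, generations=300)
    print("allele A alone, final frequency:", alone[-1][0])
    print("allele A with a competitor, final frequency:", crowded[-1][0])
```

    In this toy model each allele's advantage over its alternative shrinks as the other allele spreads, so both stall short of fixation for a very long time, while the same allele on its own sweeps nearly to completion.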

    It's not a new idea -- Frank Livingstone used to talk about this general concept with different malaria adaptations. What's new is the increasing evidence that humans are really in a transient with a lot of genes out of equilibrium. It's very possible that for some phenotypes, standing variation has been an epistatic block on the selection of new mutations. For others, the emergence of some new mutations has limited the trajectory of selection on others.

    Conclusion

    All in all, I think this paper is a nice contribution to our understanding of the pattern and rate of recent positive selection in human populations. Certainly, the HGDP sample will continue to be a very informative addition to our understanding of spatial dynamics in ancient humans. The addition of the new HapMap v.3 samples may be even more important, because these represent further regions with roughly the same discovery power as the initial three HapMap samples. And of course, we have the 1000 Genomes sample coming up, adding significant potential for discovering rarer selected variants.

    References:

    Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, et al. 2009. The Role of Geography in Human Adaptation. PLoS Genet 5(6): e1000500. doi:10.1371/journal.pgen.1000500

  • Richard Lewontin: "[T]oo rapid for genetic adaptation"

    Tue, 2009-05-26 22:56 -- John Hawks

    I have had a New York Review of Books essay by Richard Lewontin, titled, "Why Darwin?" on my desktop for a week without getting to the last section of it.

    Like many essays in the NY Review of Books, Lewontin's shoehorns small points from the books into an argument of his own. As you might guess from the title, Lewontin's theme is that Darwin has been overrated -- a result of biologists overemphasizing a "great man" story of the history of their science, and an unjustified belief in the ubiquity and power of natural selection. Lewontin mobilizes his argument against Jerry Coyne's Why Evolution Is True.

    I don't really find the "pluralist versus adaptationist" debate very interesting. Despite the vocal complaints of some, I can't ever seem to locate the mythical "adaptationists" who deny that non-adaptive evolution ever happens. So the "debate" always comes down to whether particular adaptive hypotheses are true. Since no scientific hypothesis is true a priori, and since "those adaptationists are always saying stupid things" is not a scientific argument, I don't see the point.

    Still, I meant to get to the last section of Lewontin's essay, and this morning I finally read it. To close his case for the weakness of natural selection, Lewontin turns to another new book by Greg Gibson, titled, It Takes a Genome: How a Clash Between Our Genes and Modern Life Is Making Us Sick. The book is an extended account of "diseases of civilization", a topic that I discussed here last week ("Arrested adaptation and the 'diseases of civilization'"). Here's a passage from the book's promotional material (on the Amazon page):

    In It Takes a Genome, Greg Gibson posits a revolutionary new hypothesis: Our genome is out of equilibrium, both with itself and its environment. Simply put, our genes aren’t coping well with modern culture. Our bodies were never designed to subsist on fat and sugary foods; our immune systems weren’t designed for today’s clean, bland environments; our minds weren’t designed to process hard-edged, artificial electronic inputs from dawn ‘til midnight. And that’s why so many of us suffer from chronic diseases that barely touched our ancestors.

    Set aside for a moment how "revolutionary" this hypothesis is -- I'll revisit the idea in another post. The question is whether this mismatch between our environments and our genetic variation means that human evolution "stopped" or that we are still "adapted to the Pleistocene". As I pointed out in my earlier post, both propositions are true: human populations are mismatched with their current environments, and human populations have been recently adapting very rapidly to new environments. Here's what I wrote last week:

    [M]any of today's chronic diseases reflect the reaction of human biology to novel environments for which our genes are not well adapted. But we don't need to exaggerate the slowness of human evolution to arrive at that conclusion. Recent rapid evolution of humans does not mean that humans are perfectly adapted to the present. Far from it -- if human populations have undergone rapid genetic changes into the past thousand years, it is a strong sign that fitness has not yet maximized in the post-agricultural environment.

    I can contrast my point of view with Richard Lewontin's, who perfectly reiterates the "human evolution stopped in the Pleistocene" version of events.

    An important property of adaptive evolution is that it is usually a slow process. Certainly there are cases where a single genetic change can mean the difference between life and death in a hostile environment. The classic cases are the mutations that give pathogenic microorganisms the ability to resist antibiotics or mutations that allow crops to resist pathogens, for example insects or herbicides. But these are not representative models for how species adapt, by accumulation of mutations of small effect, to changes in food availability, temperature modifications, and the thousand shocks that flesh is heir to. The usual small differences in fitness among genotypes are therefore manifest as detectable evolutionary change only after thousands of generations.

    This deliberate tempo has presented the human species with a problem of adaptation. With a human generation of about twenty-five years, there have been roughly only one hundred generations since the founding of the Roman Republic. Yet the changes in the human environment caused by changes in human activity have been enormous. Changes in diet, habitation, working conditions, the pollution of air and water, and especially the considerable increase of lifespan that result in major alterations and breakdowns in the bodily machinery have all been too rapid for genetic adaptation.

    Notice the false premises: Adaptive evolution is "usually a slow process." Species adapt by "accumulation of mutations of small effect." It's as if he were transported back in time to 1908 where no one had heard of the breeder's equation.

    There's nothing impossible about long series of small changes. But they are not the only mode of adaptation, or even the most likely one. Populations with additive genetic variation that correlates with fitness will change rapidly under selection. The structure of the additive variation may lead to strong selection on one gene of large effect, or selection in parallel across many genes of varying effects. Series of small changes may be required for some adaptations, but a rapid environmental change (as Lewontin observes for humans) may cause bursts of rapid changes in allele frequencies.
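
    For reference, the breeder's equation says the response to selection R equals the narrow-sense heritability h² times the selection differential S. The sketch below projects a trait mean forward under the strong assumption that h² and S stay constant every generation, which real populations will not honor for long; the names and numbers are illustrative only.

```python
def response_to_selection(h2, selection_differential):
    """Breeder's equation: R = h^2 * S."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return h2 * selection_differential


def project_trait_mean(mean0, h2, selection_differential, generations):
    """Trait mean per generation, assuming constant h^2 and S."""
    means = [mean0]
    for _ in range(generations):
        means.append(means[-1] + response_to_selection(h2, selection_differential))
    return means


if __name__ == "__main__":
    # Hypothetical trait: mean 100 units, h^2 = 0.5, S = 2 units per generation.
    print(project_trait_mean(100.0, 0.5, 2.0, 10))
```

    Even with these modest placeholder values the mean shifts by a full unit per generation, which is the sense in which additive variation correlated with fitness produces change on a timescale of generations, not thousands of generations.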

    To maintain the slowness of human evolution, Lewontin must do three things:

    1. Assume humans are genetically uniform.

    2. Where humans obviously are not uniform, argue that variations are uncorrelated with fitness.

    3. Ignore any historical or genetic evidence that might contradict 1 and 2.

    Keeping in mind the short length of this section of the essay, Lewontin does manage all three of these conditions.

    I think it's downright sneaky the way Lewontin reinforces the assumption of human genetic uniformity. He refers to "the human genotype" as if there were only one! By emphasizing that "parts of the human genome are out of correspondence with modern life", he precludes the possibility that some human genomes may be more in correspondence than others. Sure, if humans share a single genome, they can't possibly differ in any adaptive way.

    But diversity is the reality. Examples of recent human evolution are fixtures in biology textbooks, from sickle-cell to lactase persistence. These are traits that have rapidly changed in frequency during the last 2500 years, due to changes in recent human environments -- disease for the former, diet for the latter. These rapid transformations are precisely those that Lewontin says are impossible -- environmental changes being "too rapid for genetic adaptation." A number of morphological changes are also evident when comparing archaeological and recent skeletal samples in many parts of the world. Somehow the relevance of these recent changes goes unmentioned in the essay.

    One of the best-characterized examples of evolution in recent populations is the rapid Holocene evolution of pigmentation phenotypes. It's a textbook example of human variation, and several adaptive hypotheses may explain it. So pigmentation would seem an unlikely example of how human evolution has been too slow to cope with the environment. But Lewontin finds a way:

    [H]igh doses of solar radiation that is experienced by surfers on the California beaches might induce an eventually fatal skin cancer, but the cancer death almost always occurs well after reproductive age, so there is no opportunity for selection to act.

    I agree that current patterns of cancer mortality of light-skinned surfers may have little impact on their fitness. In other words, this chronic disease is a sign of an environmental "mismatch" that future genetic evolution is unlikely to erase.

    But why turn to false arguments about the speed of evolution to make this point? Surely Lewontin knows that "reproductive age" in humans is not synchronous with reproductive effort? Skin cancer is one of the earliest-killing cancers, with a good fraction of victims dying at ages when they might otherwise be helping raise their kids or grandchildren. Lewontin must also know that human populations vary greatly in their skin cancer susceptibility, and that some surfers (the dark pigmented ones) have lower skin cancer rates after the same sun exposure. Skin cancer may or may not be the best explanation for dark pigmentation in low-latitude human populations (there are others, none mutually exclusive), but this example works strongly against Lewontin's claims that natural selection is "slow" and that human environmental changes have been "too rapid for genetic adaptation." We aren't perfectly adapted today, and the rate of our evolution in the recent past was very fast.

    References:

    Lewontin RC. 2009. Why Darwin? New York Review of Books 56(9) May 28, 2009.

  • Arrested adaptation and "diseases of civilization"

    Sun, 2009-05-10 23:02 -- John Hawks

    While I was browsing papers for a research project, I happened to re-open the paper, "Stone Agers in the fast lane," written by S. Boyd Eaton, Melvin Konner, and Marjorie Shostak in 1988. This paper reviewed the idea that many chronic disorders like diabetes and cardiovascular disease are actually "diseases of civilization" -- brought on by a mismatch between the human genetic heritage and the current cultural milieu.

    I'm citing this work as part of my continuing observations on biologists who predicted that human evolution must have stopped sometime in the Pleistocene. Eaton e-mailed me very soon after our acceleration paper was published, and it is only fair to say that the 2009 views of these authors may be very different from their 1988 publication. With that note, here's a quick review:

    The current genetic variation in any species is a product of evolutionary forces that affected that species' ancestors in the past -- that's a basic precept of evolutionary theory. So it's hardly more than a syllogism that if the human environment has undergone recent rapid changes, then our genes may do little to protect us from undesirable biological side effects of our new environment.

    But Eaton and colleagues, like many human biologists, went rather further than this observation. They made a point of emphasizing that the pace of human adaptation has been incredibly slow. The hypothesis of very slow human evolution had a desired corollary: the "diseases of civilization" are not merely bad side effects of recent dietary Westernization, but may ultimately be traced to the transition to agriculture -- an event that occurred 10,000 years ago in some societies. Let's consider how they emphasized this idea that human evolution had been glacially slow:

    The gene pool from which modern humans derive their individual genotypes was formed during an evolutionary experience lasting over a billion years. The almost inconceivably protracted pace of genetic evolution is indicated by paleontologic findings that reveal that an average species of late cenozoic [sic] mammals persisted for more than a million years, by biomolecular evidence indicating that humans and chimpanzees now differ genetically by just 1.6 percent even though the hominid-pongid divergence occurred seven million years ago, and by dentochronologic data showing that current Europeans are genetically more like their Cro-Magnon ancestors than they are like 20th-century Africans or Asians. Accordingly, it appears that the gene pool has changed little since anatomically modern humans, Homo sapiens sapiens, became widespread about 35,000 years ago and that, from a genetic standpoint, current humans are still late Paleolithic preagricultural hunter-gatherers (Eaton et al. 1988:740).

    Not only was the pace of evolution slow when it was happening, but we may have reason to think that recently our gene pool hadn't been changing at all:

    The Late Paleolithic era, from 35,000 to 20,000 B.P., may be considered the last time period during which the collective human gene pool interacted with bioenvironmental circumstances typical of those for which it had been originally selected (Eaton et al. 1988:740).

    The word "originally" in this passage may admit of later changes in selection and thus in some genes. But the paper does not examine known cases of recent change, even on those genes where some kind of recent dietary adaptation was well-known in 1988 -- such as lactase persistence or ALDH2.

    Reading the paper from my current vantage point, where do I think it went wrong? The basic point in the paper is undoubtedly correct -- many of today's chronic diseases reflect the reaction of human biology to novel environments for which our genes are not well adapted. But we don't need to exaggerate the slowness of human evolution to arrive at that conclusion. Recent rapid evolution of humans does not mean that humans are perfectly adapted to the present. Far from it -- if human populations have undergone rapid genetic changes into the past thousand years, it is a strong sign that fitness has not yet maximized in the post-agricultural environment.

    Besides that, dietary influences on health may implicate the rapid cultural and ecological changes of the past 200 years. Westernization of diet is a characteristic of post-industrial economies, not early agriculturalists. Given the reduction in variance of mortality in the last 100 years as well as the short time, it is pretty likely that the genes of human populations have changed little in response to dietary Westernization.

    I think that the rapidity of recent adaptive evolution does imply a different perspective on the "diseases of civilization." For one thing, some people may be resistant to these diseases because they have inherited new protective alleles. If humans had hardly evolved in the post-agricultural environment, we would expect all populations to be equally susceptible to type 2 diabetes, cardiovascular disease, and cancer. Instead, we find that different populations have different characteristic rates of these diseases after adoption of a Western diet.

    Another insight is that some undesirable phenotypes may themselves be the consequences (or side effects) of recently selected alleles. Overdominant alleles like sickle cell naturally stand out in this regard. But the flushing reaction to alcohol, common in Asians with the selected ALDH2 allele, is a less fatal example.
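
    The overdominance case has a simple textbook form. If the two homozygotes have fitnesses 1 - s and 1 - t relative to a heterozygote fitness of 1, the allele whose homozygote pays cost t settles at a frequency of s / (s + t). The sketch below iterates the standard one-locus recursion to check that; the s and t values are placeholders and not measured fitnesses for sickle cell.

```python
def overdominance_equilibrium(s, t):
    """Equilibrium frequency of the allele whose homozygote has fitness 1 - t."""
    return s / (s + t)


def overdominance_trajectory(q0, s, t, generations):
    """Iterate allele frequency q under heterozygote advantage.

    Genotype fitnesses: common homozygote 1 - s, heterozygote 1,
    homozygote for the focal allele 1 - t. Random mating is assumed.
    """
    q = q0
    history = [q]
    for _ in range(generations):
        p = 1 - q
        mean_w = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
        # Focal allele copies come from its homozygotes and half of the heterozygotes.
        q = (q * q * (1 - t) + p * q) / mean_w
        history.append(q)
    return history


if __name__ == "__main__":
    traj = overdominance_trajectory(0.01, s=0.1, t=0.8, generations=500)
    print("simulated:", traj[-1])
    print("predicted:", overdominance_equilibrium(0.1, 0.8))
```

    The point of the exercise is that selection holds the costly allele at an intermediate frequency, so the disease phenotype persists as a side effect of the protection heterozygotes enjoy.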

    References:

    Eaton SB, Konner M, Shostak M. 1988. Stone Agers in the fast lane: chronic degenerative diseases in evolutionary perspective. Am J Med 84:739-749.

  • Mutual information between strings of loci

    Sat, 2009-03-28 21:26 -- John Hawks

    Fourth in a series on mutual information and genetic linkage. If you’re happening upon it for the first time, you can find the entire series or the first post, “Information theory: a short introduction”.

    After the last post, you might wonder what the big deal is about these information theoretic measures of linkage. After all, we’ve got lots of other measures of linkage to choose in population genetics, with many years of theory behind them. The basic conclusion about genetic drift was that it adds mutual information to samples over short regions, but that recombination over longer areas washes it out. If the net effect is no linkage, why would we bother to come up with some non-standard linkage measure?

    One answer: If the existing linkage measures were so great for testing neutrality, then we might expect some of the recent genome-wide selection scans to have used them. But they didn’t – instead we have several partially incompatible methods, all of which eschew the usual measures of linkage.

    I’m not going to summarize all the reasons for this, at least not yet. But there is one thing that the current positive selection tests have in common: they all attempt to quantify the decay of linkage across long strings of loci. Each of them—LRH, iHS, LDD—distills into a single number the pattern of reduction of linkage across physical distance.

    After the last post, that may seem odd. If we know the physical distance (or more to the point, the map distance), we can predict the amount of linkage under drift. That really ought to be enough to test neutrality. Yet the existing tests insist on a second dimension: the rate of decline of linkage. Partly, that’s because these tests try to separate positive selection from other things that violate neutrality, like inversions. But it also reflects a real limitation of the methods: they ignore some of the information available in the data, and thereby limit their power to test the hypothesis of genetic drift based on linkage. So they must turn to the rate of linkage decay as a test of drift.

    The trouble with homozygosity

    Massive genotyping projects are not sequencing projects. They provide long lists of genotypes, not sequences. If you’ve got a long list of genotypes, one of the measures of variation immediately available to you is homozygosity. Homozygosity among multiple sites, or joint homozygosity, is not mathematically the same as the mutual information between sites, but it also is increased by linkage. To the extent that genetic drift predicts the distribution of long-range linkage, you can test neutrality by looking at the joint homozygosity of two or more sites. For example, the extended haplotype homozygosity (EHH) is the probability that a pair of gametes that share a single core haplotype will also share identical haplotypes at longer distances. Homozygosity itself is not a test of linkage; for that we have to combine the probabilities of identity of all alleles in some way.
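
    As a rough illustration of EHH as just described, the sketch below takes phased haplotypes as strings, treats a single marker allele as the "core" (a simplifying assumption; published tests define core haplotypes more carefully), and asks what fraction of pairs of core-carrying chromosomes remain identical out to a given number of markers. The function name and the toy data are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, extent):
    """Extended haplotype homozygosity to the right of a core marker.

    haplotypes: equal-length strings, one character per marker.
    Returns the probability that two distinct chromosomes carrying the
    core allele are identical from the core through `extent` further
    markers, or None if fewer than two chromosomes carry the core.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return None
    segments = Counter(h[core_index:core_index + extent + 1] for h in carriers)
    identical_pairs = sum(comb(count, 2) for count in segments.values())
    return identical_pairs / comb(n, 2)


if __name__ == "__main__":
    haps = ["0000", "0001", "0011", "1000", "1111"]
    for extent in range(4):
        print(extent, ehh(haps, core_index=0, core_allele="0", extent=extent))
```

    The value starts at 1 at the core itself and can only fall as the window widens, which is the decay that the haplotype-based tests summarize.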

    Homozygosity-based tests of neutrality set up this combination as a ratio: the ratio of homozygosity for one allele or core haplotype versus the homozygosity of other alleles or haplotypes. That procedure throws away a lot of information carried by the fraction of haplotypes that are nonidentical at distance. Since that fraction of non-homozygote mutual information, unlike homozygosity, actually increases with distance, it may in some circumstances provide a more powerful test of neutrality. The use of a ratio of homozygosity also reduces the chance to detect certain classes of non-neutral loci—most obviously, the ones that have two or more selected alleles.

    To put some numbers to these ideas, let’s consider we have two loci, A and B. The alleles of A are a1 and a2, B has b1 and b2. Each of these alleles has an associated frequency, for example p(a1), and the frequency of the haplotype a1b1 will be written p(a1b1). If A and B are independent (unlinked) then the expected value of p(a1b1) would simply be p(a1)p(b1).

    The usual measure of linkage, D, is calculated as

    $$D = p(a_1 b_1)\,p(a_2 b_2) - p(a_1 b_2)\,p(a_2 b_1) \tag{1}$$

    It thus involves the frequencies of all four possible haplotypes. At linkage equilibrium, the two terms are expected to be the same, so that D = 0.
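
    To make equation (1) concrete, here is a minimal Python sketch. The function name `linkage_d` and the example frequencies are illustrative assumptions, not values from any dataset discussed here.

```python
def linkage_d(p_a1b1, p_a1b2, p_a2b1, p_a2b2):
    """Equation (1): D from the four two-locus haplotype frequencies."""
    total = p_a1b1 + p_a1b2 + p_a2b1 + p_a2b2
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return p_a1b1 * p_a2b2 - p_a1b2 * p_a2b1

# Independent loci with p(a1) = p(b1) = 0.8: every haplotype frequency is the
# product of its allele frequencies, so the two terms cancel.
d_independent = linkage_d(0.64, 0.16, 0.16, 0.04)

# Hypothetical linked case: extra weight on the a1b1 and a2b2 haplotypes.
d_linked = linkage_d(0.70, 0.10, 0.10, 0.10)
```

    The independent case returns zero up to floating-point rounding; the hypothetical linked case returns a positive D.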

    Several other measures of linkage are in common use (and sometimes lead to confusion). There are several reasons why you might want a different measure, and one of the biggest is that D depends on the frequencies of the alleles a1 and b1—low-frequency polymorphisms will have systematically lower linkage values for the same evolutionary scenario. A very good thing about D is that it does not require us to know the frequencies of the individual alleles a1 or b1, only the frequencies of their joint haplotypes. But the expectation that the products in the equation are equal depends on a 2 × 2 contingency table. Three-, four- or more-locus haplotypes demand some other measure.

    Using the same terminology, the homozygosity of the haplotype a1b1 is given as p(a1b1)^2. By itself, this tells us nothing at all about linkage. If we combine it with the homozygosity of one or the other allele (for example, with the difference p(a1)^2 - p(a1b1)^2), we will have a measure of the reduction in homozygosity with distance. That reduction in homozygosity still doesn’t tell us about linkage, without considering p(b1). But if we scale the reduction by taking it as a ratio with the reduction in homozygosity of one or more other haplotypes, those other haplotypes give us indirect information about p(b1). So we have an indirect measure of linkage, based on the relative rate of linkage decay.

    Relying on the rate of decay of linkage isn’t such a bad idea if we don’t know the rate of recombination between markers. In that case, we can use the rate of decay of homozygosity of other haplotypes as a scaling factor. But there is a problem lurking: The decay of homozygosity around a marker or core haplotype depends on its frequency. Under drift, higher-frequency haplotypes decay over shorter distances. So our test will be biased in a way that depends on the frequency spectrum of haplotypes.

    It seems much more direct to estimate the mutual information between A and B. We don’t need to use an indirect linkage measure when we have all the frequencies needed to calculate a direct measure. Mutual information is a useful measure because it extends easily to many loci. But it has a disadvantage: it requires us to obtain an independent estimate of the recombination fraction between our markers.
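
    As a rough sketch of what a direct measure looks like, the Python below computes the mutual information between two loci from their joint haplotype frequencies. The function name, the example frequencies, and the sample size of 120 gametes are assumptions for illustration. The scaling 2N times the mutual information (in nats) is the standard G statistic; the scaling used elsewhere in this series may differ.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) between two loci.

    `joint` maps (haplotype_at_A, haplotype_at_B) -> frequency.
    """
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(
        p * math.log(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0.0  # absent haplotypes contribute nothing
    )

# Hypothetical frequencies: an independent case and a linked case.
independent = {("a1", "b1"): 0.64, ("a1", "b2"): 0.16,
               ("a2", "b1"): 0.16, ("a2", "b2"): 0.04}
linked = {("a1", "b1"): 0.70, ("a1", "b2"): 0.10,
          ("a2", "b1"): 0.10, ("a2", "b2"): 0.10}

mi_independent = mutual_information(independent)
mi_linked = mutual_information(linked)

n_gametes = 120                       # assumed sample size
g_stat = 2 * n_gametes * mi_linked    # G statistic, compared against a chi-square
```

    Because the dictionary keys are arbitrary labels, the same function accepts multi-marker haplotypes (for example, strings such as "011") on either side without modification.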

    Maybe that’s not such a problem. High-resolution recombination maps are available for the HapMap markers. If we use those maps, then we can test neutrality with markers at fixed map distances instead of physical distances. That ought to let us easily quantify the genome-wide fraction of mutual information that originated in a given time interval in the past. But we’ll have to remain cautious, since the map distances are worked out under the assumption of neutrality.

    Why multilocus haplotypes are useful

    Why do we care about many loci, when we can test neutrality using only one? Suppose we’re interested in the mutual information across a 500 kb stretch of the genome. We could pick a single marker on each end of the 500 kb, and measure the mutual information between them. That’s basically what I did in the case of drift in the last post, “When genetic drift reduces entropy”.

    That isn’t slicing the sample very finely. Suppose that our two markers both have major alleles at 80 percent. That makes the expected frequency of the most common joint haplotype 64 percent, and the least common 4 percent. In our sample of 120 gametes, we’ll conclude that the two are significantly associated if those values go to around 70 and 10 percent, respectively. That’s like salting the sample with seven or eight copies of the rare haplotype.
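
    The arithmetic in that paragraph can be checked with a few lines of Python. The variable names are illustrative; the numbers (80 percent major alleles, 120 gametes, a rare haplotype at 10 percent) come from the text.

```python
n_gametes = 120
p_major = 0.8  # major-allele frequency at both markers

# Expected haplotype frequencies if the two markers are independent.
expected = {
    "major-major": p_major * p_major,
    "major-minor": p_major * (1 - p_major),
    "minor-major": (1 - p_major) * p_major,
    "minor-minor": (1 - p_major) * (1 - p_major),
}
expected_counts = {h: f * n_gametes for h, f in expected.items()}

# Copies of the rarest haplotype at its expected 4 percent versus at 10 percent.
rare_expected = expected_counts["minor-minor"]   # about 4.8 gametes
rare_at_10pct = 0.10 * n_gametes                 # 12 gametes
extra_copies = rare_at_10pct - rare_expected     # about 7.2 gametes
```

    The difference of about seven gametes is the "seven or eight copies" of salting.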

    On the other hand, we could salt our sample with more than a dozen copies, evenly split between the two intermediate-frequency haplotypes, and never get a significant result. Even if the expected rare haplotype were completely absent, it wouldn’t be enough to reject the hypothesis of random association, much less the hypothesis of genetic drift. Natural selection can cover its tracks, if two or more linked haplotypes have both increased in frequency.

    Is this likely to be a problem very often? Past some distance, positive selection loses its impact on linked haplotypes, because there’s too much recombination. Near a selected site, linkage is strong enough that there will usually be a single major haplotype hitchhiking upward. In between, there may be minor haplotypes hitchhiking more weakly but we will generally be able to look closer to find a major haplotype. But if there are multiple haplotypes under selection, perhaps because more than one adaptive mutation has occurred, the locus may well look completely neutral.

    An example in human populations may be MC1R. Harding et al. (2000) sequenced the gene in 106 Africans and over 356 Europeans. They found a diverse array of amino acid mutations in Europeans that were absent in their African sample, concluding that purifying selection on skin pigmentation had prevented such mutations from becoming common in Africa. In contrast, purifying selection was relaxed in Europe, allowing several loss-of-function variants to increase in frequency. They found no statistical evidence for positive selection on these loss-of-function variants in Europe. Likewise, no other recent tests for positive selection have shown MC1R to stand out as selected.

    But wait a minute. Where did all those European redheads come from?

    Harding and colleagues found five different coding polymorphisms with frequencies more than 7 percent in Europe, but completely absent in their sample of Africans. That adds up to roughly half the copies of MC1R in Europe, all derived alleles. That kind of emergence of new alleles doesn’t happen by chance, at least not in a population history like Europe’s. But none of the usual tests will reject neutrality when five different alleles have increased under selection. And individually, each of these alleles is too rare to register a rejection of neutrality, at least, if we apply the usual tests.

    What makes this gene so troublesome? There’s no problem estimating the frequency of any single derived allele in Europe, and it’s easy to figure out its extended homozygosity. But if we then compare that to other alleles at the same locus, well, some of them have long haplotypes, too. So none of them have an unusual pattern of LD decay compared to the others.

    What to do? We could try to get a much larger genetic sample, or a large sample of individuals from a part of Europe where one of these alleles is especially common.

    But since I’m lazy, I figure I’ll just consider the mutual information carried by haplotypes of multiple loci. Individually, no single allele may have a frequency high enough to reliably show linkage in a two-locus comparison. Individually, no single allele will reject neutrality by a homozygosity-based test, because the other selected haplotypes decay over very long areas, making the ratio for any single haplotype insignificant. Collectively, the selected haplotypes may carry linkage over long distances. Together they will reject neutrality, even if individually they don’t.

    But first, we need to know the distribution of multilocus mutual information under neutrality.

    Degrees of freedom

    If you can calculate a χ² and its degrees of freedom, you can skip the next four paragraphs.

    In the second post I showed that the mutual information is distributed as a χ². That’s a statistical distribution with one parameter: the degrees of freedom. The concept of degrees of freedom is one of the trickiest definitions in statistics. Roughly, it means the number of independent factors that contribute to a single observation.

    Let’s take the case of a contingency table, where we have a number of rows, r, and a number of columns, c, where the total number of cells is r × c. If the rows and columns were independent, then the number expected in each cell is the total number of observations times the fraction in its row times the fraction in its column. That’s the expected value. If you take the number you observed in a cell, subtract the number you expected, square that difference, divide it by the expected value, and add up the result for every cell in the table, that’s the χ² statistic. This statistic is high when the observed numbers are far from the expected ones. It’s low when they are close to expected.
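
    The recipe in that paragraph translates almost line for line into code. This is a minimal sketch with a hypothetical 2 × 2 table of 120 gametes; it assumes no row or column is entirely empty, since that would make an expected value zero.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count: total times row fraction times column fraction.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are alleles at A, columns are alleles at B.
observed = [[84, 12],
            [12, 12]]
stat = chi_square(observed)

# A table that matches its expected values exactly gives a statistic of zero.
stat_null = chi_square([[64, 16],
                        [16, 4]])
```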

    For a 2 × 2 table, we can ask, how many ways are there for the χ² statistic to be high? Any one of the cells may have an unusually large number of observations, so we might be tempted to say there are four different ways to get a high value. But it’s obvious that if one value is too high, the value in the same row and the other column must be too low. Likewise, the observed number in the other row and the same column must be too low. And if both those adjacent cells are too low, well then the observed value in the cell diagonal to our high cell must also be high. In other words, all four cells depend on each other. Push one, and you push them all. So there’s really only one axis along which the cells can vary: one degree of freedom.

    With a 3 × 2 table, things are a little different. We can hold one row entirely constant, and just change the proportions in the other two. If one cell has a high observed value, that might be because both of the other rows were shortchanged. Or it might be just a single row that’s low, the other being high as well. That makes two distinct ways that we could get a high χ² statistic, two degrees of freedom. In general, there are (r - 1)(c - 1) degrees of freedom in a contingency table. It’s always one less than the number of rows or columns because at least one row and column must be shortchanged when a cell is higher than expected.

    Degrees of freedom with comparisons of multiple haplotypes

    From the previous post, you might have anticipated how we could do simulations of drift in samples of multiple markers. For the following, I’m going to examine the mutual information between two sets of three SNP markers. That is, I have a stretch of chromosome roughly 50 kb long from end to end (or more properly, 0.05 cM), with three markers tightly linked at either end. Using the same methods as the previous post, I’m applying the same three-stage population history with genetic drift only to these six markers.

    Here are the results for 10,000 trials:

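    [Figure: distribution of mutual information between the two three-marker sets in 10,000 simulated trials]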

    That doesn’t look very much like the chart that I got in the previous post, using the same population history. Here is that one:

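    [Figure: the chart from the previous post, showing mutual information between two single markers, with a theoretical distribution in red]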

    What’s going on? That red line in the second chart is the χ² distribution with one degree of freedom. The first chart doesn’t match that at all.

    Why not? The second chart shows the mutual information between two markers, with two alleles each. That’s one degree of freedom. The first chart shows the mutual information between two sets of three markers. Three markers give the possibility of eight distinct haplotypes (2^3). So we have an 8 × 8 contingency table, with 49 degrees of freedom. Here is that distribution, in red, along with the result of our 10,000 simulated samples:

    [Figure: the 10,000 simulated samples plotted against the χ² distribution with 49 degrees of freedom, in red]

    Except, oops, it doesn’t look like 49 degrees of freedom either! What the heck is going on?

    If you’ll remember from the second post in the series, our empirical distribution isn’t going to fit a chi-square unless we have enough observations. Sure, in theory, three markers could give us 8 haplotypes. But in practice, the three markers on each end of our 50 kb region are so tightly linked that we get many fewer haplotypes. Sometimes we get only three or four haplotypes on each end. If each set of three markers were independent of the other, they might fit a chi-square with four degrees of freedom, or nine, or six or twelve. In fact, the mutual information observed in these 10,000 simulations accords with a mixture of distributions. Here are the number of degrees of freedom in the 10,000 simulations, as a histogram:
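
    One way to count the degrees of freedom for a single sample, conditioned on the haplotypes actually present, is sketched below. The function name and the seven-gamete sample are hypothetical, chosen only to show the counting; they are not taken from the simulations.

```python
def degrees_of_freedom(haplotype_pairs):
    """(r - 1)(c - 1), counting only haplotypes that actually occur.

    `haplotype_pairs` holds one (left_haplotype, right_haplotype) tuple
    per sampled gamete.
    """
    left = {l for l, _ in haplotype_pairs}
    right = {r for _, r in haplotype_pairs}
    return (len(left) - 1) * (len(right) - 1)

# Hypothetical sample: three-SNP haplotypes written as strings of 0/1 alleles.
sample = [("000", "110"), ("000", "111"), ("011", "110"),
          ("101", "000"), ("101", "110"), ("000", "000"),
          ("011", "100")]

df_observed = degrees_of_freedom(sample)   # 3 left and 4 right haplotypes
df_maximum = (2 ** 3 - 1) * (2 ** 3 - 1)   # all 8 x 8 haplotypes present
```

    With three haplotypes on one end and four on the other, this sample has six degrees of freedom rather than the theoretical maximum of 49.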

    [Figure: histogram of the number of degrees of freedom in the 10,000 simulations]

    Most of the trials have six or nine degrees of freedom, with 12 being a substantial proportion as well. No primes, there, except for 2 and 3. Only a tiny fraction of trials have as many as 20 degrees of freedom between the 3-marker sets. None get anywhere close to the theoretical 49, because the three-marker sets are too tightly linked to express all possible haplotypes.

    So if we want a theoretical distribution to match to our simulated one, we are going to need a mixture of different χ² distributions in proportion to the number represented in the simulated set. Here’s that comparison:

    [Figure: the simulated distribution of mutual information compared with the mixed χ² distribution, in red]

    The red line is a mixed distribution, in which different χ² distributions are blended in the proportions represented by the histogram above. That’s the distribution of mutual information between the two marker sets, conditioned on the haplotypes actually present in each set, under the hypothesis of independence. The simulations lie to the right of this distribution, meaning that small-sample bias and genetic drift have added mutual information to the simulations, compared to the expectation in the absence of linkage.
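
    A mixed distribution of this kind is simple to evaluate: weight each component's tail probability by the share of trials with that many degrees of freedom. The sketch below uses only the Python standard library. The mixture proportions are made up for illustration and are NOT the values from the histogram, and the helper names are assumptions.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 2 for even df, df = 1 for odd df.
    if df % 2 == 0:
        k, tail = 2, math.exp(-half)
    else:
        k, tail = 1, math.erfc(math.sqrt(half))
    # Each step from k to k + 2 adds (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def mixture_sf(x, weights):
    """Upper tail of a mixture; `weights` maps df -> proportion of trials."""
    total = sum(weights.values())
    return sum(w / total * chi2_sf(x, df) for df, w in weights.items())

# Hypothetical proportions of trials at each number of degrees of freedom.
weights = {4: 0.10, 6: 0.35, 9: 0.35, 12: 0.20}
tail_at_20 = mixture_sf(20.0, weights)
```

    Comparing the fraction of simulated trials above some value with `mixture_sf` at that value gives the excess in the tail relative to independence.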

    The only difference between these simulations and the ones in the previous post is the number of markers. Here, we’re looking at mutual information between two three-marker sets; there, we were looking at mutual information between two one-marker loci. Both examples used the same distance and the same population history. Genetic drift has had a similar effect—in both sets of trials, there is a slight excess of cases with high mutual information. With both, there are around twice as many at the tail as expected if the markers were independent.

    This leads to a couple of conclusions:

    1. Under this population history, genetic drift is not sufficient to cause very large amounts of mutual information to be shared by distant sites.
    2. A slight alteration to the χ² test (for example, adding some proportion to the critical values) might allow us to test the hypothesis of neutrality for any given case without the necessity of all those simulations.

    If we were to add more and more markers, we would eventually find that the bias in mutual information due to small sample size would get bigger. So there’s some maximum number of markers giving useful information in any given sample. But I just picked 3 markers out of a hat. Adding more markers, up to some point, should increase our ability to see rare haplotypes that might diverge from the expectations of genetic drift.

    Next: Conditional mutual information: finding the haplotypes that explain linkage disequilibrium

    References


    Harding RM, et al. 2000. Evidence for variable selective pressures at MC1R. Am J Hum Genet 66:1351–1361.

  • Ceci n'est pas un pothole

    Thu, 2009-03-26 10:54 -- John Hawks

    In 2005 I wrote this:

    "Unusual compared to the rest of the genome" is a phrase you should expect to hear a lot of in the next few years.

    I was looking back at that old post today, as I'm writing new stuff about bottlenecks. It's about the ability to detect selection using the HapMap data -- written just as I was starting to think about recent selection:

    Suppose we wanted to use a detailed topographic survey of a road to find the potholes. But for everyday roads, there is a problem -- there are lots of bumps and grooves that aren't potholes. And different parts of the road are more or less bumpy. It would help a lot if we could use the empirical distribution of bumps to simulate a section of road -- then we could figure out whether anomalies in the real road were likely to be potholes or not.

    Now suppose that the road isn't just pocked with the occasional pothole -- it has a pothole every three or four feet. Remember why we're using simulations -- not only do we not know where the potholes are, we don't know how common they are. So our simulations based on the pothole-rich road will find that pothole-sized bumps are normal. If pothole-sized bumps are not unusual, then our simulation can have only one result: a pothole is not a pothole.

    So I've been writing about the same problem for over three years -- the problem of ignoring history and archaeology when applying models of population history, and how they skew simulations of genetic drift. Time to do something about it, I guess.

  • Colin Renfrew on recent human evolution

    Wed, 2009-03-25 21:41 -- John Hawks

    Colin Renfrew is an archaeologist, in recent years well-known for his work on Neolithic Europeans and Indo-European origins. Last week, someone pointed me to his recent book, Prehistory: The Making of the Human Mind. I read a short review somewhere, but I've lost the link!

    The book was first published in 2007, so its writing would have predated the publication of recent scans of the genome for selection. Renfrew of course has his own distinctive point of view, and he is not himself a geneticist. However, he has worked to integrate his work with genetic insights, interacts closely with many geneticists, and even coined the term, archaeogenetics, to describe a certain kind of gene-driven investigation of population history. So he's no neophyte when it comes to how geneticists describe the evolution of recent human populations.

    A number of passages of the book are very interesting, from the perspective of the conventional wisdom about recent human evolution. I wanted to cite these paragraphs from page 92:

    The genetic composition of living humans at birth (the human genotype) is closely similar from individual to individual today. That was an underlying assumption of the Human Genome Project and it is being further researched in studies of human genetic diversity. We are all truly born much the same. Moreover a child born today, in the twenty-first century of the Common Era, would be very little different in its DNA -- i.e., in the genotype, and hence in innate capacities -- from one born 60,000 years ago.

    Then on page 93, after some additional discussion of Neandertal genetic results:

    The implication here must be that the changes in human behaviour and life that have taken place since that time [between 60,000 and 100,000 years ago], and all the behavioural diversity that has emerged -- sedentism, cities, writing, warfare -- are not in any way determined by the very limited genetic changes which, as we understand the matter, distinguish us from our ancestors of 60,000 years ago. So the differences in human behaviour that we see now, when contrasted with the more limited range of behaviours then, are not to be explained by any inherent or emerging genetic differences. Modern molecular genetics suggests that, apart from the normal distribution range present in all populations in matters such as IQ, all humans are born equal.

    This represents a widespread point of view, one with a long pedigree in archaeology and human genetics (refer also to my post on Ashley Montagu). Renfrew quite clearly claimed that human evolution stopped once humans became "modern". He emphasizes this point as the basis of a "paradox" -- the observation that no large anatomical changes correlate with the increase in archaeological complexity of the last 30,000 years.

    I believe there is no paradox: rapid archaeological change certainly is no proof of evolutionary stasis!

  • Overstating the obvious

    Tue, 2009-03-24 01:41 -- John Hawks

    I'm reading this interesting paper by Joseph Pickrell and colleagues, titled, "Signals of recent positive selection in a worldwide sample of human populations". The paper recounts the results of a selection scan in the Human Genome Diversity panel, which was reported in two publications last year. This is an interesting sample because it includes individuals from 53 population samples around the world.

    I was waiting to present any observations about selection from the HGDP set until Pritchard's lab had published on them, since the initial publications had mentioned that this analysis was forthcoming. Now that it's appeared, I'll be pointing to a lot of these data in upcoming posts.

    So I was reading with great interest. Then I found this statement:

    Reports of ubiquitous strong (s = 1-5%) positive selection in the human genome (Hawks et al. 2007) may be considerably overstated (8).

    I'm a little concerned that someone reading that might think that Pickrell and colleagues had actually tested our hypothesis about the number of recent strongly selected alleles. I'm also uncertain about the word, "ubiquitous", which means "everywhere." I mean, does that really sound like the kind of word I would use? It's just begging for trouble. It's like saying there's "ubiquitous" evidence of Neandertal contribution to the later European gene pool. Even if I thought it was true, I wouldn't put it in a paper!

    We reported that roughly seven percent of genes appeared to be selected. Pickrell and colleagues list a rather large number of candidate loci for selection, and don't give any estimate or test of the number genome-wide. I think one might be able to count the regions listed in the data supplement for an estimate of what they thought was important enough to list, but I can't get the supplement yet. Since these candidate loci require 16 supplementary figures to list, maybe there are a lot of them. They do list a subset of more than 110 in the paper itself.

    So what's the basis for saying we overstated anything? They suggest one reason for caution about the interpretation of candidate loci for selection:

    We find that putatively selected haplotypes tend to be shared among geographically close populations. In principle, this could be due to issues of statistical power: broad geographical groupings share a demographic history and thus have similar power profiles. However, strongly selected loci are expected to show geographical patterns largely independent of demography—depending on the relevant selection pressures, they can be highly geographically restricted despite moderate levels of migration, or spread rapidly throughout a species even in the presence of little migration (Nagylaki 1975; Morjan and Rieseberg 2004) (8).

    But wait a minute! If a gene were selected strongly and still polymorphic in human populations, it shouldn't be very old. So it can't have spread rapidly throughout the human species even in the presence of little migration. There hasn't been any time for this kind of spread.

    To give a little mathematical perspective, one common way of modeling the dispersal of an advantageous gene is the Fisher diffusion wave model. In a Fisher wave, the gene grows logistically at any single point in space, and the allele frequencies form a wave of constant shape that travels through space at a constant velocity. That velocity in a population uniform across 2-dimensional space is σ times the square root of s, where s is the selection coefficient and σ the root mean square dispersal distance -- basically, the average distance a person moves between his birth and the birth of his children.

    If we want to know about dispersal of selected genes in early agriculturalists, we will need to know how far they move -- that's generally less than 10 km on average. So a gene selected strongly with a 5 percent advantage should move around 2.2 km/generation. Over the 400 generations since the beginning of agriculture, we'd expect a new allele to have dispersed across an area with a radius of less than 1000 km.
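
    Here is that arithmetic as a short Python sketch, using the σ times square root of s form stated above. The variable names are illustrative, and σ = 10 km is the upper bound from the text, so the result is an upper bound too.

```python
import math

def wave_speed(sigma_km, s):
    """Wave speed in km per generation: sigma times the square root of s."""
    return sigma_km * math.sqrt(s)

sigma_km = 10.0      # upper bound on dispersal distance per generation
s = 0.05             # 5 percent selective advantage
generations = 400    # generations since the beginning of agriculture

speed = wave_speed(sigma_km, s)    # about 2.24 km per generation
radius_km = speed * generations    # about 894 km, under 1000 km
```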

    So in other words, it's just implausible that a selected allele would have a geographic distribution very different from drift, at least under the Fisher wave model. But obviously, some alleles have gone a lot farther than 1000 km in the last 10,000 years. Humans don't disperse strictly according to a Gaussian distribution, as assumed by the Fisher model; they sometimes disperse long distances. This can have a large impact on the spread of an advantageous allele. But it is an irregular phenomenon -- a stochastic event.

    Let's consider the results a bit further. Here's a passage from page 1:

    We find extensive sharing of putative selection signals between genetically similar populations, and limited sharing between genetically distant ones. In particular, Europe, the Middle East, and Central Asia show strikingly similar patterns of putative selection signals.

    Which is exactly what we would predict from the history of these populations. Most signals of selection in Europe are Neolithic in date. The Neolithic was not only a time of massive population growth, but also the time of greatest mismatch between the human population and its novel agricultural environment. The dispersal of Neolithic lifeways from West Asia into Europe, and the recurrent incursions of Central Asian languages westward across the steppe into Europe and southward into the Indian subcontinent are the major features of the last 10,000 years of history in those regions. Don't we expect them to share a lot of selection? And if it took the massive migrations and interactions in those regions to generate this shared pattern of selection, shouldn't we expect other regions of the world, which lacked as extensive long-distance movements, to share fewer?

    In this case, the critical information for evaluating the evidence is historical and archaeological. We can't just say that the candidate loci for selection have a similar geographic distribution to those that aren't selected. We need to evaluate the likelihood that they would have some other distribution. That likelihood is very low for most instances of selection, but may be high for a fraction of cases, or for some regions where long-distance dispersal was a more important aspect of population history.

    So if we have a locus that is inconsistent with drift on the basis of linkage, we can reject drift. What if the geographic distribution is still consistent with drift? Should we doubt the linkage analysis? I don't see why -- basic biogeography says that most recently selected genes should have similar geographic distributions to drift.

    References:

    Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res (early online) doi: 10.1101/gr.087577.108

  • Plumbing for bottlenecks

    Fri, 2009-03-20 16:43 -- John Hawks

    My series on mutual information and tests of selection (which began with "Information theory: a short introduction") is at a branching point. One of the critical factors determining the power of such tests is the ancient rate of genetic drift. So it's important to come to some understanding of the archaeological record and our best estimates of ancient demography, so that we can independently test the hypothesis that genetic drift was very strong in recent human evolution. That's a long project, potentially the topic of several review papers. Since nobody else has put together these data in a useful way for population genetics, I'm going to do it in one place. What you see in this series are my notes about this project. Being notes, they are not complete, but they may occasionally be better than any other sources. Where it's appropriate, I'll spin off the results for review and publication, and point to them here.

    Many geneticists believe that there were massive population bottlenecks within the last 30,000 years, citing both genetic and archaeological evidence in support of this proposition. Some claim that there have been significant population bottlenecks in the last 5000 years.

    Some archaeologists agree. However, I think this is one of those Inigo Montoya cases: "That word, I do not think it means what you think it means." Archaeology and genetics have completely different interpretations of the words, "bottleneck," "contraction," and "expansion." The result has been a lot of confusion about the relation of archaeological and genetic estimates of population size.

    A population bottleneck impacts genetics by increasing the rate of inbreeding. This takes time to change gene frequencies, and does so in inverse proportion to population size. It may seem surprising that a truly massive die-off, on the scale of the Black Death, will have no measurable genetic impact. But cutting a population of millions down by half just doesn't impact gene frequencies. That is, unless you are looking at genes that helped people to survive the plague, in which case you're looking at natural selection, not a bottleneck.

    A significant genetic bottleneck is not just any population contraction -- it's an event in which the population is cut by a large fraction for a long time. In paleontological terms, we're usually considering cases where the ratio of the number of individuals and the number of generations is near one. In other words, if you cut the population down to a thousand individuals, and keep it there for a thousand generations, you're going to have a large genetic impact. Likewise, you can have a significant bottleneck that's ten generations long, but you need to cut the population down to around ten people.
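
    The rule of thumb can be checked against the textbook expectation for inbreeding accumulated by drift, F_t = 1 - (1 - 1/(2N))^t. The sketch below assumes an idealized diploid population held at constant size N for t generations; that model and the function name are assumptions for illustration, not a description of any real population.

```python
def inbreeding_from_drift(n_individuals, generations):
    """Expected inbreeding coefficient from drift alone.

    Assumes an idealized diploid population of constant size N:
    F_t = 1 - (1 - 1/(2N))^t.
    """
    return 1.0 - (1.0 - 1.0 / (2.0 * n_individuals)) ** generations

f_long = inbreeding_from_drift(1_000, 1_000)       # 1000 people, 1000 generations
f_short = inbreeding_from_drift(10, 10)            # 10 people, 10 generations
f_plague = inbreeding_from_drift(1_000_000, 10)    # a million survivors, 10 generations
```

    The first two cases, where the ratio of individuals to generations is one, both give an inbreeding coefficient near 0.4. The third, a population still in the millions, gives a value on the order of one in 200,000, which is why a plague-scale die-off leaves so little trace.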

    You can do a bit better measuring inbreeding by looking at lots and lots of people to study very rare alleles, like a rare genetic disease in a founder population. There, you may spot changes that unfolded in ten generations, even in a relatively large population of a hundred people. Increasingly, as we develop larger and larger datasets of gene variations, we will add power to detect such events in human prehistory.

    In archaeology, a significant event is one in which fewer sites were occupied by ancient people in a well-studied region. The length of such a contraction depends on the sampling intensity and dating methods available -- it might be a hundred years or many thousands. Likewise, the magnitude of population contraction will be uncertain -- you can get an accurate estimate, but with substantial sampling error. As in genetics, there are other possible explanations for an apparent contraction. We might lack geological exposures of the right age, or people may simply have moved from formerly favored locations to new ones. Worse, it might just be that archaeologists haven't looked hard enough at a given time interval.

    Archaeology is necessarily imprecise about the census population that existed at any given time. So is genetics. Both have their strengths and weaknesses. We want these different areas of evidence to bear on the same prehistoric events.

    Too often, instead of testing hypotheses, people just line up chronologies and look for matches. A geologist may claim that African paleoclimate is important because it may explain "modern human origins." An archaeologist may claim that a hiatus at a site is consistent with "genetic bottlenecks." And the geneticist may claim that inbreeding in a modern-day genetic sample dates to a period of time corresponding to the replacement of one tool industry by another.

    Any of these might be a valid hypothesis, but we need to take it further, to actually provide some tests. I believe we can do better now, because of the growing amount of genetic information. But we're going to have to do away with the facile idea that we're looking for massive bottlenecks. We need to introduce a recognition of the role of selection in human genetic variation, and we need to start addressing the archaeological record as it really exists.

    That's a foreword to what follows. I'm going through regions of the world at different time intervals, to discuss what we know about population size from the archaeological record.

    Next: No Late Pleistocene bottleneck in southern Africa
