john hawks weblog

paleoanthropology, genetics and evolution

genomics

  • Comparing human and chimpanzee promoters

    Fri, 2005-07-01 23:44 -- John Hawks

    Current thinking on the nature of differences between humans and chimpanzees (or any other pair of closely related species, for that matter) holds that large phenotypic differences may be the result of relatively small differences in gene expression.

    Thus, the often-cited saw that humans and chimpanzees are 98 percent the same at the genetic level says almost nothing about the potential differences between the phenotypes of the two species, because identical genes may be expressed entirely differently by making slight changes to genetic promoters or inhibitors. This is not the only reason for the mismatch between genetic similarity and phenotypic difference, since a single point mutation may radically alter the amino acid sequence of a protein as well. But differences in gene expression are often assumed to be "tunable" to a greater extent than amino acid changes to proteins. Thus, certain slight changes to promoters might result in either slight or large differences in the quantities of a protein, the speed of transcription or post-transcription processing, or other biochemical attributes that would affect the phenotype.

    The wide range of possible responses to changes in promoters suggests a fertile drawing board for evolutionary change to work. Such changes might especially be important in the structural differences separating humans from other apes, since ontogenetic development depends sensitively on the concentration of certain proteins and peptides in the developing embryo. A slight increase in gene expression in part of the embryo might increase the length of the leg, or alter the form of the pelvis. Scientists are actively looking for the actual genes that may have influenced these processes in human evolution, and they have found some.

    So the story appears capable of explaining the phenotypic differences between humans and chimpanzees. The only problem is, nobody really knows how promoters work.

    The problem is illustrated in a current study in Genome Biology by Florian Heissig and colleagues (2005), working from Svante Paabo's lab at the Max Planck Institute for Evolutionary Anthropology. The study examined genes with different patterns of expression (assessed by messenger RNA abundance) in chimpanzee and human tissues. They cloned the promoter regions of the genes and spliced them to a reporter gene to test directly the effect of the promoter on the gene expression.

    The results were surprising:

    Out of the 12 promoters tested, two (ACADSB, C10orf10) show a significant difference (ANOVA p-value

    Three promoters (ACADSB, C10orf10, IMPA1) show activity differences in the promoter assays that go in the same direction as the expression differences of the corresponding genes in the tissues. Interestingly, the two promoters (ACADSB, C10orf10) that show qualitatively similar differences in the two cell lines are both in concordance with the tissue expression differences. For four promoters (CGI-51, SH3BGR, UNG, TERF) that show differences in only one of the cell lines, the difference goes in the opposite direction to the expression differences in the tissues (Heissig et al. 2005:R57).

    The paper is very short and to the point, but the implications are striking. Humans and chimpanzees exhibit the same amount of sequence divergence for promoter regions with different activity levels versus promoters with the same activity level. Many genes that differ in activity exhibit no difference in promoter activity. Genes that have higher gene expression in one species sometimes have higher promoter activity, but just as often have lower promoter activity. In other words, the promoters are not an indication of the outcome.

    And the study only examined genes with substantial differences in expression between chimpanzees and humans. For genes with similar patterns of expression, the authors have this to say:

    If many genetic differences do indeed influence the expression of a single gene, the proximal promoters of these non-differentially expressed genes would be expected to differ in their activity almost as frequently as the promoters of differentially expressed genes (Heissig et al. 2005:R57).

    Why should this be? Gene expression is the outcome of a cascade of events within the cell, including the production of enhancer molecules of various kinds, the binding of such enhancers to promoter sites, the metabolism of the gene product within the cell, including the speed with which it may be taken up by receptors, and the correlated effects of other genes with similar products. Again, this system is ideal for evolutionary fine-tuning. But it means that several different changes -- at different genomic sites -- may be necessary to optimize any given function.

    The really bad part of this is that it means a comparison of human and chimpanzee genomes may actually tell us nothing about how the phenotypic differences between the species arose.

    Now, these results may be considered as tentative, since the number of genes examined was only 12. But if the pattern holds true of other genes, even for a substantial minority of genes, it bodes ill for our ability to use genomic sequences to predict gene expression levels in tissues. And predicting phenotypic differences in body structure or behavior is many levels removed from gene expression. So for the high-level structural questions we are likely to ask about human evolution, no answers are likely to be forthcoming from genomic comparisons alone.

    The paper concludes:

    Our results imply that although many promoters may differ in activity between humans and chimpanzees, it will be difficult to predict physiologically relevant gene-expression differences from promoter activities observed in cell lines, even between two closely related species such as humans and chimpanzees. Further work is necessary to elucidate to what extent this applies also to allelic DNA sequence differences in promoters observed within a species. Further work is also needed to elucidate whether a general paradigm for how genome structure translates to gene expression activity can be derived (Heissig et al. 2005:R57).

    If indeed the expression of genes is the product of a different complex cascade of events for each gene, then it may be that no "general paradigm" will ever be possible. Insights about human evolutionary changes may end up coming not from genomic comparisons, but from experimental work. Look for lots and lots of different human and chimpanzee genes to be turned on and off in cell cultures to determine not only their function but also their effects on the expression of other genes.

    References:

    Heissig F, Krause J, Bryk J, Khaitovich P, Enard W, Paabo S. 2005. Functional analysis of human and chimpanzee promoters. Genome Biol 6:R57. Free full text

  • Evaluating selection and demography in human evolution

    Tue, 2005-06-07 00:41 -- John Hawks

    Williamson et al. (2005) present a new mathematical method for deriving information about population size change and selection from the allele frequency spectrum of variation taken at multiple genetic loci. Their method depends on separating sites that are selected from those that are neutral, and thereby isolating the effects of demography from those of selection. They then apply their technique to human genetic data to derive estimates of the average selection on selected sites, and the timing and magnitude of population size change from nonselected sites.

    Oh, if it were really so easy.

    Selection

    To be fair, the paper states a major concern with accurately identifying selection in species like humans and Drosophila that have experienced recent population growth. In other words, the main interest is not in deriving evidence about human prehistory, but in making sure that estimates of selection are not biased by population growth.

    With respect to selection, their major conclusion is as follows:

    We find evidence that negative selection on nonsynonymous mutations is widespread, which implies that deleterious mutations make up a significant proportion of standing nonsynonymous variation. Exactly how this genetic variation contributes to phenotypic variation is a matter of considerable debate, especially for medically interesting phenotypes such as multifactorial genetic disease. Because deleterious mutations, by definition, have phenotypic effects, and because of the widespread nature of negative selection on nonsynonymous mutations, it seems likely that negatively selected, generally rare nonsynonymous SNPs have some negative impact on human health. If there is a general relationship between nonsynonymous polymorphism and human genetic disease, then our genomic estimates of the fitness effects of different types of mutations contain prior information about the likelihood that a mutation contributes to disease. It may be possible to use this information to aid in identifying SNPs that cause disease. Other studies have suggested this approach (e.g., Livingston et al. 2004), but it was unclear which of the many measures of exchangeability to use. We feel that the relative fitness of different amino acid changes is the best way to evaluate exchangeability, and we have done that here by using a model that includes demography and selection (Williamson et al. 2005:7887).

    Readers may note that other studies have found evidence for a very high proportion of positive selection across the human genome (discussed in this post). The test applied in the current paper is not well suited to detecting evidence of positive selection, particularly if it is widespread, because it depends on the difference in frequency spectra between "selected" and "neutral" sites. Why the scare quotes? Because although noncoding sites or synonymous SNPs may well be neutral in the literal functional sense of not being targets of selection, it is impossible to verify that they are unlinked to selected sites. For the purposes of detecting negative (purifying) selection, this is not such a problem, because linkage will affect nearby sites only weakly (although this weak effect, called "background selection," may well influence the average level of variation in Drosophila).

    In any event, even if positive selection has been very common across the genome, most sites that have been subject to positive selection should have been fixed long ago. Only a few should still be under selection now, and these are predominantly very recent mutations.

    Consider the following scenario. The study considered 301 human genes. According to common knowledge, repeated here, positive selection leads to a relative excess of high-frequency alleles, compared to the predictions of neutrality (which predicts that there should be very few high-frequency alleles). But these high-frequency variants represent only a small proportion of the total number of genes currently under positive selection, since an allele being driven to fixation passes through every intermediate frequency, not merely the high ones. To detect evidence for positive selection, this study would have to find dozens of high-frequency variants in excess of neutral theory, representing scores of selected genes. But suppose instead that only one positively selected gene actually was in the sample. If so, then out of the human genome of approximately 20,000 genes, we might expect to find 60 or 70 genes currently under positive selection. In our fictive scenario it would be rash to extrapolate from a sample of 1, but in fact there are good reasons to think the true number is much higher. One such gene might take 1000 generations to transit from its appearance to fixation. There have been 100,000 generations in the 2 million years since the origin of our genus, and at least 300,000 since our divergence from chimpanzees. In other words, the complete transformation of the human genome by positive selection, altering thousands of genes -- or even all of them, multiple times -- would be far from detectable by this test.
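    The arithmetic in this scenario can be written out as a rough sketch. The gene counts, sweep duration, and generation counts come from the paragraph above; the assumption that the number of sweeps in progress stays constant through time is added here only to make the extrapolation explicit, and the function names are illustrative.

```python
# Back-of-envelope numbers from the scenario above.
SAMPLED_GENES = 301               # genes in the Williamson et al. sample
GENOME_GENES = 20_000             # approximate human gene count used in the text
SWEEP_GENERATIONS = 1_000         # transit time from appearance to fixation
GENERATIONS_SINCE_HOMO = 100_000  # about 2 million years
GENERATIONS_SINCE_CHIMP = 300_000


def genes_under_selection_now(hits_in_sample: int) -> float:
    """Scale the number of selected genes in the sample up to the genome."""
    return hits_in_sample * GENOME_GENES / SAMPLED_GENES


def total_fixations(concurrent_sweeps: float, generations: int) -> float:
    """Sweeps completed over a span, ASSUMING a constant number in progress."""
    return concurrent_sweeps * generations / SWEEP_GENERATIONS


concurrent = genes_under_selection_now(1)
print(round(concurrent))                                            # 66
print(round(total_fixations(concurrent, GENERATIONS_SINCE_HOMO)))   # 6645
print(round(total_fixations(concurrent, GENERATIONS_SINCE_CHIMP)))  # 19934
```

    Under these assumptions a single selected gene in the sample is compatible with thousands of fixations since the origin of the genus, and with roughly as many fixations as there are genes since the divergence from chimpanzees.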

    But remarkably, this test does find evidence for positive selection -- in noncoding substitutions! The authors put it less sensationally: "Interestingly, we find marginal evidence for weak positive selection on noncoding indel polymorphisms" (7885). I have no explanation for it. But if there actually is a statistically detectable excess of high-frequency variants for these polymorphisms, it may reflect selection at linked sites, or issues with the composition of the sample. If the level of positive selection is detectable, it is further strong evidence of the power of such selection over the long timespan of human evolution.

    In contrast to positive selection, even very strong purifying selection may leave low-frequency variants within the population for a long time. These variants are picked up within samples in large numbers. Low frequency variants are predicted to make up most genetic variation under neutrality, so the proportion of such variants is always a substantial part of the sample. High numbers make for powerful tests. For the human data examined in this study, the nonsynonymous coding sites have a higher proportion of low-frequency variants than do the noncoding, synonymous sites. Thus, they provide strong evidence of negative (purifying) selection.
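    To see why low-frequency variants dominate under neutrality, here is a minimal sketch of the expected frequency spectrum under the standard neutral model with constant population size, where the expected number of sites at derived-allele count i is proportional to 1/i. The sample size of 20 chromosomes is a hypothetical value chosen for illustration, not a figure from the paper.

```python
from __future__ import annotations


def neutral_sfs_proportions(n_chromosomes: int) -> list[float]:
    """Expected share of segregating sites at derived-allele count i = 1..n-1.

    Standard neutral model, constant population size: the expected number
    of sites at count i is proportional to 1/i.
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    weights = [1.0 / i for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]


props = neutral_sfs_proportions(20)   # hypothetical sample of 20 chromosomes
print(f"singletons: {props[0]:.3f}")  # roughly 0.28 of all variable sites
print(f"count 19:   {props[-1]:.3f}")  # the highest-frequency class is rarest
```

    Purifying selection pushes the observed spectrum further toward the low-count classes than this baseline, which is the contrast between nonsynonymous and other sites that the test exploits.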

    Human demography

    So the results of the method applied to selection are mixed. It detects the weak force of purifying selection strongly; it detects the strong force of positive selection weakly. But as the authors perceptively note, the inference of demographic history and the inference of selection are not independent of each other. Therefore, the inferences about demography are in part subject to the weaknesses in detecting the effects of past selection. This study shares this problem with all previous work that has attempted to estimate past human population size from genetic evidence.

    How can selection affect interpretations of demography? Here's one way: Positive selection occurs rapidly relative to the rate of recombination between sites. This means that a selective sweep may affect a relatively large section of a chromosome, including many "neutral" sites. This is the principle behind John Gillespie's (2000) pseudohitchhiking, or "genetic draft" model of neutral evolution. In a nutshell, if positive selection has been common, there is no reason to think that genetic variation at noncoding sites provides any indication of demographic parameters. The current study (by Williamson et al. 2005) assumes that positive selection has not had such an effect, nor has any other force significantly affected the variation of neutral sites.

    These are the kinds of influences that have been suggested to result in the large difference between census population sizes (the number of individuals within living species) and estimates of effective population sizes (measures of the rate of genetic drift) in nature. In humans and in most other animal species, the rate of genetic drift on neutral sites appears to have been much stronger than the census population sizes of those species would predict. This is a systematic difference that leads species to have much lower genetic variation than would be expected if they evolved under genetic drift alone. At present, the relative importance of selection and demographic factors in leading to this systematic difference is unknown. I suspect that selection has been strongly important in this difference; others argue that demographic factors have been the most important.

    In most previous genetic work, the effective population size (denoted as Ne) is around 10,000 individuals. Some scientists have suggested that the human population actually was once that small -- that only a few tens of thousands of people once comprised the entirety of humanity. If this were true, then the human population must have expanded in size massively sometime in the recent past. The evidence for a recent change in the mitochondrial DNA molecule was once suggested to be evidence for this change in population size, which was inferred to have occurred during the Late Pleistocene, perhaps 50,000 years ago. From these estimates comes the scenario of an expansion from a single small African population beginning after 100,000 years ago, reaching Europe and the Far East by 30,000 - 50,000 years ago.
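    For readers wondering where a figure like 10,000 comes from, a minimal sketch follows. It uses the standard neutral expectation that nucleotide diversity equals 4 times Ne times the mutation rate for diploid autosomal sites. The diversity and mutation rate values below are round-number assumptions for illustration and are not taken from any of the studies discussed here.

```python
def effective_size(pi: float, mu: float) -> float:
    """Diploid autosomal Ne implied by the neutral expectation pi = 4 * Ne * mu."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4.0 * mu)


ASSUMED_PI = 0.001   # assumed diversity: about 1 difference per 1000 sites
ASSUMED_MU = 2.5e-8  # assumed mutation rate per site per generation

print(round(effective_size(ASSUMED_PI, ASSUMED_MU)))  # 10000
```

    The calculation treats all of the observed diversity as the product of drift and mutation alone, which is exactly the assumption at issue when selection at linked sites is common.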

    Recently, it has become clear that a single massive expansion of a global human population cannot explain the pattern of genetic variation in living people. Simply put, the pattern of the 16,000 base pairs of the mtDNA molecule is not replicated by the 3 billion base pairs of the nuclear genome.

    To be sure, some genes do show a pattern of recent ancestry and apparent expansion. The FoxP2 gene, for example, has a recent common ancestor for living people (within the past 200,000 years), and shows strong signs that it has not evolved neutrally. If all other genes looked like this, it would be strong evidence of massive population growth.

    But most genes emphatically do not look like this. This has been understood for several years, following reviews by Molly Przeworski and colleagues (2000), Jeff Wall (2000), and even my own dissertation (Hawks 1999). Many genes show no excess of rare variants; most show only a slight excess. The average gene shows no sign whatsoever of a massive population expansion during the Late Pleistocene. This has been concluded most powerfully by recent genome-wide studies of SNP variation by Marth and colleagues (2003; 2004; reviewed previously in this post).

    Where does the current paper (Williamson et al. 2005) come in? Summarizing evidence from over 300 genes, this study does find evidence of a population expansion. Yes indeed -- a population expansion that happened 18,000 years ago! This expansion took the human population from a previous size of around 8000 individuals to a current size of around 50,000 individuals.

    Of course these estimates are far from realistic in anthropological terms. If anything, 18,000 years ago much of the human population should have been contracting rather than expanding. The idea that the human population could have been as small as 8000 individuals (or very generously 100,000 individuals) during the Last Glacial Maximum (LGM) is simply ridiculous. By that time, the certain ancestors of living people were present from the western tip of Iberia to the edge of (or possibly well into) Beringia. If a genetic estimate cannot gauge a population that must have numbered several millions of people, it is time to stop talking about genetic estimates.

    To be fair, the demographic conclusions of the paper are phrased cautiously:

    Therefore, although we find it striking that the time of population growth (18,200 years B.P.) roughly corresponds with events in human history that may have induced population growth, such as the end of the last ice age and the origin of agriculture, we feel that our demographic inferences should be interpreted cautiously until the full range of plausible demographic models has been explored in one coherent framework (7887).

    At the same time, this apparently cautious discussion raises the more critical problem of a complete lack of communication or citation from any anthropologist. Hmm, I guess the last glacial maximum does roughly correspond to the "end of the last ice age and the origin of agriculture," in the usual manner of genetics confidence intervals. That is to say, it is only twice as old as either, so it might as well be the same.

    Speaking of confidence intervals, again in this paper there are none. No confidence intervals on the demographic estimates, no confidence intervals in the supporting text, no figure showing the likelihood surface, none, nothing, nada.

    What remains?

    In a sense, the bone I am picking is different from that pursued by Williamson et al. (2005). What I care about is evidence for ancient demography. What they care about is better quantifying selection. I think that their paper is incomplete on their own terms, because of the problem quantifying positive selection, but that it is a credible theoretical effort. In particular, the insights about the frequency of genetic disorders based on their findings are a likely contribution to the future study of genetic variation in coding gene regions.

    But the inclusion of demography in this study confuses much more than it clarifies from the perspective of the anthropologist. Its estimates of demographic changes are clearly false, and the lack of detail about confidence intervals makes them impossible to evaluate. In the face of this fatal problem, it is fair to wonder whether the apparent insights about purifying selection have any value.

    The main importance of the data from these many genes is what they do not show. They do not show an expansion of many orders of magnitude. They do not show a current effective size that is anywhere near the current human population size (or a size sufficient to settle any large part of the world). They do not show evidence for expansion coincident with an "out of Africa" movement of people, over 50,000 years ago.

    Instead, the conclusion is concordant with the discussion of Eswaran and colleagues (2005:3):

    Thus, the nuclear data do not consistently signal expansion, and when they do, the signal is of a mild expansion, perhaps reflecting only post-Pleistocene population growth associated with the spread of agriculture.

    The summary of current work is that we can completely exclude the hypothesis that "neutral" genetic variation in humans is explained entirely by past human population size. It simply cannot be true, because if it were, there should be strong signs of expansion that we do not in fact observe.

    On the other hand, perhaps genetic data may tell us something about past human population size, even if population size is not the only explanation for genetic variation. We might expect that some demographic changes may have influenced genetic variation in distinctive ways that could be separated from the effects of selection. If so, then the results of the current paper may be relevant. If natural selection -- especially purifying selection -- explains most rare alleles at nonsynonymous coding sites, then perhaps the residue of rare alleles at synonymous or noncoding sites is a sign of recent changes in demographic patterns?

    This possibility is suggestive, but it appears at present to be fairly far from the data. If unknown factors (which may include selection) have altered "neutral" genetic variation by an order of magnitude or more from their neutral predictions, then it is hard to believe that a relatively small change in population size will be accurately measured by any genetic observations.

    References:

    Eswaran V, Harpending HC, and Rogers AR. 2005. Genomics refutes an exclusively African origin of humans. J Hum Evol Online advance before print.

    Gillespie JH. 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909-919.

    Williamson SH, Hernandez R, Fledel-Alon A, Zhu L, Nielsen R, and Bustamante CD. 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Nat Acad Sci USA 102:7882-7887. PNAS online

  • A future without men?

    Mon, 2005-04-25 22:48 -- John Hawks

    H. Allen Orr reviews Brian Sykes' book, Adam's Curse: A Future Without Men in the May 12, 2005 New York Review of Books. This is a great review (with short comparisons to Steve Jones' Y: The Descent of Men and David Bainbridge's The X in Sex: How the X Chromosome Controls Our Lives).

    From the review:

    Sykes's case for the extinction of men hinges on an unusual problem plaguing many genes on the Y chromosome -- they tend to pick up debilitating mutations and to ultimately degenerate into genetic junk. A couple of hundred million years ago or so, the X and Y were a pair of perfectly ordinary chromosomes that each carried a full complement of the same thousand genes. Since then, however, the Y has been slowly degenerating. As a result, while the human X still carries its thousand genes, the Y carries only about a hundred. Sykes believes that the genes that remain on the Y -- including SRY as well as others required for the fertility of men -- will also degenerate. The disastrous consequence, he says, will be the disappearance of fertile males. (Sykes sometimes says that males will become sterile, while at other times he suggests they'll disappear. Genetically, at least, the difference doesn't make a difference: if all males are sterile, they may as well not be there.)

    I'm afraid that this is all just silly. ... The critical point is that most of the male fertility genes now residing on the human Y exist only on that chromosome and there's no way that selection will allow their loss.

    Sykes's calculation suggests otherwise because it's wrong. He seems to assume that Y chromosomes carrying mutations that partially sterilize men will get passed on to future generations as often as normal, unmutated chromosomes. But they won't -- that's what it means to be partially sterile. This misstep leads Sykes astray. There are simply no sound evolutionary grounds to support his sensational claims of the extinction of men.

    In this book, Sykes constructs and defends a fairly extreme model of biological determinism for the Y chromosome, drawing historical and prehistoric human events into the fold of this model. So war, empire, and Genghis Khan himself are drawn into the story. It is good to see Orr skewering this model and its lack of fundamental population genetic logic.
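    Orr's point about partial sterility can be illustrated with a minimal sketch of deterministic selection on a Y-linked variant. Because the Y is passed only from father to son, a variant that reduces male fertility by a fraction s changes frequency as in a simple haploid model. The starting frequency and the value of s below are hypothetical, and the model ignores drift and new mutation.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of selection on a Y variant whose carriers have fitness 1 - s."""
    return p * (1.0 - s) / (1.0 - s * p)


def frequency_after(p0: float, s: float, generations: int) -> float:
    """Iterate the haploid selection recursion for a number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s)
    return p


# Hypothetical values: the variant starts at 50 percent and costs 10 percent fertility.
print(frequency_after(0.5, 0.10, 100))  # falls to a tiny fraction
# Treating the variant as neutral (s = 0) leaves its frequency unchanged.
print(frequency_after(0.5, 0.0, 100))   # 0.5
```

    Treating partially sterilizing chromosomes as if they were transmitted as often as normal ones corresponds to setting s to zero, which is the misstep the review describes.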

    Personally, I can't see the appeal of reading a book entirely about a single chromosome. Not that most chromosomes don't have interesting stories -- hey, why not chromosome 11? -- or that you can't associate human stories with a chromosome. I regularly assign Matt Ridley's Genome in my intro-level course as a quick overview to how genetics relates to human lives, and that book is essentially a series of 24 essays riffing off each of the human chromosomes (X and Y separate). But it seems to me that chromosomes are a pretty poor way to organize human experience.

  • Different recombination hotspots in humans and chimpanzees

    Tue, 2005-04-19 10:54 -- John Hawks

    Winckler et al. (2005) (Science online) surveyed sequence data from humans and chimpanzees to examine whether recombination was happening at similar rates in both species. They found that even though the human and chimpanzee sequences were 99 percent identical, recombination hotspots were highly different, and rarely occurred in the same places.

    At present it is not known what molecular factors result in recombination at particular genomic locations, so it is unclear what accounts for the difference between humans and chimpanzees in hotspot locations. For this reason, the authors interpret their findings in terms of several possible hypotheses:

    The lack of correlation in recombination patterns between humans and chimpanzees demonstrates that fine-scale recombination rates evolve rapidly, to an extent disproportionate to the change in nucleotide sequence. Rapid evolution of hotspots has previously been hypothesized on the basis of examples of meiotic drive at hotspots and the mechanism of DSB repair (9, 12). Our observations argue against models in which hotspots are directed solely by short, neutrally evolving DNA motifs, which would almost always be identical between the two species. Epigenetic factors, which are known to play a role in recombination hotspots (7), may vary more substantially across closely related species than does DNA sequence. Alternatively, if the trans-acting molecular machinery that initiates crossover events has nucleotide site preferences, then it is possible that substitutions in these components could dramatically alter site preference across the genome. Although DNA sequence is typically shared across human and chimpanzee, the polymorphisms in each species are not (26). It is intriguing to speculate that polymorphisms could themselves play a role in shaping fine-scale recombination; this could also explain why different alleles of a given locus can have substantially different recombination rates (9). Finally, we note that if recombination rates evolve rapidly, then in some cases, rates from "historical" polymorphism data might truly differ from contemporaneous rates in sperm (Winckler et al. 2005:110).

    To me, the research raises an interesting question: if humans and chimpanzees are so divergent in recombination parameters, shouldn't we expect humans to be fairly different from each other also? On average, human alleles are about a tenth as different from each other in sequence as human alleles are from chimpanzee alleles. If the rate of change between humans and chimpanzees has been high, then human polymorphism should include a substantial recombinational component -- perhaps more significant in magnitude than conventional sequence polymorphism. As the study puts it:

    By applying these analytical methods to genome-wide polymorphism surveys, an extensive collection of recombination hotspots will soon be available across the human genome. Studying these hotspots should ultimately illuminate the as yet mysterious factors that direct the location and frequency of recombination in our species (Winckler et al. 2005:110).

    I wonder whether these results will ultimately affect our interpretation of diversity within and outside of Africa -- especially in light of the suggestion that human populations within Africa have undergone adaptation to several fairly distinct local environments. If there are recombinational differences that may act as either impediments or facilitators to selection on particular genomic regions, that might influence the dispersal of adaptive genes (or genetic elements). Likewise, although microsatellites are not directly related to mutational hotspots, there are substantial differences between humans and chimpanzees in terms of variable microsatellite loci. In both cases, human variability may ultimately be the result not only of the factors affecting human populations globally, but also the evolution of the systems themselves in terms of some loci becoming more mutationally active or less active in some populations over time. It is an interesting genomic world out there, that we are just beginning to understand.

    References:

    Winckler W, Myers SR, Richter DJ, Onofrio RC, McDonald GJ, Bontrop RE, McVean GAT, Gabriel SB, Reich D, Donnelly P, Altshuler D. 2005. Comparison of Fine-Scale Recombination Rates in Humans and Chimpanzees. Science 308:107-111.

  • Human genetic variation in a (very large) nutshell

    Sun, 2005-02-20 00:43 -- John Hawks

    Hinds and colleagues (2005) report in Science on a study that genotyped 1,586,383 single nucleotide polymorphisms (SNPs) in a sample of 71 people. The sample is drawn from Americans in three subsamples representing African, Asian, and European ancestry. The goal of the study was to add to knowledge about the frequencies of SNP variation in different medically relevant populations, while assessing the linkage among SNPs. These data would help formulate better strategies for tracing the genetic correlates of disease and other phenotypic traits.

    The data were acquired with these medical goals in mind, which limits to some extent their ability to address interesting issues about human evolution. For example, they select known SNPs that were judged to be likely to be high in frequency in multiple populations. This process, called ascertainment, was complicated enough to make it difficult to use the data in models of genetic evolution. For example, a large set of the candidate SNPs were selected from public databases, which are not random representatives of the three subpopulations considered here, making it likely that the three would differ in allele frequencies in ways characteristic of this bias. Because of the ascertainment complexity, it is unlikely that geneticists would be able to use these data to accurately reconstruct ancient evolutionary events (although it may not stop them from trying).

    The most interesting part is a brief consideration of the role of natural selection in differentiating populations from each other. As the authors note, one suggestion concerning the distribution of genetic differentiation (as measured by FST) is that different genes have undergone very different patterns of global or local selection. The suggestion from this hypothesis would be that candidate genes to examine local selection could be identified from relatively large FST values. (Such genes would have high FST in any event; the distinction is that if genetic drift were largely responsible for human differentiation, then many non-locally-adapted genes might also have high FST values.) As they put their findings:

    If this is true, then larger FST values should be found near functional genetic elements. We looked at the distribution of FST for SNPs that were genic or nongenic, coding or noncoding, and synonymous or nonsynonymous. We performed the analysis within subsets of SNPs grouped by MAF [minor allele frequency], so that effectively, we looked at the fraction of between-population variance for SNPs with the same total genetic variance. Common SNPs in genic regions do have slightly but significantly higher FST values than nongenic SNPs with the same MAF . . . and common coding SNPs have slightly higher FST values than noncoding SNPs in genic regions. . . . These results are consistent with local selection changing the distribution of FST near functional sequences. However, because the distributions of FST among genic and nongenic SNPs are very similar, large FST values by themselves appear to be very weak evidence of selection (1074).

    Of course there is another reason that genic and coding SNPs might not be much more differentiated than the average: if global selection has constrained them to similar frequencies. Given the huge range of genes in the scope of this analysis, it is hard to say which force of selection should be predominant, or if they should be nearly balanced in the way they would appear to be to explain the data. Certainly genes like the MHC genes would be expected to be held at broadly similar frequencies across populations. But then some of those are precisely the genes that should be very different among populations, as a result of different microbial histories. The authors also examined the private (confined to one sample) SNPs to see if they were more likely to be genic, finding that they were not. This is not surprising, since these alleles are by definition rare, and therefore unlikely to underlie strong selected differences between populations. The few that might be locally selected are surely lost in the volume of rare alleles that are either deleterious or subject entirely to drift.

    It seems to me that the way to address the FST issue is to examine the distribution of FST estimates for the SNPs. Given the observed sample frequencies of the SNPs and some assumptions about population histories, it should be possible to derive an expected distribution of FST. Comparing that expected distribution to the observed distribution would give some information about whether the genes had been subject to drift alone, or whether they had been significantly perturbed in some way.
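    The comparison suggested above can be sketched in a few lines. This is only an illustration under loud assumptions, not the method of Hinds and colleagues: it assumes three populations that split from one ancestor at the same starting frequency, pure Wright-Fisher drift with no migration or selection, equal population weights, and made-up values for population size, number of generations, and the "observed" frequencies. The function names are hypothetical.

```python
import random

def fst(freqs):
    """Wright's FST for one biallelic SNP, given one allele frequency per population.

    Uses FST = (H_T - H_S) / H_T with equal weight for every population.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)      # heterozygosity of the pooled population
    if h_total == 0:                       # SNP fixed everywhere: nothing to partition
        return 0.0
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total

def drift(p, n_chrom, generations, rng):
    """Wright-Fisher drift: resample n_chrom chromosomes each generation."""
    for _ in range(generations):
        copies = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = copies / n_chrom
    return p

def expected_fst_distribution(p0, n_pops=3, n_chrom=100, generations=20,
                              n_snps=300, seed=1):
    """Simulated FST values expected under drift alone (all parameters are assumptions)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_snps):
        freqs = [drift(p0, n_chrom, generations, rng) for _ in range(n_pops)]
        values.append(fst(freqs))
    return sorted(values)

def upper_tail_fraction(null_values, observed_fst):
    """Fraction of simulated drift-only SNPs with FST at least as large as observed."""
    return sum(1 for v in null_values if v >= observed_fst) / len(null_values)

if __name__ == "__main__":
    null = expected_fst_distribution(p0=0.3)
    observed = fst([0.10, 0.35, 0.60])     # hypothetical sample frequencies
    print(round(observed, 3), upper_tail_fraction(null, observed))
```

    A real analysis would also have to model the ascertainment scheme described above, since the way the SNPs were chosen changes the expected distribution as much as the population history does.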

    The data are publicly available; if you can think of a good use for them, have at it!

    References:

    Hinds DA, Stuve LL, Nilson GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, and Cox DR. 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307:1072-1079.
    Science Online

  • Mayr on speciation

    Sat, 2005-02-19 14:35 -- John Hawks

    OK, that headline looks like the title to a dissertation, which this isn't. But in honor of Mayr's recent death, I was looking through some of the things he has written about hominids, and I came across his book review of Jeffrey Schwartz's book, Sudden Origins. Reading this at once reminded me why Mayr has been such a giant in evolution that he spilled over into anthropology, and saddened me that there are so few representatives of such wisdom left.

    Here are some quotes:

    [Schwartz] correctly criticizes the strictly linear view of descent held by most anthropologists (p. 43), but by not thinking in terms of populations, Schwartz does not convert hominid history into a dynamic picture of the movement of geographically vicariant populations and subspecies. Such multidimensional thinking, introduced by the founders of the Evolutionary Synthesis, is not yet popular among physical anthropologists (978).

    Phenotypic discontinuity does not conflict with Darwinian theory. If, for instance, a phyletic line evolves from the possession of two to the possession of three molars, the change does not occur by mutations giving one tenth, later one fifth, and one half of a new molar, but by one tenth, later one fifth, and then one half of the population having one new molar (978).

    And here's rubbing it in:

    What is the reason for Schwartz's failure in spite of his extensive reading and his efforts to make use of some of the most recent findings of molecular biology? Perhaps it is due to an insufficient consideration of some of the basic concepts of the synthetic theory. For instance, nowhere does he adequately emphasize that evolution takes place in populations and consists of the replacement of individuals, generation after generation. Furthermore, in numerous discussions of mutation in this volume, it is always implied that the gene (mutation) is the target of selection rather than the phenotype of the individual, and this favors acceptance of a theory of a saltational role of homeobox genes. Nor does Schwartz seem to appreciate that natural selection is a two-step process. Homeobox mutations occur during the first step, the production of variation. The fate of these mutations, after they have become components of new genotypes, however, is decided at the second step, the actual selection. Therefore, no conflict exists between the occurrence of homeobox mutations and the classical Darwinian process (979).

    Consider that here, Mayr was in his early 90's. That some of us forget the lessons of the Synthesis is a discredit to us and our teachers, certainly not to the founders. Yet he patiently explains the way that today's developments in genetics should be incorporated into an evolutionary model, using the understanding that he helped the field to develop some sixty years before.

    References:

    Mayr E. 1999. Sudden origins (book review). BioEssays 21(11):978-979.
    Wiley InterScience

  • The probability of parallel evolution

    Sat, 2005-02-19 13:52 -- John Hawks

    Orr (2005) considers the likelihood of the same mutants being fixed in two populations as a function of parallel selection, compared to drift. The model used is a very simple one, basically involving a single locus in each population with a limited number of advantageous mutants that may be presented to both populations.

    The argument for the idea that beneficial mutations are limited is probably right:

    Throughout this analysis, I make a major assumption: the number of beneficial mutations is small. This will almost certainly be true for two reasons. First, environments are autocorrelated through time, making it unlike [sic] that a previously highly fit wild-type allele would suddenly plummet in relative fitness; second, random changes in a functional protein are much more likely to worsen than to improve protein function (216).

    The result of the paper is that parallel evolution is likely under such circumstances. This is not especially surprising, and the innovative aspects of the paper are the demonstration that this is true under many models of the distribution of fitnesses of mutations. The equations in the paper are derived from extreme value theory, with the basic theme being that the fittest possible new mutations are also the rarest, so these will preferentially be incorporated into populations.
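    To get a feel for why parallel evolution comes out as likely when beneficial mutations are few, here is a small Monte Carlo sketch. It is not Orr's derivation. It assumes that each of n available beneficial mutations has a selection coefficient drawn from an exponential distribution, and that each population independently fixes mutation i with probability proportional to its selection coefficient. The function name and parameter values are hypothetical. Under exactly these assumptions the expected probability that both populations fix the same mutation works out to 2/(n+1), which the simulation should approximate.

```python
import random

def parallel_probability(n_mutations, n_trials=20000, seed=0):
    """Estimate the chance that two populations fix the same beneficial mutation.

    Assumptions (illustrative only): selection coefficients are exponential,
    and a mutation's chance of being the one fixed is proportional to its
    selection coefficient.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        s = [rng.expovariate(1.0) for _ in range(n_mutations)]
        s_sum = sum(s)
        # Both populations pick mutation i with probability s_i / s_sum,
        # so they agree with probability sum_i (s_i / s_sum) ** 2.
        total += sum((x / s_sum) ** 2 for x in s)
    return total / n_trials

if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, round(parallel_probability(n), 3), round(2 / (n + 1), 3))
```

    With only a handful of beneficial mutations available, the two populations agree a substantial fraction of the time, far more often than if they were choosing uniformly among all possible mutations at the locus.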

    Does this study apply to natural populations? Even the most closely related populations typically differ in ecology in some respects, so it is hard to say that the model where mutations have the same fitness characteristics in two different populations is always relevant. Likewise, over the long term it is likely that a natural population will be as near to an optimum allele as is practicable. That is to say, the argument above that wild-type alleles are unlikely to plummet in relative fitness, carried to its logical extreme, would predict that any natural population of substantial size would already have had the opportunity to explore all the adaptive space available to it by recurring mutations.

    Only in fairly unusual circumstances will populations be limited from achieving higher fitness (for any single gene) because mutations don't occur often enough. Instead, they will be limited by the fact that the mutations that do occur are never more adaptive than the current wild-type. The unusual circumstances would include cases in which the adaptive landscape really is complex; for example, where the phenotypic characters influenced by the gene are themselves subject to complex patterns of stabilizing selection. Here, the possibility for stepped advantages among many genes creates the opportunity for a progression of mutations. That is to say, many genes that interact with each other are all highly optimized and adaptive mutations at each of them are incredibly rare. But when an adaptive mutation occurs at one of these genes, it may shift the interaction in ways that make a new (perhaps recurring and previously neutral or deleterious) mutation at one or more of the other genes more likely to be adaptive. In this way, a highly polygenic trait might be mutation-limited in its evolution, while no individual gene can be said to be mutation-limited.

    References:

    Orr HA. 2005. The probability of parallel evolution. Evolution 59(1):216-220.

  • Patents and human chimeras

    Sun, 2005-02-13 21:31 -- John Hawks

    This article in the Boston Globe (Feb. 13, 2005) summarizes a recent US patent office ruling on whether an application for a human-animal chimera could be approved. The office rejected the claim, holding that the method specified in the application (creation of an embryo consisting of a mixture of human and animal cells) would result in the creation of a living being too close to a human to be patentable.

    The story describes this as a waypoint in a long legal battle over the patenting of life. The applicant in this case, Stuart Newman of New York Medical College, is a collaborator of biotechnology gadfly Jeremy Rifkin. This patent application was part of an effort to get the patent office to set a precedent for future applications--they reasoned that if the patent were approved they could forestall all such research for the patent term, while if it were rejected it would block future patent applications for human-animal chimeras.

    It's not entirely obvious that this will be the effect, since there are many labs currently working at creating mice with various levels of human tissue included. The most famous is perhaps the mouse strain with a human immune system, but the article mentions the prospect of mice with brains made entirely of human neurons.

    There is no definition of human that would determine how much human tissue or human genetic material would be enough to qualify a genetically engineered organism as human. In a legal sense, there is no real protection against the creation of such organisms, beyond the 13th amendment, which bans slavery, and would presumably preclude the extension of property rights over genetically engineered people. But how human does an organism have to be to qualify for this protection? Nobody knows.

    Ironically, the patent office appears to be pretty friendly to Rifkin's position. As the article points out, the Supreme Court forced the issue of patenting organisms in the 1980 case of Diamond v. Chakrabarty (FindLaw), deciding that any artificially created organism (i.e. not naturally occurring) was eligible for patent protection as a genuine human innovation. Since that time, hundreds of patents have been issued for living organisms, and tens of thousands on genes or gene products.

    Now that human cloning has been kicked into high gear in Britain, we can expect that there will be an increase in the potential for mixing genetically engineered sequences into humans, including sequences taken from animals or plants. And there will be increasing attempts to make animals with human genes and cells, as experimental models for human drug and treatment testing as well as for other purposes. We're not quite at the island of Dr. Moreau, but we are separated from it by our motives, not our methods.

  • How much selection does it take?

    Wed, 2005-02-02 23:58 -- John Hawks

    I was involved in a discussion this weekend that I think reveals much about the current state of evolutionary genomics. The forum was the "Neanderthals Revisited" conference at NYU, although I might have had the same discussion almost anywhere. Earlier, I had given my presentation about the importance of selection when considering Neandertal relationships. My point was that ignoring selection leads to a preordained result, since a small amount of selection can have the same effect as very extreme hypotheses of demographic change. I used both morphology and mitochondrial genetics as examples of the way that small magnitudes of selection can have great influence on the pattern of variation. Needless to say, this line of argument did not go over very well with geneticists who had previously been engaged in mitochondrial research, employing the paradigm that assumes that mtDNA is neutral.

    The comment that surprised me quite a bit (and I can say that at least a few others in the crowd were surprised as well) was given the next day by Mark Stoneking, a geneticist at the Max-Planck Institute for Evolutionary Anthropology in Leipzig. Stoneking was commenting on research the Leipzig lab has been doing in the area of regulatory control across the genome. His observation--and this was the surprise--was that regulatory changes in the genome appear consistent with the hypothesis of neutral change over time. In other words, his argument was that neutral change was the predominant mode of genomic evolution leading to living humans. He used this as a point of departure to suggest that a neutral evolution of phenotypes during the course of human evolution was likely a productive hypothesis to pursue when considering the pattern of variation in ancient hominids.

    I must say that I was fairly stunned by this statement. It really struck me as being inconsistent with my knowledge of genomic variation in humans. So I looked up some of the recent research from the Leipzig lab, and compared it with the papers that I had been aware of that examined the level of positive selection responsible for human evolutionary change.

    The promise of evolutionary genomics for deriving information about the pattern of selection leading to human-specific traits lies mostly in the comparison of human genetic sequences with those of other primates. By examining the rate of evolution for different genes and parts of genes, it is possible to address whether the changes responsible for human phenotypic evolution were mainly due to changes in coding regions of genes, regulatory elements, or other regions. And by comparing genes with each other, it is possible to judge whether most of the changes were concentrated at a few important genes, or whether they were more broadly distributed across the genome.

    But most important is the need to discover exactly what types of mutations were common in the human lineage, and for that matter in the chimpanzee lineage and other primate lineages as well. For example, a gene that has repeatedly experienced positive selection during the course of human evolution will show a greater difference between humans and chimpanzees in whatever kinds of changes were under selection. This might mean that the gene will exhibit a greater number of amino acid substitutions than expected at random from the total number of mutational changes. It might mean that certain areas of the gene, such as upstream regulatory regions, will exhibit more divergence than others.

    Geneticists have developed tests for these different patterns of departure from neutrality. They can test whether the number of amino acid changes is significantly higher than expected given a certain rate of mutations. They can test for equality of rates between different genetic loci. And they can test directly for violation of mutation-drift equilibrium. It is this last test that is violated by human mitochondrial DNA, for example, which leads me along with many other geneticists to believe that the molecule has been under positive selection during human prehistory.

    But there are problems in applying tests of neutrality to genome-wide questions of the abundance or frequency of positive selection. Tests of neutrality are notoriously conservative, meaning that it is difficult for genetic data to demonstrate that selection actually took place. One of the reasons why tests of neutrality are so conservative is that the effects of positive selection are actually counterbalanced by other forms of selection on the genome. For example, if the coding sequence of a gene has been under positive selection during evolutionary history, then one expected effect on the distribution of variation is that the number of amino acid changes separating two species will be relatively high, especially compared to the number of amino acid differences noted within each of the species. In other words, one of the species or both of them should have been driven further apart by selection on the amino acid sequence than we would expect from their neutral level of variation. But exactly the opposite effect is expected for purifying selection. As purifying selection consistently eliminates new amino acid mutations, it should leave two species relatively close to each other in their amino acid sequences compared to the level of differences within each of those species. What this means is that if both positive selection and purifying selection have affected the gene over the course of its evolutionary history, the two opposing forces will to some extent cancel each other out. This cancellation causes an underestimation of the rate of positive selection, because purifying selection is usually continuous and positive selection is much more episodic. In essence, for most genes we may conclude that positive selection never happened, at the same time that we are estimating that purifying selection is substantially weaker than it actually was.
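    The cancellation can be made concrete with a McDonald-Kreitman style calculation. The sketch below uses a standard estimator of the fraction of amino acid substitutions that were adaptive, alpha = 1 - (Ds * Pn) / (Dn * Ps), where D counts fixed differences between species, P counts polymorphisms within a species, and the subscripts n and s mark amino acid changing and synonymous sites. It is not necessarily the exact test used in the papers discussed in this post, and every count is invented for illustration.

```python
def adaptive_fraction(dn, ds, pn, ps):
    """McDonald-Kreitman style estimate of the fraction of adaptive substitutions.

    dn, ds: amino acid changing and synonymous fixed differences between species.
    pn, ps: amino acid changing and synonymous polymorphisms within a species.
    """
    return 1 - (ds * pn) / (dn * ps)

# Hypothetical gene: 100 synonymous differences and 50 synonymous polymorphisms.
# Neutral amino acid changes contribute 40 differences and 20 polymorphisms
# (the same 0.4 ratio to synonymous sites in both classes), and positive
# selection has added 20 more amino acid differences, for 60 in total.
without_deleterious = adaptive_fraction(dn=60, ds=100, pn=20, ps=50)

# Now suppose 10 slightly deleterious amino acid variants are also segregating.
# Purifying selection keeps them from fixing, so they inflate pn but not dn.
with_deleterious = adaptive_fraction(dn=60, ds=100, pn=30, ps=50)

if __name__ == "__main__":
    print(round(without_deleterious, 3), round(with_deleterious, 3))
```

    In the first case the estimate recovers the 20 of 60 substitutions that were adaptive. In the second, the extra deleterious polymorphisms pull the estimate down to zero, so the gene looks as if positive selection never touched it even though a third of its amino acid substitutions were adaptive.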

    This is the point raised by Justin Fay and colleagues (2001). Their paper was principally concerned with estimating the rate of negative, or purifying, selection across the human genome. Their research was motivated by the question of what the typical effects of mutations are--are mutations usually neutral, or are they usually deleterious? In setting out to answer this question they realized that the rate of negative selection could not be independently estimated without first considering the effect of positive selection on the same genes. To estimate the rate of positive selection, Fay and colleagues devised a test that depended upon dividing polymorphisms into three subsets. One subset consisted of polymorphisms at low frequency (

    In a review article in Nature, Sean Carroll (2003) applied this estimate of 35 percent adaptive substitutions to gain an understanding of the number of selected changes during the past 5 to 7 million years. Based on our understanding of a subset of human genes, Carroll extrapolated that approximately 200,000 amino acid changes have occurred on the human lineage during human evolution. If 35 percent of these changes were positively selected, then this number implies that some 70,000 adaptive substitutions happened during human evolution. This is a stunningly large number. It works out to over two adaptive changes for every one of our 30,000 genes, or one adaptive change every hundred years. If these were evenly distributed over time, then any time period of 10,000 years during our evolution was likely to have seen 100 adaptive substitutions underway. Nor does the figure of 70,000 include the adaptive changes in regulatory elements and other non-coding portions of the DNA.
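    The arithmetic behind these figures is easy to check. The sketch below simply reproduces it from the numbers quoted above (200,000 amino acid changes, 35 percent adaptive, 30,000 genes), taking 7 million years as the default lineage length and assuming the substitutions are spread evenly through time. The function name is hypothetical.

```python
def adaptive_substitution_summary(amino_acid_changes=200_000,
                                  adaptive_fraction=0.35,
                                  n_genes=30_000,
                                  lineage_years=7_000_000,
                                  window_years=10_000):
    """Back-of-envelope rates implied by the figures quoted in the post."""
    adaptive = amino_acid_changes * adaptive_fraction      # about 70,000
    years_per_substitution = lineage_years / adaptive      # about 100 at 7 million years
    return {
        "adaptive_substitutions": adaptive,
        "per_gene": adaptive / n_genes,                    # a little over two per gene
        "years_per_substitution": years_per_substitution,
        "per_window": window_years / years_per_substitution,
    }

if __name__ == "__main__":
    print(adaptive_substitution_summary())
    # The shorter 5-million-year estimate gives an even faster pace.
    print(adaptive_substitution_summary(lineage_years=5_000_000))
```

    With the 7-million-year figure the pace is one adaptive substitution per century; with 5 million years it is closer to one every 70 years, so the round number in the text is, if anything, conservative.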

    An independent line of inquiry has provided support for the idea that human genes have been regularly under positive selection. Vallender and Lahn (2004) review a list of genes that are now believed to have been under repeated positive selection during the course of hominoid evolution. Some of these display strong evidence of selection during the evolution of ancestral anthropoids or hominoids, others apparently have been under selection during the course of human evolution during the past seven million years or less.

    The work of the Leipzig lab that Stoneking was referring to is reflected by the paper by Hellmann and colleagues (2003). The innovation of this research is the ability to compare human genomic regions with the same regions in chimpanzees. In contrast to earlier studies like that of Fay and colleagues (2001), who relied mainly upon macaque comparisons, this study gives a very close analogue as a comparison sample for human variation. But unlike the earlier study, Hellmann and colleagues (2003) basically ignore the issue of positive selection on the genome. They do not examine the ratio of amino acid changing variants for different frequency sets, nor do they attempt to quantify the number of adaptive substitutions in humans as compared to chimpanzees. The information in the data that would address positive selection is therefore hidden by the overall pattern of purifying selection against deleterious mutations that the researchers find.

    One interesting addition to our information about positive selection is present in Hellmann and colleagues' data, however. They find that the 5' untranslated region of the genes in their study is significantly more divergent between humans and chimpanzees than other parts of the genes. This observation is consistent with the idea that this region has been under positive selection in many of these genes. Presumably the source of this positive selection is adaptive change in regulatory elements upstream of the coding regions of these genes. If this pattern is widespread, it offers an additional large number of episodes of positive selection beyond those necessary to explain the pattern of amino acid changes in humans.

    So the data from the Leipzig lab do not contradict the findings of high levels of positive selection on human genes. They do confirm the idea that tests of neutrality will not pick up evidence of positive selection when purifying selection against deleterious mutations has also been acting on the genes. These data do not suggest that genetic drift has been the predominant force affecting human evolutionary change. Instead, purifying selection is shown to be a very strong force affecting the current variation of most genes, while the estimate of positive selection made by Fay and colleagues (2001), together with the evidence for positive selection on 5' untranslated regions, remains applicable to these data.

    What are the stakes in this inquiry? That is, why does it matter what the level of positive selection has been in human evolution or the evolution of any other lineage?

    One implication of a very high rate of adaptive evolution is that most of the molecular changes affect cellular metabolism, homeostatic processes, and other small-scale molecular features of the cell and organism. This conclusion stems from the idea that there just have not been enough changes in gross structural and anatomical aspects of organisms to require tens of thousands of selected changes. Carroll (2003) goes so far as to suggest that developmental genes may have been relatively conserved compared to the molecular processes that account for this widespread adaptive evolution. Vallender and Lahn (2004:R245) say, "Many other aspects of human biology not necessarily related to the 'branding' of our species, such as host-pathogen interactions, reproduction, dietary adaptation, and physical appearance, have also been the substrate of varying levels of positive selection."

    The most important implication of a high rate of selection, at least to me, is that ancient demography is likely not a major cause of human genetic evolution and differentiation. John Gillespie's work has made clear over the past five years that positive selection can explain the pattern of genetic variation in many species. In essence, he explains a low level of polymorphism in most species as the possible result of widespread positive selection and linkage between selected and neutral sites. As large areas of the genome undergo genetic hitchhiking (reductions in variation caused by linkage to positively selected sites), the variation ultimately becomes greatly limited in ways that resemble the effects of genetic drift in a small population. The difference is that with positive selection and linkage, the level of polymorphism that can persist in a population has little if any relation to the size of the population. This theory is called "genetic draft."

    If the variation of most human genes is limited by selection across the genome, then it follows that genes cannot be used to estimate ancient population size or other demographic characteristics.

    Likewise, if positive selection on human genes is common enough, it appears very likely that some areas of the genome often used for demographic inquiry are themselves direct targets of selection. The most obvious are the mitochondrial DNA and the Y chromosome, both of which are completely linked over their entire lengths. For example, the Y chromosome contains around 80 genes. If these were selected at the rate typical of nuclear genes in the rest of the genome, then we can estimate that the Y chromosome underwent some 200 adaptive substitutions during the past 7 million years, or one in approximately 35,000 years. In this context, it is interesting that the most recent common ancestor of human Y chromosomes appears to have lived within the past hundred thousand years (and possibly less), and that the genetic sequences thus far studied show clear violations of neutrality. This has previously been considered evidence of a large-scale demographic replacement in recent human evolution, but it is better explained as the consequence of positive selection on the Y chromosome. A similar line of argument could apply to the mtDNA, although estimates of the frequency of adaptive changes are more difficult because the mtDNA has a substantially different rate of mutations compared to nuclear DNA.
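    The Y chromosome figure follows from the genome-wide numbers given earlier in this post (70,000 adaptive substitutions across 30,000 genes). A short sketch of the calculation, assuming every gene is an equally likely target and substitutions are spread evenly over 7 million years:

```python
GENOME_ADAPTIVE_SUBSTITUTIONS = 70_000   # from the Carroll (2003) extrapolation above
GENOME_GENES = 30_000
Y_GENES = 80
LINEAGE_YEARS = 7_000_000

# Average adaptive substitutions per gene, applied to the genes on the Y.
per_gene = GENOME_ADAPTIVE_SUBSTITUTIONS / GENOME_GENES
y_substitutions = per_gene * Y_GENES                  # a bit under 190, "some 200"

# Average waiting time between adaptive substitutions anywhere on the Y.
interval_unrounded = LINEAGE_YEARS / y_substitutions  # using the unrounded count
interval_rounded = LINEAGE_YEARS / 200                # using the rounded "some 200"

if __name__ == "__main__":
    print(round(y_substitutions), round(interval_unrounded), round(interval_rounded))
```

    Either way the interval is well under the roughly hundred thousand years to the common ancestor of living Y chromosomes, which is the point of the comparison: because the chromosome is completely linked, each such substitution would sweep away variation along its whole length.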

    If positive selection is such a common force in human evolution, then the possibility is clearly opened that many of the genetic differences among human populations are the result of local positive selection. We have tended to examine interpopulation differences under the assumption that genetic drift and migration are the only important parameters. But if selection plays an important role in diversifying populations, then we can expect that the level of genetic differences (for example FST) between populations has little to do with classical correlates of genetic drift, such as population size. Instead of drift-migration equilibrium, the important factor is migration-selection equilibrium, probably constantly changing in a dynamical sense.

    What mysteries remain?

    Although it appears that positive selection has been very common, this says little about the distribution of such selection across the genome. Evidence from some genes and sets of genes suggests that they have been subjected to repeated episodes of adaptive evolution during the course of primate evolution. The study by Dorus and colleagues on the adaptive evolution of brain-related genes provides an example of the way that positive selection has been focused on some areas of the genome. Certainly some individual genes have probably undergone dozens of instances of positive selection--probably including many genes involved in host-pathogen interactions.

    If positive selection is really so common across the genome, then a greater frequency of selection in the human lineage as opposed to other animal species might explain the level of human genetic variation. But if positive selection has been so common, then why do some animal genes appear to have fairly great variation? For example, why do so many animal species have relatively great mtDNA variation, when the mitochondrial DNA appears to be a very likely target of selective sweeps?

    References:

    Carroll SB. 2003. Genetics and the making of Homo sapiens. Nature 422:849-857.

    Fay JC, Wyckoff GJ, Wu CI. 2001. Positive and negative selection on the human genome. Genetics 158:1227-1254.

    Hellmann I, Zollner S, Enard W, Ebersberger I, Nickel B, Paabo S. 2003. Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res 13:831-837.

    Vallender EJ, Lahn BT. 2004. Positive selection on the human genome. Hum Mol Genet 13:R245-R254.
