How much selection does it take?


I was involved in a discussion this weekend that I think reveals much about the current state of evolutionary genomics. The forum was the "Neanderthals Revisited" conference at NYU, although I might have had the same discussion almost anywhere. Earlier, I had given my presentation about the importance of selection when considering Neandertal relationships. My point was that ignoring selection leads to a preordained result, since a small amount of selection can have the same effect as very extreme hypotheses of demographic change. I used both morphology and mitochondrial genetics as examples of the way that small magnitudes of selection can have great influence on the pattern of variation. Needless to say, this line of argument did not go over very well with geneticists who had previously been engaged in mitochondrial research under the paradigm that assumes mtDNA is neutral.

The comment that surprised me quite a bit (and I can say that at least a few others in the crowd were surprised as well) was given the next day by Mark Stoneking, a geneticist at the Max Planck Institute for Evolutionary Anthropology in Leipzig. Stoneking was commenting on research the Leipzig lab has been doing in the area of regulatory control across the genome. His observation--and this was the surprise--was that regulatory changes in the genome appear consistent with the hypothesis of neutral change over time. In other words, his argument was that neutral change was the predominant mode of genomic evolution leading to living humans. He used this as a point of departure to suggest that neutral evolution of phenotypes during the course of human evolution was likely a productive hypothesis to pursue when considering the pattern of variation in ancient hominids.

I must say that I was fairly stunned by this statement. It really struck me as being inconsistent with my knowledge of genomic variation in humans. So I looked up some of the recent research from the Leipzig lab, and compared it with the papers that I had been aware of that examined the level of positive selection responsible for human evolutionary change.

The promise of evolutionary genomics for deriving information about the pattern of selection leading to human-specific traits lies mostly in the comparison of human genetic sequences with those of other primates. By examining the rate of evolution for different genes and parts of genes, it is possible to address whether the changes responsible for human phenotypic evolution were mainly due to changes in coding regions of genes, regulatory elements, or other regions. And by comparing genes with each other, it is possible to judge whether most of the changes were concentrated at a few important genes, or whether they were more broadly distributed across the genome.

But most important is the need to discover exactly what types of mutations were common in the human lineage, and for that matter in the chimpanzee lineage and other primate lineages as well. For example, a gene that has repeatedly experienced positive selection during the course of human evolution will show a greater difference between humans and chimpanzees in whatever kinds of changes were under selection. This might mean that the gene will exhibit a greater number of amino acid substitutions than expected at random from the total number of mutational changes. It might mean that certain areas of the gene, such as upstream regulatory regions, will exhibit more divergence than others.

Geneticists have developed tests for these different patterns of departure from neutrality. They can test whether the number of amino acid changes is significantly higher than expected given a certain rate of mutations. They can test for equality of rates between different genetic loci. And they can test directly for violation of mutation-drift equilibrium. It is this last test that is violated by human mitochondrial DNA, for example, which leads me along with many other geneticists to believe that the molecule has been under positive selection during human prehistory.

But there are problems in applying tests of neutrality to genome-wide questions of the abundance or frequency of positive selection. Tests of neutrality are notoriously conservative, meaning that it is difficult for genetic data to demonstrate that selection actually took place. One of the reasons why tests of neutrality are so conservative is that the effects of positive selection are counterbalanced by other forms of selection on the genome. For example, if the coding sequence of a gene has been under positive selection during evolutionary history, then one expected effect on the distribution of variation is that the number of amino acid changes separating two species will be relatively high, especially compared to the number of amino acid differences noted within each of the species. In other words, one of the species or both of them should have been driven further apart by selection on the amino acid sequence than we would expect from their neutral level of variation. But exactly the opposite effect is expected for purifying selection. As purifying selection consistently eliminates new amino acid mutations, it should leave two species relatively close to each other in their amino acid sequences compared to the level of differences within each of those species. What this means is that if both positive selection and purifying selection have affected the gene over the course of its evolutionary history, the two opposing forces will to some extent cancel each other out. The effect of this cancellation is an underestimate of the rate of positive selection, because purifying selection is usually continuous and positive selection is much more episodic. In essence, for most genes we may infer that positive selection never happened, at the same time that we estimate purifying selection to be substantially weaker than it actually was.

This is the point raised by Justin Fay and colleagues (2001). Their paper was principally concerned with estimating the rate of negative, or purifying, selection across the human genome. Their research was motivated by the question of what the typical effects of mutations are--are mutations usually neutral, or are they usually deleterious? In setting out to answer this question they realized that the rate of negative selection could not be independently estimated without first considering the effect of positive selection on the same genes. To estimate the rate of positive selection, Fay and colleagues devised a test that depended upon dividing polymorphisms into three subsets. One subset consisted of polymorphisms at low frequency (< 5%), which would include most deleterious alleles, since selection prevents these from increasing in frequency beyond a very rare level. A second subset consisted of "moderate" frequency alleles occurring between 5 and 15 percent, while the third subset included "common" alleles with frequencies between 15 percent and 50 percent (a folded frequency spectrum considers only the frequency of the minor allele). The researchers reasoned that common (high-frequency) mutations were very unlikely to be deleterious, and so the ratio of nonsynonymous to synonymous polymorphisms for only these common alleles should not reflect the effects of purifying selection. Using this subset, they tested whether the amino acid divergence between species was unusually high compared to the proportion of amino acid polymorphisms within species. They found an excess of 35 percent in amino acid changes between species, compared to the expectation based on within-species polymorphism: "a large proportion, 35%, of amino acid substitutions between humans and old world [sic] monkeys are estimated to have been driven by positive selection" (Fay et al. 2001: 1232).
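The logic of that test can be sketched in a few lines of Python. This is only an illustration of the reasoning described above, not the procedure or code used by Fay and colleagues: the `Polymorphism` class, the function names, the handling of a variant at exactly 15 percent, and all of the counts are hypothetical and chosen so that the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Polymorphism:
    minor_allele_freq: float  # folded frequency, between 0 and 0.5
    nonsynonymous: bool       # True if the variant changes an amino acid


def frequency_class(freq):
    """Assign a folded allele frequency to one of the three classes.

    Treating exactly 0.15 as "common" is an assumption of this sketch.
    """
    if freq < 0.05:
        return "rare"
    if freq < 0.15:
        return "moderate"
    return "common"


def adaptive_fraction(polymorphisms, fixed_nonsyn, fixed_syn):
    """Share of amino acid substitutions in excess of the neutral expectation.

    The expectation is set by the nonsynonymous/synonymous ratio among
    common polymorphisms only, so rare (mostly deleterious) variants
    do not inflate it.
    """
    common = [p for p in polymorphisms
              if frequency_class(p.minor_allele_freq) == "common"]
    pn = sum(p.nonsynonymous for p in common)
    ps = len(common) - pn
    if ps == 0 or fixed_nonsyn == 0:
        raise ValueError("need common synonymous variants and fixed amino acid changes")
    expected_nonsyn = fixed_syn * pn / ps
    return (fixed_nonsyn - expected_nonsyn) / fixed_nonsyn


# Made-up counts: many rare amino acid variants, fewer common ones.
toy_sample = (
    [Polymorphism(0.02, True)] * 30 + [Polymorphism(0.02, False)] * 10
    + [Polymorphism(0.30, True)] * 13 + [Polymorphism(0.30, False)] * 20
)
print(round(adaptive_fraction(toy_sample, fixed_nonsyn=100, fixed_syn=100), 2))
```

With these invented counts the rare class is dominated by amino acid variants, but it is ignored when setting the neutral expectation. If the rare variants were included, the expected number of amino acid substitutions would rise and the apparent excess would shrink or vanish, which is the masking effect of purifying selection described earlier.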

In a review article in Nature, Sean Carroll (2003) applied this estimate of 35 percent adaptive substitutions to gain an understanding of the number of selected changes during the past 5 to 7 million years. Based on our understanding of a subset of human genes, Carroll extrapolated that approximately 200,000 amino acid changes have occurred on the human lineage during human evolution. If 35 percent of these changes were positively selected, then this number implies that some 70,000 adaptive substitutions happened during human evolution. This is a stunningly large number. It evens out to over two adaptive changes for every one of our 30,000 genes, or one adaptive change per every hundred years. If these were evenly distributed over time, then any time period of 10,000 years during our evolution was likely to have seen 100 adaptive substitutions underway. Nor does the figure of 70,000 include the adaptive changes in regulatory elements and other non-coding portions of the DNA.
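The arithmetic in that paragraph is easy to check. The short Python sketch below simply reproduces it from the figures quoted above; the function name and the choice of 7 million years for the length of the human lineage are assumptions made for this illustration.

```python
def adaptive_rate_summary(amino_acid_changes=200_000, adaptive_percent=35,
                          gene_count=30_000, lineage_years=7_000_000):
    """Back-of-the-envelope rates from the figures quoted in the text."""
    adaptive = amino_acid_changes * adaptive_percent // 100
    return {
        "adaptive_substitutions": adaptive,
        "per_gene": adaptive / gene_count,
        "years_per_substitution": lineage_years / adaptive,
        "per_10000_years": 10_000 * adaptive / lineage_years,
    }


summary = adaptive_rate_summary()
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Calling `adaptive_rate_summary(lineage_years=5_000_000)` gives the other end of the 5 to 7 million year range: roughly one adaptive substitution every 71 years rather than every 100.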

An independent line of inquiry has provided support for the idea that human genes have been regularly under positive selection. Vallender and Lahn (2004) review a list of genes that are now believed to have been under repeated positive selection during the course of hominoid evolution. Some of these display strong evidence of selection during the evolution of ancestral anthropoids or hominoids; others apparently have been under selection during the course of human evolution during the past seven million years or less.

The work of the Leipzig lab that Stoneking was referring to is reflected by the paper by Hellmann and colleagues (2003). The innovation of this research is the ability to compare human genomic regions with the same regions in chimpanzees. In contrast to earlier studies like that of Fay and colleagues (2001), who relied mainly upon macaque comparisons, this study gives a very close analogue as a comparison sample for human variation. But unlike the earlier study, Hellmann and colleagues (2003) basically ignore the issue of positive selection on the genome. They do not examine the ratio of amino acid changing variants for different frequency sets, nor do they attempt to quantify the number of adaptive substitutions in humans as compared to chimpanzees. The information in the data that would address positive selection is therefore hidden by the overall pattern of purifying selection against deleterious mutations that the researchers find.

One interesting addition to our information about positive selection is present in Hellmann and colleagues' data, however. They find that the 5' untranslated region of the genes in their study is significantly more divergent between humans and chimpanzees than other parts of the genes. This observation is consistent with the idea that this region has been under positive selection in many of these genes. Presumably the source of this positive selection is adaptive change in regulatory elements upstream of the coding regions of these genes. If this pattern is widespread, it offers an additional large number of episodes of positive selection beyond those necessary to explain the pattern of amino acid changes in humans.

So the data from the Leipzig lab do not contradict the findings of high levels of positive selection on human genes. They do confirm the idea that tests of neutrality will not pick up evidence of positive selection when purifying selection against deleterious mutations has also been acting on the genes. These data do not suggest that genetic drift has been the predominant force affecting human evolutionary change. Instead, purifying selection is shown to be a very strong force affecting the current variation of most genes, while the estimate of positive selection made by Fay and colleagues (2001), together with the evidence for positive selection on 5' untranslated regions, remains applicable to these data.

What are the stakes in this inquiry? That is, why does it matter what the level of positive selection has been in human evolution or the evolution of any other lineage?

One implication of a very high rate of adaptive evolution is that most of the molecular changes affect cellular metabolism, homeostatic processes, and other small-scale molecular features of the cell and organism. This conclusion stems from the idea that there just have not been enough changes in gross structural and anatomical aspects of organisms to require tens of thousands of selected changes. Carroll (2003) goes so far as to suggest that developmental genes may have been relatively conserved compared to the molecular processes that account for this widespread adaptive evolution. Vallender and Lahn (2004:R245) say, "Many other aspects of human biology not necessarily related to the 'branding' of our species, such as host-pathogen interactions, reproduction, dietary adaptation, and physical appearance, have also been the substrate of varying levels of positive selection."

The most important implication of a high rate of selection, at least to me, is that ancient demography is likely not a major cause of human genetic evolution and differentiation. John Gillespie's work has made clear over the past five years that positive selection can explain the pattern of genetic variation in many species. In essence, he explains a low level of polymorphism in most species as the possible result of widespread positive selection and linkage between selected and neutral sites. As large areas of the genome undergo genetic hitchhiking (reductions in variation caused by linkage to positively selected sites), the variation ultimately becomes greatly limited in ways that resemble the effects of genetic drift in a small population. The difference is that with positive selection and linkage, the level of polymorphism that can persist in a population has little if any relation to the size of the population. This theory is called "genetic draft."

If the variation of most human genes is limited by selection across the genome, then it follows that genes cannot be used to estimate ancient population size or other demographic characteristics.

Likewise, if positive selection on human genes is common enough, it appears very likely that some areas of the genome often used for demographic inquiry are themselves direct targets of selection. The most obvious are the mitochondrial DNA and the Y chromosome, both of which are completely linked over their entire lengths. For example, the Y chromosome contains around 80 genes. If these were selected at the rate typical of nuclear genes in the rest of the genome, then we can estimate that the Y chromosome underwent some 200 adaptive substitutions during the past 7 million years, or one in approximately 35,000 years. In this context, it is interesting that the most recent common ancestor of human Y chromosomes appears to have lived within the past hundred thousand years (and possibly less), and that the genetic sequences thus far studied show clear violations of neutrality. This has previously been considered evidence of a large-scale demographic replacement in recent human evolution, but it is better explained as the consequence of positive selection on the Y chromosome. A similar line of argument could apply to the mtDNA, although estimates of the frequency of adaptive changes are more difficult because the mtDNA has a substantially different rate of mutations compared to nuclear DNA.
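The Y chromosome estimate follows from the same back-of-the-envelope reasoning, and a minimal Python sketch makes the rounding explicit. It assumes the genome-wide figures quoted from Carroll (70,000 adaptive substitutions across 30,000 genes) and a 7 million year lineage; the function name is hypothetical.

```python
def y_chromosome_estimate(y_genes=80, adaptive_substitutions=70_000,
                          genome_genes=30_000, lineage_years=7_000_000):
    """Scale the genome-wide per-gene rate to the genes on the Y chromosome."""
    per_gene = adaptive_substitutions / genome_genes
    y_substitutions = y_genes * per_gene
    years_between = lineage_years / y_substitutions
    return y_substitutions, years_between


y_substitutions, years_between = y_chromosome_estimate()
print(f"adaptive substitutions on the Y: about {y_substitutions:.0f}")
print(f"one substitution every {years_between:,.0f} years or so")
```

The unrounded result is a little under 200 substitutions, or one about every 37,500 years. Rounding up to 200 gives the figure of one in approximately 35,000 years used in the text. Either way, the expected interval between sweeps is well short of a hundred thousand years.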

If positive selection is such a common force in human evolution, then the possibility is clearly opened that many of the genetic differences among human populations are the result of local positive selection. We have tended to examine interpopulation differences under the assumption that genetic drift and migration are the only important parameters. But if selection plays an important role in diversifying populations, then we can expect that the level of genetic differences (for example Fst) between populations has little to do with classical correlates of genetic drift, such as population size. Instead of drift-migration equilibrium, the important factor is migration-selection equilibrium, probably constantly changing in a dynamical sense.

What mysteries remain?

Although it appears that positive selection has been very common, this says little about the distribution of such selection across the genome. Evidence from some genes and sets of genes suggests that they have been subjected to repeated episodes of adaptive evolution during the course of primate evolution. The study by Dorus and colleagues on the adaptive evolution of brain-related genes provides an example of the way that positive selection has been focused on some areas of the genome. Certainly some individual genes have probably undergone dozens of instances of positive selection--probably including many genes involved in host-pathogen interactions.

If positive selection is really so common across the genome, then a greater frequency of selection in the human lineage as opposed to other animal species might explain the level of human genetic variation. But if positive selection has been so common, then why do some animal genes appear to have fairly great variation? For example, why do so many animal species have relatively great mtDNA variation, when the mitochondrial DNA appears to be a very likely target of selective sweeps?


Carroll SB. 2003. Genetics and the making of Homo sapiens. Nature 422:849-857.

Fay JC, Wyckoff GJ, Wu CI. 2001. Positive and negative selection on the human genome. Genetics 158:1227-1234.

Hellmann I, Zollner S, Enard W, Ebersberger I, Nickel B, Paabo S. 2003. Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res 13:831-837.

Vallender EJ, Lahn BT. 2004. Positive selection on the human genome. Hum Mol Genet 13:R245-R254.