At least 10 percent of human genes under recent selection

It's hard to beat the abstract of this paper by Eric Wang and colleagues (2006):

By using the 1.6 million single-nucleotide polymorphism (SNP) genotype data set from Perlegen Sciences [Hinds, D. A., Stuve, L. L., Nilsen, G. B., Halperin, E., Eskin, E., Ballinger, D. G., Frazer, K. A. & Cox, D. R. (2005) Science 307, 1072-1079], a probabilistic search for the landscape exhibited by positive Darwinian selection was conducted. By sorting each high-frequency allele by homozygosity, we search for the expected decay of adjacent SNP linkage disequilibrium (LD) at recently selected alleles, eliminating the need for inferring haplotype. We designate this approach the LD decay (LDD) test. By these criteria, 1.6% of Perlegen SNPs were found to exhibit the genetic architecture of selection. These results were confirmed on an independently generated data set of 1.0 million SNP genotypes (International Human Haplotype Map Phase I freeze). Simulation studies indicate that the LDD test, at the megabase scale used, effectively distinguishes selection from other causes of extensive LD, such as inversions, population bottlenecks, and admixture. The 1,800 genes identified by the LDD test were clustered according to Gene Ontology (GO) categories. Based on overrepresentation analysis, several predominant biological themes are common in these selected alleles, including host-pathogen interactions, reproduction, DNA metabolism/cell cycle, protein metabolism, and neuronal function.

Most tests of selection are blunt instruments. They depend on observations of the frequency spectrum of mutations, but mutations don't happen very often for most genetic loci. With most methods, recent selection is very difficult to find. It's like trying to find potholes when you're driving a tank -- it takes a pretty big pothole to notice anything. To find a higher proportion of the selection that happened, you need a more sensitive metric.

The mark of a selected allele is a rapid increase in frequency. If the selection is recent, then the allele should appear to have originated recently. A rapid increase in the frequency of an allele leaves a pattern of linkage disequilibrium (LD), because recombination does not have a chance to break the selected locus apart from nearby neutral loci. The longer ago the allele increased in frequency, the more recombination has occurred and the less LD remains.
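To make the "more recombination, less LD" point concrete, here is a minimal sketch of the textbook expectation that the LD coefficient between two loci shrinks by a factor of (1 - r) each generation under random mating, where r is the recombination fraction between them. This is not code from Wang et al.; the function names and the parameters `d0`, `r`, and `generations` are illustrative assumptions.

```python
import math


def ld_after_generations(d0: float, r: float, generations: int) -> float:
    """Expected LD coefficient D after some generations of random mating.

    Standard two-locus result: D_t = D_0 * (1 - r) ** t.
    d0          -- starting LD coefficient (hypothetical input)
    r           -- recombination fraction between the loci, 0 <= r <= 0.5
    generations -- number of generations elapsed, >= 0
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - r) ** generations


def generations_until_fraction(r: float, fraction: float) -> float:
    """Generations needed for D to fall to `fraction` of its starting value."""
    if not 0.0 < r <= 0.5:
        raise ValueError("recombination fraction must be in (0, 0.5]")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return math.log(fraction) / math.log(1.0 - r)


if __name__ == "__main__":
    # Compare a tightly linked pair of loci with a loosely linked pair.
    for r in (0.001, 0.01):
        for t in (100, 400, 1600):
            print(f"r={r}, t={t}: D/D0 = {ld_after_generations(1.0, r, t):.4f}")
```

The same starting LD survives far longer between tightly linked loci, which is why LD that extends over long distances points to a recent rise in frequency.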

Wang et al. (2006) used the prediction that the LD should decrease over time to establish a test of recent selection. They surveyed the linkage among nearby SNPs to determine whether a variant has increased rapidly in frequency during the recent past. The sensitivity of this test depends on the SNP coverage of the genome. At present, SNP coverage is very good for variants with moderate to high frequencies, so although low-frequency selected variants (those with a global frequency below 5 to 10 percent) were missed by the current survey, it has found a huge number of selected loci.
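The idea of reading LD from unphased genotypes can be illustrated with a toy calculation. The sketch below is not the authors' LDD test. It is a simplified stand-in: take the individuals who are homozygous at a core SNP, then ask what fraction of them are also homozygous at each neighboring SNP. A recently swept allele should keep that fraction high over a longer stretch. The genotype coding (0, 1, 2 copies of an allele), the function name, and the input layout are all assumptions made for illustration.

```python
from typing import Dict, List


def neighbor_homozygosity(
    genotypes: List[List[int]],
    core_index: int,
    core_genotype: int = 2,
) -> Dict[int, float]:
    """Fraction of core-SNP homozygotes that are homozygous at each neighbor.

    genotypes     -- one row per individual, one column per SNP, coded as
                     0, 1, or 2 copies of an allele (hypothetical layout)
    core_index    -- column of the SNP being tested
    core_genotype -- homozygous genotype to condition on (0 or 2)

    Returns a dict keyed by SNP offset from the core (negative = upstream).
    """
    # Keep only individuals homozygous for the allele of interest.
    carriers = [row for row in genotypes if row[core_index] == core_genotype]
    if not carriers:
        return {}

    result: Dict[int, float] = {}
    for j in range(len(genotypes[0])):
        if j == core_index:
            continue
        # A genotype of 1 is heterozygous; 0 and 2 are homozygous.
        homozygous = sum(1 for row in carriers if row[j] != 1)
        result[j - core_index] = homozygous / len(carriers)
    return result


if __name__ == "__main__":
    # Four made-up individuals typed at four SNPs; the core SNP is column 1.
    toy = [
        [2, 2, 2, 1],
        [2, 2, 1, 0],
        [2, 2, 2, 1],
        [0, 1, 1, 2],
    ]
    for offset, frac in sorted(neighbor_homozygosity(toy, core_index=1).items()):
        print(f"offset {offset:+d}: {frac:.2f} homozygous")
```

Conditioning on homozygotes sidesteps phasing, since an individual homozygous at both SNPs carries an unambiguous haplotype. The published test adds a probabilistic model and a megabase-scale window that this sketch does not attempt.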

In conclusion, we have introduced a simple probabilistic method to detect unusual genetic architectures associated with recent selection that does not require haplotype information. It is, therefore, suitable for large chromosomal scans with large population samples. Homo sapiens have undoubtedly undergone strong recent selection for many different phenotypes, including but certainly not limited to the general categories we have defined in this work (Fig. 5). Such inferred selective events are not rare (Fig. 3). The numbers obtained, however, are similar to estimated numbers obtained for artificial selection (by humans) on the maize genome (45). Given that most of these selective events likely occurred in the last 10,000-40,000 years, a time of major population expansion out of Africa followed by regional shifts from hunter-gatherer to agrarian societies, it is tempting to speculate that gene-culture interactions directly or indirectly shaped our genomic architecture (46, 47). As such, we suggest that such recently selected alleles may provide useful "markers" for investigating the evolutionary migrations of our species, as an adjunct to studies using neutral markers. We also propose that many of these alleles, because of their high prevalence and recent selection, should be considered likely "functional candidates" for association with human variability and the common disorders afflicting humankind.

They also assign the loci with evidence of recent selection to different functional categories. Pathogen-host interaction loci have a high representation in the recently selected genes, as do genes related to protein and DNA metabolism. And this:

One of the more intriguing categories overrepresented in inferred selective events is neuronal function. We define this category to include a diverse assortment of genes, including the serotonin transporter (SLC6A4), glutamate and glycine receptors (GRM3, GRM1, and GLRA2), olfactory receptors (OR4C13 and OR2B6), synapse-associated proteins (RAPSN), and a number of brain-expressed genes with largely unknown function (ASPM, RNT1; see Fig. 4).

It would be hard for me to overstate how important this paper is. Even if it weren't central to my own current research (about which you will just have to wait for more), it brings home the vast importance of adaptive change during the most recent parts of human evolution.


Wang ET, Kodama G, Baldi P, Moyzis RK. 2006. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Nat Acad Sci USA 103:135-140.