john hawks weblog

paleoanthropology, genetics and evolution

positive selection

  • Selection is for the dogs

    Wed, 2013-01-23 16:17 -- John Hawks

    I was really pleased to see the new paper by Erik Axelsson and colleagues [1] on the pattern of recent selection on domesticated dogs. As we began working on recent selection in humans, we expected that domesticated animals might exhibit similar patterns genome-wide. They are among the organisms most similar to humans in demography and ecological change: Domesticated animals have all undergone rapid shifts in diet, predator ecology and social dynamics after domestication, at the same time that they have experienced rapid increases in population size. That is a recipe for rapid adaptive evolution.

    As in humans, the paper shows that dogs were selected strongly for a new agricultural diet. Just as in humans who descend from early agriculturalists, dogs have extensive duplication of an amylase gene. Humans express amylase in saliva, and the human duplications involve the salivary gene AMY1; as the paper explains, dogs produce amylase only in the pancreas, where it digests starches in the intestine. The paper gets really exciting when the authors investigate the entire metabolic pathway underlying starch digestion. The amylase gene AMY2B underwent duplications in dogs similar to those in humans, duplications not found in wolves. Two other genes that act in starch digestion and glucose uptake did not undergo duplication, but they do show near-fixed haplotypes in dogs that are absent or very rare in wolves. Using both biochemistry and phylogenetic comparison with herbivores and omnivores, the paper shows that the dog versions of these genes increase enzymatic activity on starches and glucose uptake. The authors conclude:

    In conclusion, we have presented evidence that dog domestication was accompanied by selection at three genes with key roles in starch digestion: AMY2B, MGAM and SGLT1. Our results show that adaptations that allowed the early ancestors of modern dogs to thrive on a diet rich in starch, relative to the carnivorous diet of wolves, constituted a crucial step in early dog domestication. This may suggest that a change of ecological niche could have been the driving force behind the domestication process, and that scavenging in waste dumps near the increasingly common human settlements during the dawn of the agricultural revolution may have constituted this new niche [6]. In light of previous results describing the timing and location of dog domestication, our findings may suggest that the development of agriculture catalysed the domestication of dogs.

    So for those of you wondering why we feed dogs kibble instead of raw beef, here's the reason.

    After finding candidate regions for selection across the genome (the paper calls them candidate domestication regions, or CDRs), the authors ran a gene ontology analysis to see whether the genes in these regions fall into any consistent functional categories. Along with the metabolic and digestive genes, they found:

    The most conspicuous cluster (11 terms) relates to the term ‘nervous system development’. The eight genes belonging to this category (Supplementary Tables 7 and 8) include MBP, VWC2, SMO, TLX3, CYFIP1 and SH3GL2, of which several affect developmental signalling and synaptic strength and plasticity. We surveyed published literature and identified 11 additional CDR genes with central nervous system function (Supplementary Table 9), adding to a total of 19 CDRs that contain brain genes. These findings support the hypothesis that selection for altered behaviour was important during dog domestication and that mutations affecting developmental genes may underlie these changes [7].

    That is a similar story to humans. We don't yet know what changes in such genes might do, and unraveling what difference they may have made to behavior will require a much deeper understanding of developmental biology. It is much easier to work out what is going on when you can examine the biochemistry in vitro, as with the starch enzymes.

    The paper also makes clear why finding evidence of selection can be a difficult empirical problem at the moment:

    Uniquely placed sequence reads from pooled DNA representing 12 wolves of worldwide distribution and 60 dogs from 14 diverse breeds (Supplementary Table 1) covered 91.6% and 94.6%, respectively, of the 2,385 megabases (Mb) of autosomal sequence in the CanFam 2.0 genome assembly [11]. The aligned coverage depth was 29.8× for all dog pools combined and 6.2× for the single wolf pool (Supplementary Table 1 and Supplementary Fig. 1). We identified 3,786,655 putative single nucleotide polymorphisms (SNPs) in the combined dog and wolf data, 1,770,909 (46.8%) of which were only segregating in the dog pools, whereas 140,818 (3.7%) were private to wolves (Supplementary Table 2). Similarly we detected 506,148 short indels and 26,619 copy-number variations (CNVs) (Supplementary Files 1 and 2). We were able to experimentally validate 113 out of 114 tested SNPs (Supplementary Table 3 and Supplementary Discussion, section 1).

    If that sounds confusing, that's because it is confusing. Whole-genome sequencing is not yet routine, and whole-exome sequencing is not routine for creatures other than people. So making the most of the available data means working with partial genomes at varying levels of coverage, often accumulated for other purposes by other research groups using different sequencing platforms. Verifying sequence differences is not trivial. Generating a sample of gene sequences from many individuals is challenging, particularly because different individuals may or may not be covered for different parts of their genomes.
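    Pooling, as in the quoted passage, trades individual genotypes for read counts: the allele frequency in a pool is estimated as the fraction of reads carrying each allele, and its precision depends on depth. The sketch below is a minimal illustration, assuming reads are a simple binomial sample of the pooled chromosomes (it ignores sequencing error and unequal DNA contributions from individuals). It compares the two depths reported in the paper; note that 29.8× is the combined depth of all dog pools, not of any single pool, and the 0.5 frequency is hypothetical.

```python
import math

def pooled_frequency(alt_reads: int, depth: int) -> float:
    """Estimate allele frequency in a DNA pool from read counts."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth

def read_sampling_se(p: float, depth: float) -> float:
    """Binomial standard error of the pooled frequency estimate.

    Assumes each read is an independent draw from the pooled chromosomes;
    real pools add extra variance from unequal DNA input and sequencing error.
    """
    return math.sqrt(p * (1.0 - p) / depth)

if __name__ == "__main__":
    p = 0.5  # hypothetical true frequency, where read-sampling error is largest
    for label, depth in [("all dog pools", 29.8), ("wolf pool", 6.2)]:
        print(f"{label}: depth {depth}x, SE ~ {read_sampling_se(p, depth):.3f}")
```

    At 6.2× the standard error for a site at frequency 0.5 comes out near 0.20, more than twice the roughly 0.09 at 29.8×, which is one reason the wolf data are the noisier side of any dog-versus-wolf comparison.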

    Studying selection requires a fairly large sample of genomes. This paper establishes evidence of selection at a few loci where domesticated dogs are nearly uniform and all differ from wolves. In other words, these are "complete sweeps" or "near-complete sweeps", in which a new genetic variant has become nearly fixed within the domesticated dog sample. A larger sample of dogs would make it possible to test for selection across a broader range of strengths and starting dates, including "partial sweeps" and selection on standing variation that may have already existed in ancestral wolves before being subject to selection in domesticated dogs. So this paper opens a new area of inquiry on the causes of domestication, and leaves plenty of room to discover much, much more about the history of selection in dogs.
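    Near-complete sweeps leave a simple footprint in pooled data: inside the swept region, almost every read carries the same allele at each SNP, so variation collapses. One statistic built for this situation is pooled heterozygosity, Hp = 2·ΣnMAJ·ΣnMIN / (ΣnMAJ + ΣnMIN)², summed over the SNPs in a window, which Rubin and colleagues defined for pooled chicken genomes. The quoted passage does not say which statistics the dog scan used, so treat this as an assumed illustration. The read counts are made up, and the Z-transform across windows follows the usual practice of flagging strongly negative values.

```python
import statistics

def pooled_heterozygosity(counts):
    """Pooled heterozygosity Hp for one window.

    counts: list of (major_reads, minor_reads) tuples, one per SNP in the window,
    where major/minor are the more and less common alleles in the pool.
    Returns 2 * sum(major) * sum(minor) / (sum(major) + sum(minor)) ** 2.
    """
    n_maj = sum(m for m, _ in counts)
    n_min = sum(n for _, n in counts)
    total = n_maj + n_min
    if total == 0:
        raise ValueError("window has no reads")
    return 2.0 * n_maj * n_min / total ** 2

def z_transform(values):
    """Standardize window Hp values; strongly negative ZHp flags candidate sweeps."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

if __name__ == "__main__":
    # Hypothetical read counts: three 'neutral' windows and one swept window
    windows = {
        "w1": [(15, 14), (20, 10), (18, 12)],
        "w2": [(22, 8), (16, 13), (19, 11)],
        "w3": [(17, 13), (21, 9), (14, 15)],
        "swept": [(29, 1), (30, 0), (28, 2)],
    }
    hp = {name: pooled_heterozygosity(c) for name, c in windows.items()}
    for name, z in zip(hp, z_transform(list(hp.values()))):
        print(f"{name}: Hp={hp[name]:.3f} ZHp={z:+.2f}")
```

    With four windows the Z scores are only illustrative; a genome scan takes the mean and standard deviation from thousands of windows. A partial sweep would lower Hp far less than this near-complete example, which is why such sweeps need larger samples to detect.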

    One really cool possibility is that we will uncover convergent or parallel patterns of selection in dogs with different geographic origins. Already we know that body size and pigmentation have been subject to selection in different dog breeds, and that single genes transferred across breeds have been important parts of that process. There are a few cases in humans where the extensive geographic dispersal of a single adaptive variant can explain the present distribution of a trait. But in many more cases, different human groups have attained traits by parallel selection on different genetic variants. Because humans control the breeding of dogs and traded dogs across long distances in historic times, we may find that dogs are much less affected by parallelism and much more by long-distance gene flow than humans. But we won't know until we put that hypothesis to the test.


    References

    Synopsis: 
    A paper finds evidence of recent selection on starch digestion in dog domestication.
  • Building bigger dolphin brains

    Tue, 2012-09-11 18:14 -- John Hawks

    Ed Yong reports on a new study demonstrating a history of positive selection on the gene ASPM in cetaceans. Bruce Lahn's group previously showed that this gene has been positively selected in primate lineages, including recent humans: "Same gene involved in bigger brains of dolphins and primates".

    Now, Shixia Xu from Nanjing Normal University has found that a gene called ASPM played an important role in the evolution of cetacean brains. The gene shows clear signatures of adaptive change at two points in history, when the brains of some cetaceans ballooned in size. But ASPM has also been linked to the evolution of bigger brains in another branch of the mammal family tree – ours. It went through similar bursts of accelerated evolution in the great apes, and especially in our own ancestors after they split away from chimpanzees.

    It seems likely that both primates and cetaceans—the intellectual heavyweights of the animal world—both owe our bulging brains to changes in the same gene. “It’s a significant result,” says Michael McGowen, who studies the genetic evolution of whales at Wayne State University. “The work on ASPM shows clear evidence of adaptive evolution, and adds to the growing evidence of convergence between primates and cetaceans from a molecular perspective.”

    Molecular mechanisms of convergence have proved to be very common in the evolution of different mammalian orders. Mechanistically, evolution seems to select the same pathways when the same general functional requirements are adaptive. It is interesting that cetaceans and primates have broadly similar social and communication constraints, but very different ecological constraints in other respects, such as diet, thermoregulation, navigation and home range.

  • Modern humans in with a whimper

    Fri, 2012-07-20 16:10 -- John Hawks

    A short, open access review paper by Isabel Alves and colleagues [1] registers two important points:

    Until recently, the out-of-Africa model of human evolution was favoured by most genetic analyses, but this model collapsed when the sequencing of the Neanderthal genome revealed that 1%–3% of the genome of Eurasians was of Neanderthal origin. At the same time, refined analyses of modern human genomic data [1]–[3] have changed our view of evolutionary forces acting on our genome. While most people assumed that the out-of-Africa expansion had been characterized by a series of adaptations to new environments [4]–[6] leading to recurrent selective sweeps [7], our genome actually contains little trace of recent complete sweeps [2], [3], [8] and the genetic differentiation of human population has been very progressive over time, probably without major adaptive episodes [9].

    I disagree slightly with the latter point about selection -- in fact, we have abundant signs of recent positive selection in the genome, but those signs are nearly all very recent partial sweeps in different human populations. Complete sweeps and near-complete sweeps are indeed few, suggesting that there was relatively little directional adaptive evolution associated with the "origin of modern humans." Measured by genetic change, agriculture was many times more important than the appearance of modern humans throughout the world. The important point with respect to archaic humans is that there are precious few genetic changes shared by all (or even most) humans today that are not also shared with Neandertals, Denisovans, or plausible other archaic human groups (such as archaic Africans).

    That of course follows from the fact that a fraction of today's gene pool actually comes from those ancient groups. Their variation is (by and large) human variation.

    Most anthropologists do not yet fully understand this genetic picture. We cannot presently define "human" in a genetic sense without including Neandertals.

    Alves and colleagues discuss some important corollaries of the two key observations above. An important one:

    Even though our simulated scenario is unrealistically simple, it is likely that differential admixture should affect population genetic affinities under more complex models of population differentiation. The proper interpretation of human genetic affinities should thus probably be re-evaluated in the light of these results.

    A lot of studies of human genetic variation have assumed no mixture with archaic humans. Such studies are now obsolete. Whole-genome evidence is coming online, and with that evidence we must apply new analytical methods that incorporate more complex demographic hypotheses. These more complex models will require greater attention from anthropologists and population geneticists, but they should give us a more accurate picture of the causes and background of human diversity.
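    The point about differential admixture can be made concrete with the standard mixing relation: if a fraction a of a population's ancestry comes from an archaic source with allele frequency pA, its frequency becomes p = (1 − a)·pM + a·pA, where pM is the frequency in the unmixed modern population. The sketch below is a toy, not the Alves simulation: the loci, frequencies and the 5% admixture fraction are hypothetical. It shows two populations with identical modern ancestry drifting apart in allele frequency purely through different amounts of archaic input.

```python
def admixed_frequency(p_modern, p_archaic, a):
    """Allele frequency after a fraction `a` of ancestry comes from an archaic source."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("admixture fraction must lie in [0, 1]")
    return (1.0 - a) * p_modern + a * p_archaic

def mean_squared_difference(freqs1, freqs2):
    """Average squared allele-frequency difference across loci (a crude distance)."""
    if len(freqs1) != len(freqs2) or not freqs1:
        raise ValueError("need two equal-length, non-empty frequency lists")
    return sum((x - y) ** 2 for x, y in zip(freqs1, freqs2)) / len(freqs1)

if __name__ == "__main__":
    # Hypothetical loci: (shared modern frequency, archaic frequency)
    loci = [(0.10, 1.0), (0.50, 0.0), (0.30, 0.9), (0.80, 0.2)]
    pop_a = [admixed_frequency(pm, pa, 0.00) for pm, pa in loci]  # no archaic input
    pop_b = [admixed_frequency(pm, pa, 0.05) for pm, pa in loci]  # 5% archaic input
    print(f"distance from differential admixture alone: "
          f"{mean_squared_difference(pop_a, pop_b):.6f}")
```

    Because the difference at each locus is (a2 − a1)(pA − pM), a model that assumes no archaic input could mistake this for drift or divergence between the two modern populations.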


    References

  • Gorilla genomics and hearing evolution

    Thu, 2012-03-08 00:37 -- John Hawks

    The Nature News story on the gorilla genome includes this section relevant to the evolution of hearing in gorillas and humans:

    Some of these rapid changes are puzzling: the gene LOXHD1 is involved in hearing in humans and was therefore thought to be involved in speech, but the gene shows just as much accelerated evolution in the gorilla. “But we know gorillas don’t talk to each other — if they do they’re managing to keep it secret,” says Scally.

    This weakens the connection between the gene and language, says [Wolfgang] Enard. “If you find this in the gorilla, this option is out of the window.”

    This is one of the genes that I have been working on with reference to its acceleration on the human lineage. It is a mistake to view the evolution of hearing as directed specifically toward language; instead, the human and gorilla lineages are both adapting to an aural environment different from that of ancestral hominoids. In both lineages, there was an increase in body size and a reduction in the mean frequency of vocalizations, enough to prompt adaptive changes. Humans have additionally acquired language as a communication system, which has its own auditory requirements. The connection with language is only indirect, in that human-specific changes to this and other genes provide evidence of adaptive change in the auditory system.

  • When genes break: validating loss-of-function variants

    Fri, 2012-02-17 12:20 -- John Hawks

    Daniel MacArthur and colleagues have an important paper in Science, titled "A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes" [1]. They took 1000 Genomes Project pilot data and systematically looked at every allelic variant in the sample that appeared to cause the loss of function of a protein-coding gene. Mutations that de-activate genes in this way are not rare, but they are often eliminated from the population rapidly by purifying natural selection, because the normal function of a protein is necessary to survival or reproduction. However, not every protein is so important, and MacArthur and colleagues confirmed that more than 1200 alleles in this sample genuinely occur in one or more of the 1000 Genomes Project individuals.

    Some of these are common but most occur in fewer than 2% of individuals in the sample, as expected if purifying selection were affecting many of them.
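    Why should most loss-of-function alleles be rare? A textbook approximation, not taken from the MacArthur paper, is mutation-selection balance: for a deleterious allele with heterozygote cost hs and mutation rate μ per generation, the equilibrium frequency is roughly q ≈ μ/(hs); for a fully recessive allele it is q ≈ √(μ/s). The rate and costs below are hypothetical, chosen only to show the scale.

```python
import math

def equilibrium_frequency(mu, s, h):
    """Approximate mutation-selection balance frequency of a deleterious allele.

    mu: mutation rate to the allele per generation
    s:  fitness cost in homozygotes
    h:  dominance coefficient (heterozygote cost is h * s)
    Uses q ~ mu / (h s) when h > 0 (valid when h s >> mu),
    and q ~ sqrt(mu / s) when the allele is fully recessive (h == 0).
    """
    if h > 0:
        return min(1.0, mu / (h * s))
    return min(1.0, math.sqrt(mu / s))

if __name__ == "__main__":
    mu = 1e-5  # hypothetical per-gene rate of loss-of-function mutation
    for s, h in [(1.0, 0.5), (0.1, 0.5), (1.0, 0.0), (0.01, 0.0)]:
        q = equilibrium_frequency(mu, s, h)
        print(f"s={s}, h={h}: q ~ {q:.5f}")
```

    Under these assumptions, only weakly selected or recessive losses climb into the percent range; strongly selected ones stay at a few copies per hundred thousand chromosomes, consistent with loss-of-function variants being concentrated among rare alleles.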

    MacArthur is one of the authors of the Genomes Unzipped group blog, and has written a great summary and introduction to his research paper: "All genomes are dysfunctional: broken genes in healthy individuals". It's free and well-written, so it will probably work better for many readers than the original paper.

    Science is running a commentary by Lluis Quintana-Murci [2] to accompany the research article. This paragraph covers many of the numerical facts about these loss-of-function variants and discusses the idea that some of them were positively selected -- that is, advantageous in recent human populations:

    MacArthur et al. estimated that, depending on ethnic background, each individual's genome carries 26 to 37 variants that introduce a stop codon (which signals the termination of translation of nucleic acids into protein), with up to 6 present in the homozygous state. When considering other types of LoF variants, including those that disrupt splice-sites, large deletions, or insertions or deletions of nucleotides that change the DNA reading frame, the total number per individual is extended to 103 to 121, with ∼20 present in homozygosity. A large proportion of LoF variants were enriched in low-frequency alleles, suggesting that the removal of deleterious alleles has prevented them from increasing to high frequencies. Furthermore, some have already been associated with severe human diseases, supporting the less-is-less hypothesis. Other LoF variants, which can reach higher population frequencies, fall into poorly evolutionarily conserved genes or belong to multigene families displaying high paralogous sequence identity. This suggests that the functions of the corresponding genes are highly redundant, explaining their greater tolerance for LoF variants and supporting a less-is-nothing scenario. Also, although no substantial enrichment in positive selection signals was observed among LoF variants at the genomewide level, 20 of them fell into regions displaying signatures of positive selection, as predicted by the less-is-more hypothesis, suggesting that they may have conferred a selective advantage in human evolution.

    Common loss-of-function variants that are evolutionarily recent are very interesting to us as we work to understand the changes that accompanied modern human origins and the later invention and spread of agriculture. I am really excited that these analyses were carried out using the 1000 Genomes samples because that means we can use the sequence data to estimate the ages of these functional losses. We can do quite a lot better than to say that they "fall into regions displaying signatures of positive selection": In fact, we can determine whether these variants themselves were selected, or hitchhiked to high frequency along with some other variant that was selected.
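    One simple way to turn haplotype data into an age, sketched here as an illustration rather than the method any particular study uses: if a variant arose on a single haplotype t generations ago, the chance that a given copy still carries the original neighboring allele at recombination fraction r is about (1 − r)^t ≈ e^(−rt). Inverting gives t ≈ −ln(P)/r, where P is the observed fraction of carrier chromosomes that retain the founder allele. The 80% retention, the 0.005 recombination fraction and the 25-year generation time are all assumptions.

```python
import math

def age_from_haplotype_decay(p_retained, r):
    """Crude variant age (generations) from decay of its founder haplotype.

    p_retained: fraction of carrier chromosomes still showing the founder's
                allele at a linked marker.
    r:          recombination fraction between the variant and that marker.
    Assumes a single origin, no recurrent mutation, and P = exp(-r * t).
    """
    if not 0.0 < p_retained <= 1.0:
        raise ValueError("p_retained must be in (0, 1]")
    if r <= 0.0:
        raise ValueError("r must be positive")
    return -math.log(p_retained) / r

if __name__ == "__main__":
    # Hypothetical: 80% of carriers keep the founder allele at a marker ~0.5 cM away
    t = age_from_haplotype_decay(0.80, 0.005)
    print(f"~{t:.0f} generations, ~{t * 25:.0f} years at 25 years/generation")
```

    Real analyses combine many markers on both sides of the variant and model the allele's early stochastic growth; this single-marker inversion only shows why dense sequence data make such dating possible.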

    Many of these loss-of-function variants are in genes that may not matter much to selection. Olfactory receptor genes, for example, comprise a very large family with recurrent duplications and pseudogenizations during primate evolution. We have scores of olfactory receptor pseudogenes, many of which are polymorphic in living human populations. Some may continue to make a noticeable difference to the phenotype, such as the asparagus-urine-smelling polymorphism. But many are probably invisible to us. Still, a few of these do look like they've been positively selected in recent human populations.

    Sometimes less really is more.


    References

    Synopsis: 
    A "punishing" resequencing project validates mutations in the 1000 Genomes Project individuals that deactivate protein-coding genes.
  • Mailbag: Exaptation and standing variation

    Tue, 2012-01-24 12:11 -- John Hawks
    This may sound like a dumb question, but I am trying to understand the difference between “selection on standing variation” and the concept of “exaptation”. They seem to mean the same thing? Am I missing something?

    Thanks for any help you can provide.

    No problem. Exaptation almost always refers to a phenotypic trait, and specifically to the case where a trait that used to do one thing has changed under natural selection for some other function.

    Selection on standing variation is usually just a contrast with selection on a new mutation. A new mutation that comes under positive selection will rapidly increase in frequency and thereby generate lots of signs we can recognize, for example genetic hitchhiking.

    Selection on an old mutation that has already existed in the population for a long time (and is therefore "standing" variation) also can cause the mutation to increase in frequency, but this will not necessarily cause hitchhiking or other easily recognizable patterns, because copies of the mutation that have existed in the population for a long time probably are not all linked to the same set of mutations at other loci.
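    The difference shows up in the haplotypes carrying the selected allele. A toy sketch, assuming we simply record which linked background each carrier chromosome sits on: a new mutation arises once, so nearly all carriers share one background, while an old allele has had time to recombine onto several. Haplotype homozygosity, the probability that two randomly drawn carrier chromosomes share a background, summarizes this. The background labels and counts are hypothetical.

```python
from collections import Counter

def haplotype_homozygosity(backgrounds):
    """Probability that two distinct, randomly drawn carrier chromosomes share a background."""
    n = len(backgrounds)
    if n < 2:
        raise ValueError("need at least two carrier chromosomes")
    counts = Counter(backgrounds)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

if __name__ == "__main__":
    # Hypothetical carriers after selection has raised the allele to 40 copies
    new_mutation = ["A"] * 38 + ["B"] * 2  # one founder background, a little recombination
    standing_variant = ["A"] * 12 + ["B"] * 10 + ["C"] * 9 + ["D"] * 9
    print(f"new mutation:     {haplotype_homozygosity(new_mutation):.2f}")      # high: hitchhiking
    print(f"standing variant: {haplotype_homozygosity(standing_variant):.2f}")  # diluted signal
```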

    Practical example: lactase persistence. We know that lactase persistence in Europeans reflects selection on a new mutation. If people carrying the key lactase persistence mutation did not all share a near-identical region of chromosome 2 around that mutation, we would suspect selection on standing variation (when we learned about lactase persistence more than 10 years ago, this was not yet resolved, and many geneticists thought it would turn out to be standing variation). Lactase persistence is *arguably* an exaptation, because it uses a mechanism that evolved for one purpose (babies digesting mothers' milk) and changed it under selection for another purpose (adults digesting cow milk).

  • Did Denisovans have genetic adaptations to high altitude?

    Tue, 2011-06-21 12:26 -- John Hawks

    We don't really know the extent of territory that might have been occupied by the population represented by the Denisova genome. The signs of mixture into the Melanesian/New Guinea population suggest that the Denisova individual shared many genes with people who lived somewhere along the South or Southeast Asian coast. Denisova itself, however, is in the Altai Mountains.

    Last week I wrote some thoughts about the possible introgression of HLA alleles from Denisovans into more recent populations. HLA genes pose many problems for testing this hypothesis -- including the difficulty of identifying the alleles in a low-coverage genome and the high chance of incomplete lineage sorting (ILS) of ancient alleles in recent populations. In principle, evidence of introgression may be much easier to find in other parts of the genome.

    If an allele that originated in Denisovans had some advantage in later populations, it might today be found very widely spread across Asian populations, even if the amount of Denisovan ancestry in most of these populations is very small. This was the theme of my paper with Gregory Cochran several years ago [1] ("The inevitability of introgression"). The probability that a single copy of an advantageous allele will survive and increase in the population is roughly 2s, where s is the fitness advantage in a heterozygote carrying the allele. A relatively small number of copies of an allele might have entered a recent human population by introgression from some ancient population, but these few copies would have a high likelihood of surviving and increasing in frequency, possibly toward fixation. HLA alleles could easily be in this category, but the challenges of identifying them and the high chance of ILS make the hypothesis hard to test.
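    The 2s figure is Haldane's branching-process approximation for one new copy in a large population. A minimal sketch, assuming the introgressed copies are lost or saved independently of each other (reasonable while the allele is rare), shows why even a modest number of copies makes survival likely. The 1% advantage and the copy numbers are hypothetical.

```python
def survival_probability(s, copies=1):
    """Chance that at least one of `copies` independent copies of an advantageous
    allele escapes early loss, using Haldane's approximation of 2s per copy.

    s: heterozygote fitness advantage (small, e.g. 0.01).
    """
    if not 0.0 < s < 0.5:
        raise ValueError("approximation assumes a small positive s")
    per_copy = 2.0 * s
    return 1.0 - (1.0 - per_copy) ** copies

if __name__ == "__main__":
    s = 0.01  # hypothetical 1% advantage in heterozygotes
    for n in (1, 10, 50, 100):
        print(f"{n:>3} copies: P(survive) ~ {survival_probability(s, n):.2f}")
```

    With a 1% advantage a single copy is lost 98% of the time, but 100 introgressed copies give roughly an 87% chance that the allele takes hold.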

    Another strategy is to identify genes that have been selected in recent populations and see if the linked haplotype shows up in the Denisova genome. Recently, several studies have attempted to identify genes related to high altitude adaptation in Tibetans. At least some Denisovans lived in the mountainous areas of central Asia, and so I'm curious whether they might have some alleles adapted to this environment. The Altai are not nearly as high as the Tibetan plateau (in fact Denisova itself is not much higher than western Kansas), and we don't know how long Denisovan people might have been resident in Central Asia, but if we're looking for selected alleles there are some strong candidates in this category of genes.

    So let's look at some of them. All positions here are mapped to the hg18 human genome assembly.

    Yi and colleagues [2] find a strong frequency difference between China and Tibet for a SNP in EPAS1, at chr2:46441523. The derived allele, G, has a frequency of 87% in their Tibetan sample but only 9% in their Chinese sample (and zero in Denmark). The Denisova genome is represented by two reads at this site, both C, the ancestral allele. We don't necessarily have to accept that this is a functional site, but as the marker most strongly differentiating the high altitude population it would likely be closely linked to any functional variant. So the Denisova allele suggests that this ancient individual lacked whatever functional variant might currently be common in Tibetans for this gene.
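    To put the EPAS1 difference on a familiar scale, one can compute a two-population FST from the two frequencies alone. The sketch uses Hudson's estimator in the form (p1 − p2)² / [p1(1 − p2) + p2(1 − p1)] and omits the sample-size correction, since sample sizes are not given here; treat the result as approximate.

```python
def hudson_fst(p1, p2):
    """Two-population Hudson FST from allele frequencies, without sample-size correction."""
    denom = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    if denom == 0.0:
        return 0.0  # both populations fixed for the same allele
    return (p1 - p2) ** 2 / denom

if __name__ == "__main__":
    # Derived-allele (G) frequencies at the EPAS1 SNP, as quoted in the text
    tibet, china = 0.87, 0.09
    print(f"frequency difference = {tibet - china:.2f}")
    print(f"Hudson FST ~ {hudson_fst(tibet, china):.2f}")
```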

    Simonson and colleagues [3] took a different approach, focusing on candidate genes that they argued a priori were likely to be involved in adaptation to hypoxia because of their physiological role. They evaluated these genes for evidence of positive selection in Tibetans, finding several candidate haplotypes for recent adaptive evolution to high altitude.

    For each of five genes, they identified a three-locus "core selection haplotype" that shows signs of selection within Tibet. The purpose of these three-SNP haplotypes was to examine the correlation of haplotypes and phenotypes in a sample of people where physiological data were taken. So they are intended as tags, not as comprehensive and unique identifiers of the candidates at the genetic level. But the three-locus haplotypes are the only ones reported in the supplement to the paper, so that's what I have to compare.

    EGLN1: The three-allele candidate selected haplotype consists of A at chr1:229793717, T at chr1:229667980 and T at chr1:229665156. Denisova apparently has the selected haplotype with A at chr1:229793717 (2/2 reads), T at chr1:229667980 (3/3 reads) and T at chr1:229665156 (1/1 reads). However, it is not obvious whether this is significant. All three alleles on the candidate selected haplotype are the ancestral (present in chimpanzees and gorillas) alleles, which are much more likely to show up in the archaic genomes than derived alleles. These ancestral alleles are also present in several of the whole genomes provided along with the Denisova sequence reads. So it's not clear to me how good a candidate for selection the haplotype really is.

    CYP17A1: Here the three-allele candidate selected haplotype includes G at chr10:104568521, G at chr10:104594906, and C at chr10:104517420. Denisova has C (5/5 reads, ancestral), T (4/4 reads, ancestral), and C (3/3 reads, ancestral). Again, Denisova has the all-ancestral haplotype here, but in this case it is not the selection candidate.

    PTEN: The selected candidate haplotype is G at chr10:89770364, C at chr10:89790851 and C at chr10:89778618. Denisova has G (5/5 reads, ancestral), T (2/2 reads, derived), and C (4/4 reads, ancestral). Not selected.

    I always find it interesting when the Denisova genome has a derived allele at an interesting site -- it is the shared derived alleles between these archaic genomes and living people that constitute evidence of genetic persistence of the archaic people. No single site carries that information (any one allele may be shared by incomplete lineage sorting), but I still like to note them. The Papuan and half the Native American, Sardinian and Mongolian reads share the derived T at chr10:89790851 with Denisova.

    HMOX2: The candidate selected haplotype has C at chr16:4456093, T at chr16:4465266, T at chr16:4442515. Denisova has this candidate selected haplotype: C (3/3 reads, ancestral), T (4/4 reads, ancestral), T (5/5 reads, ancestral). That haplotype may also be in the Cambodian whole genome accompanying the Denisova data, and can't be ruled out for the Mongolian. Again, the all-ancestral haplotype and wider distribution argue against the hypothesis that this haplotype was specifically selected in Tibet.

    PPARA: The core candidate selected haplotype has A at chr22:44827140, C at chr22:44832376 and T at chr22:44842095. Denisova has A (8/8 reads, ancestral), A (5/5 reads, ancestral), and C (2/2 reads, ancestral). Notice again that Denisova has the all-ancestral haplotype. This is the usual case for an ancient sequence: human-derived alleles are simply rarer in this genome.

    OK, where are we? Out of six genes that are candidates for selection on altitude adaptation in Tibetans, the Denisova genome has the candidate selected haplotype at two: EGLN1 and HMOX2. In both cases, the core selected haplotype consists entirely of ancestral alleles, so on the surface I think they are actually poor evidence of introgression. I would test them by looking at more SNPs linked to the presumed selected haplotype, hoping to find some derived SNPs shared by the Denisovan genome and the presumed selected haplotypes. Unfortunately, publications do not yet routinely report long haplotypes, so it will take some more digging to test these cases.
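    The comparison above is easy to tabulate. The sketch encodes the three-SNP core haplotypes of the five Simonson candidate genes and the Denisova read calls as listed in this post (hg18 positions; read depths omitted), then checks for each gene whether Denisova carries the candidate selected haplotype and whether that haplotype is all-ancestral. Inferring the ancestral status of a selected allele from the Denisova call assumes every site is biallelic.

```python
# Core haplotypes as listed above (hg18). Each site is a tuple:
# (position, candidate selected allele, Denisova allele, Denisova allele is ancestral)
CORE_HAPLOTYPES = {
    "EGLN1": [("chr1:229793717", "A", "A", True),
              ("chr1:229667980", "T", "T", True),
              ("chr1:229665156", "T", "T", True)],
    "CYP17A1": [("chr10:104568521", "G", "C", True),
                ("chr10:104594906", "G", "T", True),
                ("chr10:104517420", "C", "C", True)],
    "PTEN": [("chr10:89770364", "G", "G", True),
             ("chr10:89790851", "C", "T", False),
             ("chr10:89778618", "C", "C", True)],
    "HMOX2": [("chr16:4456093", "C", "C", True),
              ("chr16:4465266", "T", "T", True),
              ("chr16:4442515", "T", "T", True)],
    "PPARA": [("chr22:44827140", "A", "A", True),
              ("chr22:44832376", "C", "A", True),
              ("chr22:44842095", "T", "C", True)],
}

def denisova_matches(sites):
    """True if Denisova carries the candidate selected allele at every site."""
    return all(sel == den for _, sel, den, _ in sites)

def selected_is_all_ancestral(sites):
    """True if every selected allele is ancestral, assuming biallelic sites.

    A selected allele is ancestral when it equals an ancestral Denisova call,
    or differs from a derived Denisova call.
    """
    return all((sel == den) == den_anc for _, sel, den, den_anc in sites)

if __name__ == "__main__":
    for gene, sites in CORE_HAPLOTYPES.items():
        print(f"{gene}: Denisova carries candidate = {denisova_matches(sites)}, "
              f"candidate all-ancestral = {selected_is_all_ancestral(sites)}")
```

    Both matches are all-ancestral haplotypes, the pattern flagged above as weak evidence. Under the biallelic assumption the same check also marks the PTEN candidate as all-ancestral, with Denisova carrying the derived T instead.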


    References

    Synopsis: 
    Noodling through the Denisova genome data for signs of candidate altitude adaptations.

Neandertals

For years, I've worked on their bones. Now I'm working on their genes. Read more about the science studying these ancient people.

Denisova

From a finger bone of an ancient human came the record of a completely unexpected population. My lab is working on the science of the Denisova genome.

Acceleration

The advent of agriculture caused natural selection to speed up greatly in humans. We're uncovering some of the ways that populations have rapidly changed during the last 10,000 years.

Malapa

Just outside Johannesburg, the Malapa site is producing some of the most exciting finds in human evolution. This site is the headquarters of the Malapa Soft Tissue Project.