john hawks weblog

paleoanthropology, genetics and evolution

altitude

  • Anthropology 105, lecture 5: Hemoglobin

    Mon, 2012-02-13 13:12 -- John Hawks
    Synopsis: 
    Hemoglobin is the oxygen transport system in the blood, with a unique evolutionary history.

    In this lecture, I take a bit of a departure by discussing a body part that is microscopic: the hemoglobin molecule that carries oxygen inside our red blood cells.

    The lecture covers the genetics of the beta globin cluster, including the origin of beta and alpha globin subunits by gene duplication in ancient vertebrates, the convergence of hemoglobin in the jawed fishes with the oxygen transport system in hagfish and lampreys, the changes in the pattern of gene duplications in the beta globin cluster in anthropoid primates versus ancestral eutherians and some prosimian primates, and the importance of hemoglobin expression in human adaptations to altitude. As the lecture gets going, I give some more detail about the geological timescale and how it relates to the origin of anthropoids, following up on the short introduction in the last lecture.

    Study questions: 
    • Do you think other kinds of animals have systems for oxygen transport that involve molecules similar to hemoglobin?
    • The human adaptation to altitude differs in different populations that live in high places. Why do you think this is the case?
    • The beta globin cluster includes pseudogenes in different species of primates. Why do these nonfunctioning genes persist in the genome if they are just junk?
  • Did Denisovans have genetic adaptations to high altitude?

    Tue, 2011-06-21 12:26 -- John Hawks

    We don't really know the extent of territory that might have been occupied by the population represented by the Denisova genome. The signs of mixture into the Melanesian/New Guinea population suggest that the Denisova individual shared many genes with people who lived somewhere along the South or Southeast Asian coast. Denisova itself, however, is in the Altai Mountains.

    Last week I wrote some thoughts about the possible introgression of HLA alleles from Denisovans into more recent populations. HLA genes pose many problems for testing this hypothesis -- including the difficulty of identifying the alleles in a low-coverage genome and the high chance of incomplete lineage sorting of ancient alleles in recent populations. In other parts of the genome, evidence of introgression may in principle be much easier to find.

    If an allele that originated in Denisovans had some advantage in later populations, it might today be found very widely spread across Asian populations, even if the amount of Denisovan ancestry in most of these populations is very small. This was the theme of my paper with Gregory Cochran several years ago [1] ("The inevitability of introgression"). The probability that a single copy of an advantageous allele will survive and increase in the population is roughly 2s, where s is the fitness advantage in a heterozygote carrying the allele. A relatively small number of copies of an allele might have entered a recent human population by introgression from some ancient population, but these few copies would have a high likelihood of surviving and increasing in frequency, possibly toward fixation. HLA alleles could easily be in this category, but the challenges of identifying them and the high chance of incomplete lineage sorting make the hypothesis hard to test.
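    The 2s rule can be turned into a quick back-of-the-envelope calculation. The sketch below assumes that each introgressed copy survives or is lost independently, so the chance that at least one copy escapes loss is 1 - (1 - 2s)^n. The function name and the values of s and n are hypothetical illustrations, not estimates from the paper.

```python
def p_establish(s: float, copies: int = 1) -> float:
    """Chance that at least one of `copies` independent introgressed copies
    escapes loss, using the ~2s per-copy approximation (valid for small s)."""
    per_copy = min(2.0 * s, 1.0)  # cap at 1 so large s cannot give a probability > 1
    return 1.0 - (1.0 - per_copy) ** copies


# Hypothetical 1% heterozygote advantage, with 1, 10 or 100 introgressed copies.
for copies in (1, 10, 100):
    print(copies, round(p_establish(0.01, copies), 3))
```

    With a 1% advantage a single copy is usually lost, but under these assumptions a hundred copies make establishment the likely outcome, which is the sense in which introgression of advantageous alleles is "inevitable."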

    Another strategy is to identify genes that have been selected in recent populations and see if the linked haplotype shows up in the Denisova genome. Recently, several studies have attempted to identify genes related to high altitude adaptation in Tibetans. At least some Denisovans lived in the mountainous areas of central Asia, and so I'm curious whether they might have some alleles adapted to this environment. The Altai are not nearly as high as the Tibetan plateau (in fact Denisova itself is not much higher than western Kansas), and we don't know how long Denisovan people might have been resident in Central Asia, but if we're looking for selected alleles there are some strong candidates in this category of genes.

    So let's look at some of them. All positions here are mapped to the hg18 human genome assembly.

    Yi and colleagues [2] find a strong frequency difference between China and Tibet for a SNP in EPAS1, at chr2:46441523. The derived allele, G, has a frequency of 87% in their Tibetan sample but only 9% in their Chinese sample (and zero in Denmark). The Denisova genome is represented by two reads at this site, both C, the ancestral allele. We don't necessarily have to accept that this is a functional site, but as the marker most strongly differentiating the high altitude population it would likely be closely linked to any functional variant. So the Denisova allele suggests that this ancient individual lacked whatever functional variant might currently be common in Tibetans for this gene.

    Simonson and colleagues [3] took a different approach, focusing on candidate genes that they argued a priori were likely to be involved in adaptation to hypoxia because of their physiological role. They evaluated these genes for evidence of positive selection in Tibetans, finding several candidate haplotypes for recent adaptive evolution to high altitude.

    For each of five genes, they identified a three-locus "core selection haplotype" that shows signs of selection within Tibet. The purpose of these three-SNP haplotypes was to examine the correlation of haplotypes and phenotypes in a sample of people where physiological data were taken. So they are intended as tags, not as comprehensive and unique identifiers of the candidates at the genetic level. But the three-locus haplotypes are the only ones reported in the supplement to the paper, so that's what I have to compare.

    EGLN1: The three-allele candidate selected haplotype consists of A at chr1:229793717, T at chr1:229667980 and T at chr1:229665156. Denisova apparently has the selected haplotype with A at chr1:229793717 (2/2 reads), T at chr1:229667980 (3/3 reads) and T at chr1:229665156 (1/1 reads). However, it is not obvious whether this is significant. All three alleles on the candidate selected haplotype are the ancestral (present in chimpanzees and gorillas) alleles, which are much more likely to show up in the archaic genomes than derived alleles. These ancestral alleles are also present in several of the whole genomes provided along with the Denisova sequence reads. So it's not clear to me how good a candidate for selection the haplotype really is.

    CYP17A1: Here the three-allele candidate selected haplotype includes G at chr10:104568521, G at chr10:104594906, and C at chr10:104517420. Denisova has C (5/5 reads, ancestral), T (4/4 reads, ancestral), and C (3/3 reads, ancestral). Again, Denisova has the all-ancestral haplotype here, but in this case it is not the selection candidate.

    PTEN: The selected candidate haplotype is G at chr10:89770364, C at chr10:89790851 and C at chr10:89778618. Denisova has G (5/5 reads, ancestral), T (2/2 reads, derived), and C (4/4 reads, ancestral). Not selected.

    I always find it interesting when the Denisova genome has a derived allele at an interesting site -- it is the shared derived alleles between these archaic genomes and living people that constitute evidence of genetic persistence of the archaic people. No single site carries that information (any one allele may be shared by incomplete lineage sorting), but I still like to note them. The Papuan and half the Native American, Sardinian and Mongolian reads share the derived T at chr10:89790851 with Denisova.

    HMOX2: The candidate selected haplotype has C at chr16:4456093, T at chr16:4465266, T at chr16:4442515. Denisova has this candidate selected haplotype: C (3/3 reads, ancestral), T (4/4 reads, ancestral), T (5/5 reads, ancestral). That haplotype may also be in the Cambodian whole genome accompanying the Denisova data, and can't be ruled out for the Mongolian. Again, the all-ancestral haplotype and wider distribution argue against the hypothesis that this haplotype was specifically selected in Tibet.

    PPARA: The core candidate selected haplotype has A at chr22:44827140, C at chr22:44832376 and T at chr22:44842095. Denisova has A (8/8 reads, ancestral), A (5/5 reads, ancestral), and C (2/2 reads, ancestral). Notice again that Denisova has the all-ancestral haplotype. We are finding that this is the usual case for an ancient sequence: human-derived alleles are just rarer in this genome.

    OK, where are we? Out of six genes that are candidates for selection on altitude adaptation in Tibetans, the Denisova genome matches the candidate at two: EGLN1 and HMOX2. In both cases, the core selected haplotype consists entirely of ancestral alleles, and so I think they are actually poor evidence of introgression on the surface. I would test them by looking at more SNPs linked to the presumed selected haplotype, hoping to find some derived SNPs shared by the Denisovan genome and the presumed selected haplotypes. Unfortunately, publications do not yet routinely report long haplotypes, so it will take some more digging to test these cases.
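    The tally for the five three-SNP haplotypes can be reproduced mechanically. This is a minimal sketch: the alleles are transcribed from the gene-by-gene notes above, in the order the SNPs are listed there, and the dictionary and function names are my own. It treats the unanimous read calls at each site as the Denisova allele, which is an assumption given the low coverage.

```python
# Candidate "core selection haplotypes" reported by Simonson and colleagues,
# and the Denisova read calls, both transcribed from the notes above.
CANDIDATE = {
    "EGLN1":   ("A", "T", "T"),
    "CYP17A1": ("G", "G", "C"),
    "PTEN":    ("G", "C", "C"),
    "HMOX2":   ("C", "T", "T"),
    "PPARA":   ("A", "C", "T"),
}
DENISOVA = {
    "EGLN1":   ("A", "T", "T"),
    "CYP17A1": ("C", "T", "C"),
    "PTEN":    ("G", "T", "C"),
    "HMOX2":   ("C", "T", "T"),
    "PPARA":   ("A", "A", "C"),
}


def matching_genes(candidate, observed):
    """Return the genes where every tag SNP matches the candidate haplotype."""
    return sorted(g for g, hap in candidate.items() if observed.get(g) == hap)


print(matching_genes(CANDIDATE, DENISOVA))  # ['EGLN1', 'HMOX2']
```

    A match on three tag SNPs says nothing about whether the alleles are ancestral or derived, which is why the two matches here carry so little weight.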



    Synopsis: 
    Noodling through the Denisova genome data for signs of candidate altitude adaptations.
  • More on Tibet, demography and selection

    Tue, 2010-07-06 12:30 -- John Hawks

    My post about the Tibetan high altitude selection story last Friday summarized the research and included some criticism of the demographic model applied in the paper by Yi and colleagues. This weekend, I had some correspondence from study coauthor Rasmus Nielsen.

    Nielsen was kind enough to provide a lot of information about how they arrived at their demographic model. Also, his comments are of substantial interest as a perspective on science journalism. I have posted them in their entirety, and have added my own perspective below them.

    Nielsen:

    I read your blog on the EPAS1 gene. You write that my answers to Nicholas Wade in the NYT article are lame. I couldn't agree more. Reading the quotes Wade put together from a long phone interview and two replies to follow-up requests by email for further information - I could get quite convinced about my lameness myself. Let me give you our side of the story:

    (1) Regarding effective population size estimation: we fit several different demographic models to the data. The best fitting one according to the Akaike information criterion was chosen in the paper to use for the coalescence simulations. But notice that we made no strong claims about population sizes in the paper. They appear in the supplementary information to ensure that other people could reproduce our study. The main objective for fitting a demographic model was to allow us to perform coalescence simulations under a model that fit the data well. The model described in the paper fits the data very well and was the best fitting model we could find. As such - it was our best option for how to calculate p-values - and was certainly, in our opinion, better than providing no p-values, or using p-values based on some simpler model that did not fit the data. Had we used another model with different values of Ne, we would have obtained less accurate p-values.

    However, we did not interpret the effective population size estimates strongly - mostly because we do not believe they have very much to do with census population sizes. I would argue that this is true for both this study and other similar studies on other populations. Estimated effective population sizes are not only a function of changes in population size, natural selection, male/female ratios and variance in offspring number. They also rely on the structure of the populations. A population organized into many small sub-populations might have an Ne that is substantially larger than N, while a population without sub-structure might have a much smaller Ne than the census size if there have been fluctuations in the population size or higher variance in offspring number than that expected from a Poisson. Therefore, it is wrong to interpret estimates of Ne as estimates of actual number of individuals - or to believe that there is some simple general relationship between effective population size and true number of individuals. For this reason, we purposefully did not provide an interpretation of the estimates of Ne in terms of actual values of N and I feel that our work is not being represented accurately by arguing that we obtained estimates of the number of Han individuals or Tibetans living 3000 years ago. That does not mean that we cannot try to understand why we get such a small Ne for Hans 3000 years ago and such a large estimate for Tibetans. The most likely explanation for the Hans is that there have been other bottlenecks that we have not modeled - before or after. If we estimate Ne for Europeans today using a model that does not take all the bottlenecks into account, we get estimates of about 5-15,000 individuals. I don't think anybody would claim that there are only 5-15,000 Europeans alive today. Similarly, our estimate for Ne for the Hans 3000 years ago is in the hundreds presumably because there were some previous bottlenecks that we have not modeled.

    Ancestral bottlenecks can be extremely hard to date from frequency spectrum data - and you end up getting the same likelihood for a long time period with small population sizes and a short time period with extremely small population sizes. There have been several published papers making this point, the first one I believe to be Adams and Hudson. 2004. Genetics 168:1699-171. Changing our model to having a larger population size 3000 years ago but with an appropriately modeled preceding bottleneck would produce more or less the same p-values - because it would produce the same expected frequency spectrum (or at least something very similar).

    Regarding the large Tibetan population size, it may likely be affected by population structure within Tibet and/or by admixture with other individuals. Both of these factors would inflate the estimate of Ne. We did try some other models - but ended up choosing this particular model because it fit the data the best. It seemed, therefore, most appropriate for the coalescence simulations. Again, I want to emphasize that we did not attempt to estimate number of individuals living in particular places during particular times - we were interested in finding a model which fit the distribution of allele frequencies well so that we at least could make some attempt at estimating relevant p-values. We never claimed that there were just a few hundred Han individuals alive 3000 years ago - in the same way that we are not arguing that there are only 5-15,000 Europeans alive today.

    (2) Regarding the divergence time: none of the models we fitted could explain the data with a divergence time much larger than 3000 years. If you look at the figure in the paper, you can see that there is an extremely strong correlation between the allele frequencies in Hans and in Tibetans. This is very difficult to explain with a long divergence time of genetically separated populations. To maintain such a strong correlation for a large amount of time, the Tibetan population (and the Han population) would have to be enormously large - and this is incompatible with the observed levels of variation in the population. We could not find a model that fit the data and which included a large divergence time no matter what we did. But there are of course many factors going into these estimates - including a calibration of number of mutations with the chimp, a number of demographic assumptions, and assumptions regarding generation times. If we are making errors on these assumptions - the estimates could change in one way or another. For that reason I feel it is most conservative to avoid arguing that our analysis definitely rejects that the divergence time could be 6000 years. The main objective of the paper was after all to investigate the evolution of altitude adaptation. The demographic analysis was there mostly to allow us to do the coalescence simulations - but we also used them to make the argument that this selection has occurred quite recently - and not say 10k or 20k years ago. It is quite clear from the data that such long divergence times cannot be supported by the data.

    This being said, we of course want to know if this short genetic divergence time is compatible with other evidence. I would argue that it is. There have been several migrations into Tibet. It is entirely compatible with the archaeological record that individuals living in Tibet today genetically mostly are descendants of migrants arriving around 3000 years ago even though the first migrants appeared much earlier. In terms of the selection - and when it has been acting - we want to determine when selection acted to increase the frequency of EPAS1 mutations in the ancestry of the individuals living in Tibet today. If they are genetically descendants of individuals migrating into Tibet just a few thousand years ago - then this is the relevant data for describing when selection has been acting on the EPAS1 mutations. As an aside I should also say that this has nothing to do with when the mutation(s) arose. Selection has in this case most likely been acting on standing variation.

    You argue in your blog that more could be done with this data in terms of demography. We agree. The paper was about altitude adaptation not demography. We are still working on the data and are hoping to produce a follow-up paper on the demographic analyses. We weren't sure how much interest there would be in the results - but the interest from you and other people in this is certainly a motivation to keep working on it as hard as possible.

    I hope you will post this reply on your blog and comment on it. If you do so - I would ask that you post it in its entirety. I learned a lot from the interview with Wade. I certainly now understand why politicians keep giving the same 2-line reply over and over again to journalists asking them questions. If a journalist talks sufficiently long with an interviewee - it will be possible for them to find some sentences that they can put together in some way to make the interviewee look foolish - if that's what they want to do.

    Me:

    Thanks so much for writing with this! I will of course post your comments, and I appreciate very much the time you spent detailing the work, especially on a holiday weekend.

    What you've written here basically agrees with my take on the text of your paper: the demographic model is useful as a test because it is conservative; it is not an attempt at population history. I've reviewed effective size at some length [readers can find a review that I wrote, and I can forward reprints on request]. As you write, this study does not differ substantially from many others in the use of effective size estimates.

    As an anthropologist I am very concerned at the proliferation of population models that are nonsensical from a demographic standpoint. Yes, the p-value will be much the same for EPAS1, but the model is hugely conservative with respect to anything with less extreme differentiation. Other studies are essentially alike; lowball demographic numbers are useful in their conservatism but give an incorrect view about the relation of demography and selection.

    Besides, you have to consider the mechanism by which the best-fit model has come to be so extreme. As you note, the effective size estimated under the assumption of neutrality actually will reflect the non-neutral dynamics across the exome. The HapMap doesn't give rise to anything like the model of an extreme and recent bottleneck that the exome data yield, yet of course both these genome-wide sets must have undergone the same demography. The difference is that the exome is limited to the coding fraction of the genome, pointing to selection on some (probably large) fraction of coding loci. The small effective size within the last 3000 years is mathematically equivalent to a statement that the data include genealogies with many coalescences in those 3000 years. Again, this doesn't happen in a population of hundreds of thousands of individuals unless there was rapid selection.

    So it seems to me that the data must reflect the high incidence of recent selection within mainland China. This is exactly what we expect based on the real demography of massive population growth across the same interval and adaptation to post-agricultural ecologies. Although the headline of the paper is about high altitude adaptation in Tibet, the real story is the massive selection in China of other genes.

    If this is correct, then I think there is much promising work to do by using real demographic estimates. Deriving the demographic model from the data themselves is really just throwing away useful information that is abundantly documented archaeologically and historically.

  • Fast selection in high altitude, but how fast?

    Fri, 2010-07-02 15:56 -- John Hawks

    Did the altitude of the Tibetan plateau lead to the fastest instance of human adaptation yet known?

    That's the claim in the new paper by Xin Yi and colleagues [1]:

    Given our estimate that Han and Tibetans diverged 2750 years ago and experienced subsequent migration, it appears that our focal SNP at EPAS1 may have experienced a faster rate of frequency change than even the lactase persistence allele in northern Europe, which rose in frequency over the course of about 7500 years (26). EPAS1 may therefore represent the strongest instance of natural selection documented in a human population, and variation at this gene appears to have had important consequences for human survival and/or reproduction in the Tibetan region.

    I have a significant criticism of that conclusion, but first I want to say I think this is really cool work. They sequenced 50 whole exomes of people of Tibetan ancestry. An exome is the coding fraction of the genome, leaving out the non-coding stuff. This let them do a genome-wide association including every SNP they found. As it turns out, the key gene (EPAS1) has no coding SNPs that differentiate strongly in these samples. It's an intronic SNP that shows a really large frequency difference (87% in Tibetans, 9% in Han Chinese). That's a really big difference.
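    To get a feel for what "faster than lactase persistence" implies, here is a rough sketch of the selection coefficient needed to carry an allele from 9% to 87%. It rests on several assumptions that are mine, not the paper's: the Han frequency stands in for the starting frequency in Tibetans, selection is deterministic and genic (the odds p/(1-p) grow by a factor of 1+s each generation), there is no migration, and a generation is 25 years.

```python
import math


def implied_s(p0: float, p1: float, generations: float) -> float:
    """Per-generation selection coefficient under a simple genic model in
    which the odds p/(1-p) grow by a factor (1+s) each generation."""
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return math.exp(math.log(odds_ratio) / generations) - 1


GEN_YEARS = 25  # assumed generation time, not taken from the paper

# Compare the paper's divergence estimate with a 6000-year alternative.
for years in (2750, 6000):
    print(years, round(implied_s(0.09, 0.87, years / GEN_YEARS), 4))
```

    Under these assumptions the shorter divergence time requires a selection coefficient of a few percent per generation, and doubling the time roughly halves it, so the claim about the rate of change depends directly on the demographic model.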

    And it takes a big difference to test neutrality in this sample. Fifty exomes is a whole lot of sequencing, but it's really a small sample for finding selection. It takes a really big frequency change to exceed chance. Besides that, most new adaptive mutations will be missed because they haven't gotten off the ground yet. Finding one major allele that correlates strongly with population, and then doing the work to show its association with red blood cell production, that's all pretty neat stuff. This paper should be added to the paper last month by Cynthia Beall and colleagues [2], who also found an association with Tibetans and made a functional link with high altitude adaptation. This gene is part of the system that adapts people to hypoxia in the Tibet/Nepal area, although it certainly does not act alone and we don't yet know how the system works. It's a solid first step.

    OK, so what's my problem with the paper? Hypoxia is a strong selective agent, affecting performance, health, and -- maybe most important -- birth weight. As soon as people began living on the Tibetan Plateau, they were in a compromised environment. That makes this a really great example of recent selection associated with a novel environment. But the archaeological evidence suggests that people have been living in this environment for a lot longer than 3000 years. The population model in the paper is a mess.

    People have been living on the Tibetan Plateau for more than 15,000 years. They may have occupied the area intermittently before the Last Glacial Maximum, and certainly were in nearby medium-altitude areas of northwestern China before that time. The Paleolithic-era occupation of northeastern highland Tibet was reviewed by Madsen and colleagues [3] and Brantingham and colleagues [4]. Aldenderfer [5] reviewed what is known about Neolithic-era occupation of highland Tibet. Sites with ceramics, evidence of sedentary village occupation and domesticated animals occur between 4000 and 6500 calendar years B.P. That doesn't mean that today's Tibetan population derives entirely from these early Neolithic settlers or the even earlier Paleolithic occupants. But the archaeological record does show that the opportunity for genetic adaptation would have been present long before 3000 years ago.

    So there's a potential inconsistency. The inconsistency could be resolved by recognizing that selection is stochastic. Selection cannot start changing the frequency of an allele until after the mutation has occurred.

    The following passage comes from Nicholas Wade's account of the research, in the NY Times. Wade also picked up on the problem with the demography in the paper, and probed the authors about it:

    Geneticists have a more elastic view of dates than do archaeologists, and the estimate of a Han-Tibetan population split at 3,000 years ago could probably have been adjusted to 6,000 if the geneticists had taken any account of any other kind of evidence.

    Rasmus Nielsen, a Danish researcher at the University of California, Berkeley, did the statistical calculations for the Beijing study. “We feel fairly confident that something on the order of 3,000 years is correct,” he said. But in a later e-mail message, Dr. Nielsen said, “I cannot with confidence rule out that the divergence time is 6,000 instead of 3,000.”

    There is similar flexibility in the estimates of population sizes. The Beijing team calculates that at the time of divergence there were only 288 Han Chinese and 22,642 Tibetans. These estimates have bewildered archaeologists, given that rice cultivation in southern China started 10,000 years ago and that there was an extensive civilization by 3,000 years ago. Dr. Nielsen said that the figure of 288 people was meant simply to indicate a bottleneck in the Han population, meaning a time when it was very small, and that this bottleneck could just as easily have occurred 10,000 years ago.

    I think that's totally remarkable. "Geneticists have a more elastic view of dates than do archaeologists"! I think that phrase should be framed and hung in every classroom teaching anthropological genetics.

    Look at the expansion model. In what universe were there only 288 ancestors of Han Chinese people in the last 3000 years? We're talking about the late Bronze Age, here! This is just after the end of the Shang Dynasty, whose capital at Anyang had a walled area of 1000 hectares. That's 1000 soccer pitches full of city, within an empire that spanned the northern half of China.

    It is completely lame to claim that the model could represent a bottleneck as long ago as 10,000 years. You see, the size of the population determines the rate of differentiation under genetic drift. If the population was big, it shouldn't have changed very fast, so the present populations shouldn't be very different. Putting it into numbers, if there hasn't been a bottleneck for 10,000 years, then the divergence must be a lot older than 3000 years. Probably older than 10,000 years.
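    The dependence of drift on population size is easy to put into numbers. The sketch below uses the textbook result that a population of constant diploid effective size Ne loses a fraction F = 1 - (1 - 1/(2Ne))^t of its heterozygosity to drift over t generations. The 25-year generation time and the large comparison size of 100,000 are assumptions for illustration; 288 is the Han figure quoted above.

```python
def drift_f(generations: float, ne: float) -> float:
    """Expected fraction of heterozygosity lost to drift after `generations`
    in an isolated population of constant diploid effective size `ne`."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** generations


GEN_YEARS = 25  # assumed generation time

# Tiny population over 3000 years versus a large one over 10,000 years.
print(round(drift_f(3000 / GEN_YEARS, 288), 3))
print(round(drift_f(10_000 / GEN_YEARS, 100_000), 4))
```

    A population of a few hundred drifts substantially in 3000 years, while a large one barely moves in 10,000. That is why a model without a recent bottleneck needs a much older divergence to produce the same differences between populations.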

    These hypotheses can be tested directly with genetics, and the data are certainly rich enough now to do it. If they point to a genetic bottleneck in China during the last 10,000 years, we should be very, very surprised. Because then who was farming all the millet and rice, and domesticating pigs?

    Does it matter? For EPAS1, the timing really doesn't affect the interpretation of selection -- there's no way that drift made the populations as different as they are for this one locus. But it seems clear that this is not a new mutation because it has no long, linked haplotype around it that also differs in frequency in the two populations. Selection on a standing variant is indeed newsworthy, as these are hard to find. Since we don't have a long haplotype to date, the only way that we can estimate the timing of selection is with the population model. Use the wrong model, and you get the wrong time. That is probably what has happened here.

    Also, using this weird population model vastly increases the chance that genetic drift could cause large frequency changes in Tibet or China. This makes us much less likely to recognize genes that really have been subject to selection in either population. With respect to EPAS1 the test is conservative, but the genome-wide comparison will miss a lot of genes and give less significant p-values to others. It's a waste, because it means that we have to collect that much more data to get the same result.

    UPDATE (2010-07-06): Rasmus Nielsen has written me to clarify his remarks to the Times and give more information about the demographic model in the paper. I have posted his full remarks along with some comments of my own. It is well worth reading.



  • Bending spy planes

    Fri, 2010-03-26 08:30 -- John Hawks

    An unexpected source of decompression sickness: the U-2 spy plane.

    As the number of flights increases, some of the plane’s 60 pilots have suffered from the same disorienting illness, known as the bends, that afflicts deep-sea divers who ascend too quickly.

    Relaxing recently in their clubhouse at Beale Air Force Base near Sacramento, Calif., the U-2’s home base, several pilots said the most common problems are sharp joint pain or a temporary fogginess.

    I lecture every year about high-altitude adaptation in humans, but this is a nice example of the opposite!

