Fast selection at high altitude, but how fast?

Did the altitude of the Tibetan plateau lead to the fastest instance of human adaptation yet known?

That's the claim in the new paper by Xin Yi and colleagues Yi:2010:

Given our estimate that Han and Tibetans diverged 2750 years ago and experienced subsequent migration, it appears that our focal SNP at EPAS1 may have experienced a faster rate of frequency change than even the lactase persistence allele in northern Europe, which rose in frequency over the course of about 7500 years (26). EPAS1 may therefore represent the strongest instance of natural selection documented in a human population, and variation at this gene appears to have had important consequences for human survival and/or reproduction in the Tibetan region.

I have a significant criticism of that conclusion, but first I want to say that I think this is really cool work. They sequenced 50 whole exomes of people of Tibetan ancestry. An exome is the coding fraction of the genome, leaving out the non-coding stuff. This let them do a genome-wide association including every SNP they found. As it turns out, the key gene (EPAS1) has no coding SNPs that differentiate strongly in these samples. Instead, it's an intronic SNP that shows a really large frequency difference: 87% in Tibetans versus 9% in Han Chinese.

And it takes a big difference to reject neutrality in this sample. Fifty exomes is a whole lot of sequencing, but it's really a small sample for finding selection. It takes a really big frequency change to exceed chance. Besides that, most new adaptive mutations will be missed because they haven't gotten off the ground yet. Finding one major allele that correlates strongly with population, and then doing the work to show its association with red blood cell production, is all pretty neat stuff. This paper belongs alongside the one published last month by Cynthia Beall and colleagues Beall:2010, who also found an association of this gene with Tibetans and made a functional link with high altitude adaptation. This gene is part of the system that adapts people to hypoxia in the Tibet/Nepal area, although it certainly does not act alone and we don't yet know how the system works. It's a solid first step.

OK, so what's my problem with the paper? Hypoxia is a strong selective agent, affecting performance, health, and -- maybe most important -- birth weight. As soon as people began living on the Tibetan Plateau, they were in a compromised environment. That makes this a really great example of recent selection associated with a novel environment. But the archaeological evidence suggests that people have been living in this environment for a lot longer than 3000 years. The population model in the paper is a mess.

People have been living on the Tibetan Plateau for more than 15,000 years. They may have occupied the area intermittently before the Last Glacial Maximum, and certainly were in nearby medium-altitude areas of northwestern China before that time. The Paleolithic-era occupation of northeastern highland Tibet was reviewed by Madsen and colleagues Madsen:2006 and Brantingham and colleagues Brantingham:2003. Aldenderfer Aldenderfer:2007 reviewed what is known about Neolithic-era occupation of highland Tibet. Sites with ceramics, evidence of sedentary village occupation and domesticated animals occur between 4000 and 6500 calendar years B.P. That doesn't mean that today's Tibetan population derives entirely from these early Neolithic settlers or the even earlier Paleolithic occupants. But the archaeological record does show that the opportunity for genetic adaptation would have been present long before 3000 years ago.

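So there's a potential inconsistency between the genetic date and the archaeology. It could be resolved by recognizing that selection is stochastic. Selection cannot start changing the frequency of an allele until after the mutation has occurred, so adaptation at a given gene can begin long after people first move into an environment.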

The following passage comes from Nicholas Wade's account of the research, in the NY Times. Wade also picked up on the problem with the demography in the paper, and probed the authors about it:

Geneticists have a more elastic view of dates than do archaeologists, and the estimate of a Han-Tibetan population split at 3,000 years ago could probably have been adjusted to 6,000 if the geneticists had taken any account of any other kind of evidence.
Rasmus Nielsen, a Danish researcher at the University of California, Berkeley, did the statistical calculations for the Beijing study. "We feel fairly confident that something on the order of 3,000 years is correct," he said. But in a later e-mail message, Dr. Nielsen said, "I cannot with confidence rule out that the divergence time is 6,000 instead of 3,000."
There is similar flexibility in the estimates of population sizes. The Beijing team calculates that at the time of divergence there were only 288 Han Chinese and 22,642 Tibetans. These estimates have bewildered archaeologists, given that rice cultivation in southern China started 10,000 years ago and that there was an extensive civilization by 3,000 years ago. Dr. Nielsen said that the figure of 288 people was meant simply to indicate a bottleneck in the Han population, meaning a time when it was very small, and that this bottleneck could just as easily have occurred 10,000 years ago.

I think that's totally remarkable. "Geneticists have a more elastic view of dates than do archaeologists"! I think that phrase should be framed and hung in every classroom teaching anthropological genetics.

Look at the expansion model. In what universe were there only 288 ancestors of Han Chinese people in the last 3000 years? We're talking about the late Bronze Age, here! This is just after the end of the Shang Dynasty, whose capital at Anyang had a walled area of 1000 hectares. That's 1000 soccer pitches full of city, within an empire that spanned the northern half of China.

It is completely lame to claim that the model could represent a bottleneck as long ago as 10,000 years. You see, the size of the population determines the rate of differentiation under genetic drift. If the population was big, it shouldn't have changed very fast, so the present populations shouldn't be very different. Putting it into numbers, if there hasn't been a bottleneck for 10,000 years, then the divergence must be a lot older than 3000 years. Probably older than 10,000 years.
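To put rough numbers on that argument, here is a minimal sketch using the textbook pure-drift expectation, F = 1 - (1 - 1/(2N))^t, for a population of constant effective size N after t generations. The 25-year generation time and the constant-size, no-migration setup are my assumptions for illustration, not the model fitted in the paper, and the function names are made up for this example.

```python
import math

GENERATION_YEARS = 25  # assumed generation time, for illustration only


def drift_f(n_e, generations):
    """Expected differentiation from the ancestral population under pure drift."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** generations


def generations_needed(n_e, target_f):
    """Generations of pure drift needed to reach a given differentiation."""
    return math.log(1.0 - target_f) / math.log(1.0 - 1.0 / (2.0 * n_e))


if __name__ == "__main__":
    gens = 3000 / GENERATION_YEARS
    # Differentiation a population of 288 would accumulate in 3000 years.
    small = drift_f(288, gens)
    print(f"N=288, 3000 years: F = {small:.3f}")
    # How long larger (hypothetical) populations need to drift just as far.
    for n_e in (288, 5000, 50000):
        years = generations_needed(n_e, small) * GENERATION_YEARS
        print(f"N={n_e}: {years:,.0f} years to reach the same F")
```

The time required grows roughly in proportion to N. That is the whole point: a big population cannot drift far in a short time, so if the bottleneck goes away, the divergence date has to move back.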

These hypotheses can be tested directly with genetics, and the data are certainly rich enough now to do it. If they point to a genetic bottleneck in China during the last 10,000 years, we should be very, very surprised. Because then who was farming all the millet and rice, and domesticating pigs?

Does it matter? For EPAS1, the timing really doesn't affect the interpretation of selection -- there's no way that drift made the populations as different as they are for this one locus. But it seems clear that this is not a new mutation because it has no long, linked haplotype around it that also differs in frequency in the two populations. Selection on a standing variant is indeed newsworthy, as these are hard to find. Since we don't have a long haplotype to date, the only way that we can estimate the timing of selection is with the population model. Use the wrong model, and you get the wrong time. That is probably what has happened here.
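To see how strongly the answer depends on the date, here is a back-of-the-envelope sketch. It assumes a simple deterministic model in which the log-odds of the allele's frequency rise by about s each generation, takes the Han frequency (9%) as a stand-in for the starting frequency in Tibetans, ignores migration, and uses a 25-year generation. None of that is the paper's method; it only shows how the implied strength of selection shrinks as the time window grows.

```python
import math

GENERATION_YEARS = 25  # assumed generation time, for illustration only


def logit(p):
    """Log-odds of an allele frequency."""
    return math.log(p / (1.0 - p))


def implied_s(p_start, p_end, generations):
    """Approximate selection coefficient under a deterministic logistic sweep.

    Assumes the log-odds of the allele frequency increase by about s per
    generation, with no drift and no migration.
    """
    return (logit(p_end) - logit(p_start)) / generations


if __name__ == "__main__":
    # 2750 years is the paper's split estimate; 6000 is the alternative
    # Nielsen mentioned; 15000 reflects the archaeological occupation.
    for years in (2750, 6000, 15000):
        gens = years / GENERATION_YEARS
        s = implied_s(0.09, 0.87, gens)
        print(f"{years} years ({gens:.0f} generations): s ~ {s:.4f}")
```

Doubling the time roughly halves the implied selection coefficient, so the claim of the strongest selection yet documented leans heavily on the date.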

Also, using this weird population model vastly increases the chance that genetic drift could cause large frequency changes in Tibet or China. This makes us much less likely to recognize genes that really have been subject to selection in either population. With respect to EPAS1 the test is conservative, but the genome-wide comparison will miss a lot of genes and give less significant p-values to others. It's a waste, because it means that we have to collect that much more data to get the same result.
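Here is a rough sketch of why the small-population model is so forgiving of big neutral differences. Under Wright-Fisher drift the variance of an allele's frequency after t generations is p0(1 - p0)F, with F as in the pure-drift formula, so two populations drifting independently from the same starting frequency differ with a variance equal to the sum of the two. The sizes 288 and 22,642 are the figures quoted above from Wade's article; the 10,000 comparison, the 25-year generation, and the constant-size, no-migration setup are my assumptions for illustration.

```python
import math


def drift_f(n_e, generations):
    """Expected differentiation from the ancestral population under pure drift."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** generations


def neutral_sd_of_difference(p0, n_a, n_b, generations):
    """SD of the frequency difference between two populations that drift
    independently from a shared starting frequency p0."""
    variance = p0 * (1.0 - p0) * (
        drift_f(n_a, generations) + drift_f(n_b, generations)
    )
    return math.sqrt(variance)


if __name__ == "__main__":
    gens = 3000 / 25  # assumed 25-year generations
    quoted = neutral_sd_of_difference(0.5, 288, 22642, gens)
    larger = neutral_sd_of_difference(0.5, 10000, 10000, gens)
    print(f"quoted sizes:      SD of neutral difference = {quoted:.3f}")
    print(f"both N = 10000:    SD of neutral difference = {larger:.3f}")
```

The wider the neutral spread, the larger a frequency difference has to be before it stands out as selection, which is why the bottleneck model costs power across the rest of the genome.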

UPDATE (2010-07-06): Rasmus Nielsen has written me to clarify his remarks to the Times and give more information about the demographic model in the paper. I have posted his full remarks along with some comments of my own. It is well worth reading.