My post about the Tibetan high altitude selection story last Friday summarized the research and included some criticism of the demographic model applied in the paper by Yi and colleagues. This weekend, I had some correspondence from study coauthor Rasmus Nielsen.
Nielsen was kind enough to provide a lot of information about how they arrived at their demographic model. Also, his comments are of substantial interest as a perspective on science journalism. I have posted them in their entirety, and have added my own perspective below them. Click through to read on: Nielsen:
I read your blog on the EPAS1 gene. You write that my answers to Nicholas Wade in the NYT article are lame. I couldn't agree more. Reading the quotes Wade put together from a long phone interview and two replies to follow-up requests by email for further information - I could get quite convinced about my lameness myself. Let me give you our side of the story: (1) Regarding effective population size estimation: we fit several different demographic models to the data. The best fitting one according to the Akaike information criterion was chosen in the paper to use for the coalescence simulations. But notice that we made no strong claims about population sizes in the paper. They appear in the supplementary information to ensure that other people could reproduce our study. The main objective for fitting a demographic model was to allow us to perform coalescence simulations under a model that fit the data well. The model described in the paper fits the data very well and was the best fitting model we could find. As such - it was our best option for how to calculate p-values - and was certainly, in our opinion, better than providing no p-values, or use p-values based on some simpler model that did not fit the data. Had we used another model with different values of Ne, we would have obtained less accurate p-values. However, we did not interpret the effective population size estimates strongly - mostly because we do not believe they have very much to do with census population sizes. I would argue that this is true for both this study and other similar studies on other populations. Estimated effective population sizes are not only a function of changes in population size, natural selection, male/female ratios and variance in offspring number. They also rely on the structure of the populations. A population organized into many small sub-populaiton might have an Ne that is substantially larger than N, while a population without sub-structure might have a much smaller Ne than the census size if there has been fluctuations in the population size or higher variance in offspring number than that expected from a Poisson. Therefore, it is wrong to interpret estimates of Ne as estimates of actual number of individuals - or to believe that there is some simple general relationship between effective population size and true number of individuals. For this reason, we did purposefully not provide an interpretation of the estimates of Ne in terms of actual values of N and I feel that our work is not being represented accurately by arguing that we obtained estimates of the number of Han individuals or Tibetans living 3000 years ago. That does not mean that we cannot try to understand why we get such a small Ne for Hans 3000 years ago and such a large estimate for Tibetans. The most likely explanation for the Hans is that there have been other bottlenecks that we have not modeled - before or after. If we estimate Ne for Europeans today using a model that does not take all the bottlenecks into account, we get estimates of about 5-15,000 individuals. I don't think anybody would claim that there are only 5-15,000 Europeans alive today. Similarly, our estimate for Ne for the Hans 3000 years ago is in the hundreds presumably because there were some previous bottlenecks that we have not modeled. Ancestral bottlenecks can be extremely hard to date from frequency spectrum data - and you end up getting the same likelihood for a long time period with small population sizes and a short time period with extremely small population sizes. The have been several published papers making this point, the first one I believe to be Adams and Hudson. 2004. Genetics 168:1699-171. Changing our model to having a larger population size 3000 years ago but with an appropriately modeled preceding bottleneck would produce more or less the same p-values - because it would produce the same expected frequency spectrum (or at least something very similar). Regarding the large Tibetan population size, it may likely be affected by population structure within Tibet and/or by admixture with other individuals. Both of these factors would inflate the estimate of Ne. We did try some other models - but ended choosing this particular model because if fit the data the best. It seemed, therefore, most appropriate for the coalescence simulations. Again, I want to emphasize that we did not attempt to estimate number of individuals living in particular places during particular times - we were interested in finding a model which fit the distribution of allele frequencies well so that we at least could make some attempt at estimating relevant p-values. We never claimed that there were just a few hundred Han individuals alive 3000 years ago - in the same way that we are not arguing that there are only 5-15,000 Europeans alive today. (2) Regarding the divergence time: none of the models we fitted could explain the data with a divergence time much larger than 3000 years. If you look at the figure in the paper, you can see that there is an extremely strong correlation between the allele frequencies in Hans and in Tibetans. This is very difficult to explain with a long divergence time of genetically separated populations. To maintain such a strong correlation for a large amount of time, the Tibetan population (and the Han population) would have to be enormously large - and this is incompatible with the observed levels of variation in the population. We could not find a model that fit the data and which included a large divergence time no matter what we did. But there are of course many factors going into these estimates - including a calibration of number of mutations with the chimp, a number of demographic assumptions, and assumptions regarding generation times. If we are making errors on these assumptions - the estimates could change in one way or another. For that reason I feel it is most conservative to avoid arguing that our analysis definitely rejects that the divergence time could be 6000 years. The main objective of the paper was after all to investigate the evolution of altitude adaptation. The demographic analysis was there mostly to allow us to do the coalescence simulations - but we also used them to make the argument that this selection has occurred quite recently - and not say 10k or 20k years ago. It is quite clear from the data that such long divergence times cannot be supported by the data This being said, we of course want to know if this short genetic divergence time is compatible with other evidence. I would argue that it is. There has been several migrations into Tibet. It is entirely compatible with the archaeological record that individuals living in Tibet today genetically mostly are descendants of migrants arriving around 3000 years ago even though the first migrants appeared much earlier. In terms of the selection - and when it has been acting - we want to determine when selection acted to increase the frequency on EPAS1 mutations in the ancestry of the individuals living in Tibet today. If they are genetically descendants of individuals migrating into Tibet just a few thousand years ago - then this is the relevant data for describing when selection has been acting on the EPAS1 mutations. As an aside I should also say that this has nothing to do with when the mutation(s) arose. Selection has in this case most likely been acting on standing variation. You argue in your blog that more could be done with this data in terms of demography. We agree. The paper was about altitude adaptation not demography. We are still working on the data and are hoping to produce a follow-up paper on the demographic analyses. We weren't sure how much interest there would be in the results - but the interest from you and other people in this is certainly a motivation to keep working on it as hard as possible. I hope you will post this reply on your blog and comment on it. If you do so - I would ask that you post it in its entirety. I learned a lot from the interview with Wade. I certainly now understand why politicians keep giving the same 2-line reply over and over again to journalists asking them questions. If a journalist talks sufficiently long with an interviewee - it will be possible for them to find some sentences that they can put together in some way to make the interviewee look foolish - if that's what they want to do.
Thanks so much for writing with this! I will of course post your comments, and I appreciate very much the time you spent detailing the work, especially on a holiday weekend.
What you’ve written here basically agrees with my take on the text of your paper; the demographic model is useful as a test because it is conservative, it is not an attempt at population history. I’ve reviewed effective size at some length [readers can find a review that I wrote, and I can forward reprints on request]. As you write, this study does not differ substantially from many others in the use of effective size estimates.
As an anthropologist I am very concerned at the proliferation of population models that are nonsensical from a demographic standpoint. Yes, the p-value will be much the same for EPAS1, but the model is hugely conservative with respect to anything with less extreme differentiation. Other studies are essentially alike; lowball demographic numbers are useful in their conservatism but give an incorrect view about the relation of demography and selection.
Besides, you have to consider the mechanism by which the best-fit model has come to be so extreme. As you note, the effective size estimated under the assumption of neutrality actually will reflect the non-neutral dynamics across the exome. The HapMap doesn’t give rise to anything like the model of an extreme and recent bottleneck that the exome data yield, yet of course both these genome-wide sets must have undergone the same demography. The difference is that the exome is limited to the coding fraction of the genome, pointing to selection on some (probably large) fraction of coding loci. The small effective size within the last 3000 years is mathematically equivalent to a statement that the data include genealogies with many coalescences in those 3000 years. Again, this doesn’t happen in a population of hundreds of thousands of individuals unless there was rapid selection.
So it seems to me that the data must reflect the high incidence of recent selection within mainland China. This is exactly what we expect based on the real demography of massive population growth across the same interval and adaptation to post-agricultural ecologies. Although the headline of the paper is about high altitude adaptation in Tibet, the real story is the massive selection in China of other genes.
If this is correct, then I think there is much promising work to do by using real demographic estimates. Deriving the demographic model from the data themselves is really just throwing away useful information that is abundantly documented archaeologically and historically.