Two recent bottleneck studies


Marth, G. T., et al. 2003. "Sequence variations in the public human genome data reflect a bottlenecked population history." Proceedings of the National Academy of Sciences, USA 100:376-381.

Marth, G. T., E. Czabarka, J. Murvai, and S. T. Sherry. 2004. "The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations." Genetics 166(1):351-372.


These two studies by Gabor Marth and colleagues attempt to test hypotheses of demographic history using genome-wide surveys of human single nucleotide polymorphism (SNP) data. The strength of these data is that they are widely available: they are dispersed throughout the genome, and they have been collected from standard panels of populations for the purpose of localizing polymorphisms linked to disease expression. This means that they are probably the single largest source of evidence about human genetic variation at this point. However, they are also rather limited in scope. Because they were surveyed for non-demographic purposes, the data present distinctive problems: ascertainment bias, potential non-independence among sites, and a lack of knowledge about their evolutionary dynamics. Marth and colleagues have taken pains to develop modes of analysis that correct for these problems to the extent possible. In other words, they apply corrections to the data to make them consistent with the assumptions of the Fisher-Wright population model.

The first of these studies examined the density of SNP candidates in different genomic regions as an indication of the frequency spectrum of mutations in the human genome. The second study went further by examining SNPs in three separate population samples: European, Asian, and African-American. Thus the first study is informative only in a global context, as a kind of average of what may have happened to all populations, while the second study distinguishes the populations from one another in terms of their frequency spectra of mutations.

According to Marth et al. (2003), the SNP data are consistent with a global population bottleneck dating to around the time of the Upper Paleolithic. That summary overstates their conclusions somewhat, because the support for the bottleneck came entirely from assuming a particular degree of recombination among the sites they examined. This assumption might be accurate, but it is notable that without it there was no support for a bottleneck. Interestingly, the bottleneck model that fit the data best included a net decline in the human population rather than an expansion. In other words, this was not a bottleneck followed by exponential growth of the human population (as we know has been happening recently). It was a bottleneck followed by recovery to a smaller size than before the bottleneck. The best-fitting bottleneck had an onset 1,600 generations (approximately 40,000 years) ago and a release 1,200 generations (30,000 years) ago. The study provided no confidence limits for these estimates.
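The year figures in both studies are generation counts multiplied by a fixed generation time. As a minimal Python sketch, assuming the 25 years per generation implied by the quoted pairs (the function and constant names are hypothetical), the conversion for the 2003 estimates looks like this:

```python
# Assumption: 25 years per generation, inferred from the pairs quoted in the
# text (1,600 generations ~ 40,000 years); change it to see how dates rescale.
GENERATION_TIME_YEARS = 25


def generations_to_years(generations: int,
                         years_per_generation: int = GENERATION_TIME_YEARS) -> int:
    """Convert generations before present into years before present."""
    return generations * years_per_generation


# Best-fitting bottleneck onset and release reported by Marth et al. (2003)
onset = generations_to_years(1_600)
release = generations_to_years(1_200)
print(f"onset ~{onset:,} years ago, release ~{release:,} years ago")
print(f"bottleneck duration ~{onset - release:,} years")
```

Because every date scales linearly with the assumed generation time, passing years_per_generation=20 moves the onset to about 32,000 years ago without changing anything about the fit itself, which is estimated in generations.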

The second study (Marth et al. 2004) provided best-fit models of population history for the three population samples. For African-Americans, the best-fit model was a simple expansion in population size, from around 10,000 to around 18,000, some 7,500 generations (187,500 years) ago. For Asians, the best-fit model was a bottleneck from 10,000 to 3,000 individuals 3,800 generations (95,000 years) ago, followed by an expansion to 25,000 individuals some 80,000 years ago. For Europeans, a bottleneck from 10,000 to 2,000 individuals began 3,500 generations (87,500 years) ago and ended 3,000 generations (75,000 years) ago with an expansion to 20,000 people. A confidence region is presented for the European sample, illustrating the trade-off between the severity of the bottleneck (i.e., how many people there were during it) and its duration. Simply put, a longer bottleneck can involve a larger population and still fit the data, because drift has more time to act, while a more severe bottleneck can be shorter and have the same effect.
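The trade-off between severity and duration can be illustrated with the textbook Wright-Fisher expectation for heterozygosity lost to drift, H_t = H_0(1 - 1/2N)^t. This is a deliberate simplification and not the method of Marth et al. (2004), who fit the full allele frequency spectrum; it ignores new mutation and the epochs before and after the bottleneck. The sketch takes the European best fit quoted above (2,000 people for 500 generations) and asks how long bottlenecks of other sizes would have to last to remove the same fraction of heterozygosity. The function names are hypothetical.

```python
import math


def retained_heterozygosity(n: float, generations: float) -> float:
    """Expected fraction of heterozygosity left after `generations` of drift at a
    constant diploid size `n`, ignoring new mutation: (1 - 1/(2n)) ** t."""
    return (1.0 - 1.0 / (2.0 * n)) ** generations


def equivalent_duration(target_fraction: float, n: float) -> float:
    """Generations at size `n` needed to retain `target_fraction` of heterozygosity."""
    return math.log(target_fraction) / math.log(1.0 - 1.0 / (2.0 * n))


# European best fit quoted in the text: about 2,000 people from 3,500 to
# 3,000 generations ago, i.e. a 500-generation bottleneck.
reference = retained_heterozygosity(2_000, 500)
print(f"2,000 people for 500 generations retain {reference:.3f} of heterozygosity")

# How long would bottlenecks of other sizes need to last to do the same damage?
for size in (1_000, 2_000, 4_000, 8_000):
    duration = equivalent_duration(reference, size)
    print(f"size {size:>5,}: about {duration:,.0f} generations")
```

Doubling the bottleneck size roughly doubles the duration needed for the same loss, which mirrors the trade-off described above. The published confidence region depends on the whole frequency spectrum and on the surrounding epochs, so this only conveys the intuition.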

There is little point in dissecting these values, since they are simply best-fit numbers under the assumptions of the studies. It is worth pointing out, however, the large differences between the two studies. Notice in particular the discrepancy in the timing of the putative bottlenecks, especially considering that most of the sample used in the first study, taken from public genome datasets, probably represents Americans of European descent. Why that sample does not show a population history more similar to that of the Europeans in the second study is unexplained (although the authors do note the discrepancy on page 362 of the second study). It is also interesting that the bottleneck in the second study is so ancient. If it actually reflects population dynamics within Europe 80,000 years ago, then presumably it reflects the population history of Neandertals! Or maybe the Levantine ancestors of later Europeans were facing population pressure when Neandertals moved south during the Würm glaciation to occupy Kebara and Amud caves? Whatever the case, the numbers are certainly strange.

What interests me about these papers is why these numbers are so strange, and it is to that question that I now turn.


Both studies assume a model for population histories in which a population may either grow or shrink at discrete times in the past. These times divide the entire population history into "epochs." For example, if the population never changed in size, the history would be a "one-epoch" population history, because all of time would be described by a single population size. If the population changed in size (grew or shrank) at a single time in the past, then it has a two-epoch history, reflecting the population size before and after the event. A two-epoch history is a three-parameter model, because three separate values must be known to predict the genetic characteristics of the population: the size before the event, the size afterward, and the time the event happened. Depending on their values, these three independent parameters can alter the expected diversity and frequency spectrum of mutations in populations with such histories.
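A piecewise-constant history of this kind is easy to write down explicitly. The Python sketch below is a hypothetical representation (the class and method names are invented for illustration): one size per epoch plus one change time between each pair of epochs, so a k-epoch history has 2k - 1 parameters, matching the counts given here. The example sizes and times are the best-fit values quoted earlier for the African-American and European samples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EpochHistory:
    """Hypothetical piecewise-constant population history.

    sizes: population size in each epoch, oldest epoch first, present last.
    change_times: generations before present at which the size changes,
    oldest first, so there is exactly one fewer change time than sizes.
    """
    sizes: Sequence[float]
    change_times: Sequence[float]

    def __post_init__(self) -> None:
        if len(self.change_times) != len(self.sizes) - 1:
            raise ValueError("need one change time between each pair of epochs")
        if any(a <= b for a, b in zip(self.change_times, self.change_times[1:])):
            raise ValueError("change times must be listed oldest first")

    @property
    def n_params(self) -> int:
        # one size per epoch plus one time per change: 2k - 1 for k epochs
        return len(self.sizes) + len(self.change_times)

    def size_at(self, generations_ago: float) -> float:
        """Population size in effect a given number of generations before present."""
        # Each change time older than the query moves us one epoch toward the present.
        epoch = sum(1 for t in self.change_times if t > generations_ago)
        return self.sizes[epoch]


# Example histories using best-fit values quoted in the text
one_epoch = EpochHistory(sizes=[10_000], change_times=[])
african_american = EpochHistory(sizes=[10_000, 18_000], change_times=[7_500])
european = EpochHistory(sizes=[10_000, 2_000, 20_000], change_times=[3_500, 3_000])

for name, history in [("one epoch", one_epoch), ("two epochs", african_american),
                      ("three epochs", european)]:
    print(f"{name}: {history.n_params} parameters")
print(european.size_at(3_200))
```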

A test for a past expansion of population size is, then, a statistical test of whether the best-fitting three-parameter (two-epoch) model fits the data significantly better than the best-fitting one-parameter (one-epoch) model. A test for a past bottleneck takes this one step further. A bottleneck is a three-epoch model, in which an ancient large population crashes at one particular time in the past and then, at a later time, expands to another, larger size again. This model has five parameters: three for the population sizes in the three epochs and two for the times of the crash and the subsequent growth. Testing for a past bottleneck means testing whether the best five-parameter model fits the data significantly better than both the best three-parameter and the best one-parameter models.
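A standard way to compare nested models like these is a likelihood ratio test: twice the gain in maximized log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters. This is the generic textbook procedure, not a claim about exactly how Marth et al. computed significance, and the chi-square reference is itself only an asymptotic approximation. The log-likelihood values below are invented. Because the parameter differences here (2, 2, and 4) are all even, the chi-square tail probability has a simple closed form and no statistics library is needed; the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with an even number of degrees of
    freedom: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only covers positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def likelihood_ratio_test(loglik_simple: float, k_simple: int,
                          loglik_complex: float, k_complex: int) -> tuple[float, float]:
    """Compare nested models by maximized log-likelihood; returns (statistic, p)."""
    statistic = 2.0 * (loglik_complex - loglik_simple)
    p_value = chi2_sf_even_df(max(statistic, 0.0), k_complex - k_simple)
    return statistic, p_value


# Invented maximized log-likelihoods and parameter counts for each model
fits = {"one-epoch": (-1250.0, 1), "two-epoch": (-1243.0, 3), "three-epoch": (-1241.5, 5)}
for simple, complex_ in [("one-epoch", "two-epoch"), ("two-epoch", "three-epoch"),
                         ("one-epoch", "three-epoch")]:
    stat, p = likelihood_ratio_test(*fits[simple], *fits[complex_])
    print(f"{complex_} vs {simple}: statistic {stat:.1f}, p = {p:.4f}")
```

With these made-up numbers the three-epoch model beats the one-epoch model but not the two-epoch model, so the test would not support a bottleneck. That is why a bottleneck has to be compared against both simpler models rather than only against a constant size.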

You may notice that even a three-epoch (five-parameter) model is probably not very much like the actual behavior of ancient populations. A natural population decreases and increases in size on a generation-by-generation basis. There may have been large-scale changes in the past, but populations do not crash instantly at one time, or grow instantly. Instead, they grow gradually and fitfully, perhaps geometrically or perhaps not. In modeling terms, the demography of a natural population would require as many epochs to describe as there have been generations in its history, or even more.

But genetic data do not preserve evidence of every generation in a population's history. Most of these possible many-epoch histories are very similar to each other, so similar as to be indistinguishable. And genes are actually very weak discriminators among histories, so a three-epoch model is about as much detail as we can expect them to resolve. So the question we can pose is ultimately a very simple one: is the history of the population more similar to a constant population size, a single expansion, or a bottleneck?
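One way to see why so many histories collapse together: under drift alone, the heterozygosity a population retains depends on its generation-by-generation sizes almost entirely through their harmonic mean. The sketch below is a simplification that ignores mutation and looks at a single summary of diversity rather than the full frequency spectrum. It generates a hypothetical fitful history with a different size in every generation and compares it with a one-epoch history held at the harmonic mean of those sizes. The function names and the random history are invented for illustration.

```python
import math
import random


def drift_retention(sizes) -> float:
    """Fraction of heterozygosity retained under drift alone when the diploid
    size in successive generations is given by `sizes`: prod(1 - 1/(2 N_i))."""
    return math.prod(1.0 - 1.0 / (2.0 * n) for n in sizes)


def harmonic_mean(values) -> float:
    return len(values) / sum(1.0 / v for v in values)


rng = random.Random(1)  # fixed seed so the example is reproducible
# A hypothetical "fitful" history: a different size in each of 1,000 generations
fitful = [rng.uniform(2_000, 20_000) for _ in range(1_000)]
# A one-epoch history held at the harmonic mean of those sizes
constant = [harmonic_mean(fitful)] * len(fitful)

print(f"fitful history:    {drift_retention(fitful):.5f}")
print(f"one-epoch history: {drift_retention(constant):.5f}")
```

The two histories look nothing alike generation by generation, yet they give practically the same answer for this statistic. Frequency-spectrum methods extract more information than this one number, but this kind of averaging over generations helps explain why their resolution stops at a few epochs.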

What hypothesis are we testing?

But consider for a moment a corollary: the best fit of a model can never get worse as we add parameters. The statistical test asks whether the additional parameters make the fit significantly better, so that we don't automatically accept a more complicated model when a simpler one matches the data as well as we could expect it to. But in this case, our models represent only demographic processes, not any other factor that may have affected human genes in the past. What if one of these other factors significantly reduced the fit of the one-epoch model? Clearly the two-epoch model would then be a better fit, and it might even be a significantly better fit if the unknown factor had an effect similar to a past change in population size.

There are good reasons to think that exactly this situation might affect human genetic variation. For example, natural selection on genetic loci can produce a frequency spectrum of mutations similar to the one produced by past population growth. If our sample includes some genes that have been under positive selection in the past, then a two-epoch model may fit the distribution of variation much better than any one-epoch model entirely because of this history of selection. The magnitude of the effect depends on the number of genes that have experienced selection and the pattern of that selection, but it wouldn't take very many to make a two-epoch model a better fit.
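A common way to quantify an excess of low-frequency variants is Tajima's D, which compares two estimates of diversity and goes negative when rare variants are overrepresented. This is not the statistic Marth et al. fitted (they modeled the allele frequency spectrum directly), and the spectra below are hypothetical expected counts rather than data. The sketch compares the expected neutral, constant-size spectrum with the same spectrum after tripling the singletons, the kind of distortion that either past growth or positive selection with hitchhiking can produce; the statistic alone cannot say which process caused it.

```python
import math


def tajimas_d(sfs: list[float], n: int) -> float:
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived allele
    appears in i of the n sampled chromosomes (i = 1 .. n - 1).
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    s = sum(sfs)  # number of segregating sites
    # mean pairwise differences: a site at count i differs in i(n - i) of the C(n, 2) pairs
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2.0)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


n = 20          # sampled chromosomes (hypothetical)
theta = 10.0    # population mutation rate (hypothetical)
neutral = [theta / i for i in range(1, n)]  # expected spectrum at constant size
rare_excess = neutral.copy()
rare_excess[0] *= 3  # hypothetical: triple the singletons

print(f"constant-size spectrum: D = {tajimas_d(neutral, n):+.3f}")
print(f"excess of rare alleles: D = {tajimas_d(rare_excess, n):+.3f}")
```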

But a two-epoch demographic model obviously does not describe the effects of selection perfectly. What happens if we add more parameters? Presently, nobody knows the answer to this question. I speculate that a three-epoch model might very easily fit significantly better when different genes have experienced a combination of selective effects, because some genes would appear entirely unaffected by selection (these are the ones that look like their variation survived a "bottleneck") and other genes would be highly affected (these would look like their variation had been lost during the "bottleneck"). But this is just speculation.

The real issue is that adding parameters is very misleading if the assumptions underlying the models cannot be rigorously verified. In the case of demographic models of the past, the biggest assumption is selective neutrality. This assumption is necessary for using genes to test demographic hypotheses because, among the evolutionary forces, only genetic drift and mutation have effects that are strongly linked to population size. But we know that many genes have not been neutral.

Presently, most molecular geneticists do not take this concern seriously. Marth et al. (2004, p. 363) consider the issue as follows:

We must also acknowledge that the current shape of human variation structure is the result of a combination of neutral and nonneutral (selective) forces. The current state of the art in recognizing the effects of selection in variation data has been reviewed recently (Bamshad and Wooding 2003). Positive selection resulting in genetic hitchhiking can mimic the effects of population expansion in that it gives rise to an excess of low-frequency alleles (Kaplan et al. 1989; Braverman et al. 1995). Recent efforts have been aimed at detecting loci that exhibit signatures of positive selection (Cargill et al. 1999; Sunyaev et al. 2000; Akey et al. 2002; Payseur et al. 2002). However, the exact proportion of genes that have been targets of strong positive selection within our evolutionary past is unclear (Bamshad and Wooding 2003). It is also unclear, in general, how far the effects of hitchhiking extend beyond the locus under selection (Wiehe 1998). Given that only a few percent of the human genome represents coding DNA, and that not all genes are expected to be targets of positive selection, we speculate that the distortion due to selective forces on the AFS in our data set of >20,000 randomly selected genomic loci is small when compared to the global effects of drift modulated by long-term demography.

This is basically the boilerplate, in studies like this one, for "we know our assumptions are not entirely accurate, but we think it doesn't matter too much." But does it? When studies vary so widely in their estimated demographic parameters, what reason should we logically adduce to explain the results?