exponential growth

Handling exponential growth in demographic models

Exponential growth is a feature of current human populations, and may represent how the human population behaved during some episodes of its demographic history. However, "exponential" can mean different things to different people, especially those not used to thinking mathematically about growth. So I need to lay out some definitions:

Natural selection 101. Episode 1: The miracle of compound interest

--Originally posted August 24, 2007.

Once upon a time, somebody probably told you that biologists don't need to know any calculus. Well, I suppose they were right: it is certainly true that most biologists don't use any calculus in their work. A purely practical biologist is like a purely practical banker -- as long as the computers do their jobs, why does anybody need to know how to calculate?

Still, there is some point to knowing the theory that underpins the study of life. Math gives the theory its power. Understand the math, and you can unleash that power to find answers to new problems.

During the last year or so, I have written nothing here about natural selection, quite purposively, even though anyone who knows me at all can tell you I hardly talk about anything else. Well, I tend not to write about what I'm working on; especially when it involves other people's observations as well as my own. I don't like it that way, but sometimes it's necessary. It especially stings when the major news in biology is that the world has changed to make selection relevant again. Still, to do my part in this change, I've maintained a respectable silence.

Over this time, I have learned many mysteries about Darwin's force. Most geneticists approach natural selection as a kind of black magic. You see, find the right pattern of selection, and you can explain almost anything. You might think this is a desirable quality in a scientific hypothesis, but many people don't see it that way. Selection, in their view, is too often unfalsifiable. It's too hard to disprove. And besides, some things really do happen by chance alone. We have to give random chance at least a fair shot as an explanation, and if you can't disprove genetic drift (so the story goes), then you don't need to invoke selection.

Besides, genetic drift is a much happier, friendlier hypothesis than selection. If somebody dies by genetic drift, it's nobody's fault. "Ooops, just a spot of bad luck, there! Move along, nothing to see here." By contrast, selection thuggishly entails that deaths and births have causes. For some reason, the idea that something should have a cause is offensive to some biologists. That is, after all, the point of The Spandrels of San Marco: Adaptationism, the assumption that phenotypic "traits" have discrete (and identifiable) causes, is a metaphysical assumption, not a tenet of Darwinism. Even those biologists who don't conform to the philosophy of narrow adaptationism, as described by Gould and Lewontin, have often felt the sting of the word; a real scarlet "A" for their dossiers.

Perhaps more to the point, you can learn the essentials about genetic drift with a bit of algebra. Drift in a constant population is a linear process, and drift in non-constant populations can generally be approximated by linear modifications to the case of constant size. In contrast, natural selection is a logistic process, and understanding it requires differential equations.

A combination of philosophy and calculus. You can see how selection got its reputation as black magic.

Darwin's non-mathematical math

The foundations of Darwinism are economic. This should not come as a shock: Darwin took his inspiration from Thomas Malthus, who formalized the idea that the geometric growth in population would outstrip resources that grow at a linear rate. That's math -- math that Darwin found compelling and used as the basis for his concept of natural selection. Here's a passage from page 47 of "On the Variation of Organic Beings in a state of Nature":

It is the doctrine of Malthus (1826) applied in most cases with tenfold force. As in every climate there are seasons, for each of its inhabitants, of greater and less abundance, so all annually breed; and the moral restraint which in some small degree checks the increase of mankind is entirely lost. Even slow-breeding mankind has doubled in twenty-five years; and if he could increase his food with greater ease, he would double in less time. But for animals without artificial means, the amount of food for each species must, on an average, be constant, whereas the increase of all organisms tends to be geometrical, and in a vast majority of cases at an enormous ratio. Suppose in a certain spot there are eight pairs of birds, and that only four pairs of them annually (including double hatches) rear only four young, and that these go on rearing their young at the same rate, then at the end of seven years (a short life, excluding violent deaths, for any bird) there will be 2048 birds, instead of the original sixteen. As this increase is quite impossible, we must conclude either that birds do not rear nearly half their young, or that the average life of a bird is, from accident, not nearly seven years. Both checks probably concur. The same kind of calculation applied to all plants and animals affords results more or less striking, but in very few instances more striking than in man.
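The arithmetic in that passage is easy to check. The short Python sketch below follows one reading of Darwin's example: half of the pairs breed each year, each breeding pair rears four young, and no bird dies during the seven years. The function name `birds_after` is only an illustration.

```python
def birds_after(years, start=16):
    """Count birds after `years`, starting from `start` birds (eight pairs).

    Assumptions taken from the quoted passage: half of the pairs rear
    four young each year, and nothing dies in the interval.
    """
    birds = start
    for _ in range(years):
        pairs = birds // 2
        breeding_pairs = pairs // 2   # only half of the pairs rear young
        birds += breeding_pairs * 4   # four young per breeding pair
    return birds

print(birds_after(7))  # 2048
```

Under these assumptions the population simply doubles every year, so sixteen birds become 16 × 2^7 = 2048.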

Darwin sat on this expressly mathematical insight for nearly twenty years, until Alfred Russel Wallace arrived at it independently. Wallace sent Darwin his manuscript, Darwin forwarded it to Charles Lyell, and Lyell arranged the remarkable double publication of Darwin's and Wallace's essays in the Journal of the Linnean Society. Wallace's essay contains a very similar section to Darwin's quoted above -- the observed birth rate of animals should lead to geometric growth, yet this is impossible except over the shortest time span, so the natural check on population growth must cause competition and selection of traits favorable to survival.

Math-avoiding biologists have a true hero in Darwin, who -- even allowing for his characteristic nineteenth-century modesty -- was profoundly self-conscious about his failure to master algebra. In an autobiographical chapter of the collected papers edited by his son Francis, Charles Darwin himself describes his resignation about math:

I attempted mathematics, and even went during the summer of 1828 with a private tutor (a very dull man) to Barmouth, but I got on very slowly. The work was repugnant to me, chiefly from my not being able to see any meaning in the early steps in algebra. This impatience was very foolish, and in after years I have deeply regretted that I did not proceed far enough at least to understand something of the great leading principles of mathematics, for men thus endowed seem to have an extra sense. But I do not believe that I should ever have succeeded beyond a very low grade (Darwin 1887:46).

So it is ironic that Darwin's greatest insight was so expressly mathematical. The force of natural selection emerges from the necessary conflict between the potential of geometric population growth and the constraint of limited resources. The conflict arises from excess reproduction itself, for if many are being born but the population still does not grow, then we can infer that just as many must die. Wallace's essay makes this point crystal clear, after considering that birds produce four or more offspring per year:

A simple calculation will show that in fifteen years each pair of birds would have increased to nearly ten millions! whereas we have no reason to believe that the number of the birds of any country increases at all in fifteen or in one hundred and fifty years. With such powers of increase the population must have reached its limits, and have become stationary, in a very few years after the origin of each species. It is evident, therefore, that each year an immense number of birds must perish — as many in fact as are born; and as on the lowest calculation the progeny are each year twice as numerous as their parents, it follows that, whatever be the average number of individuals existing in any given country, twice that number must perish annually,—a striking result, but one which seems at least highly probable, and is perhaps under rather than over the truth (Wallace 1858:55).

Many historians of science have found it very meaningful that the two men independently arrived at this formulation. It suggests that the idea of natural selection was in some sense "ripe" -- that the tenor of the times made science ready for Darwinism.

Maybe so. But this "zeitgeist" argument misses an important point: this mathematical theory went without any mathematical description for over fifty years.

To some extent, this lack of development can be blamed on the lack of a satisfactory theory of inheritance. When the mathematical development of a theory of natural selection was finally advanced by Haldane and Fisher, they had Mendelism to build it upon. If inheritance had turned out not to be Mendelian, a mathematical description of selection would likely have been harder. It is plausible that an earlier acceptance of Mendelian inheritance would have led to an earlier population genetic theory -- it certainly didn't take very long after Mendelism was rediscovered for G. H. Hardy and Wilhelm Weinberg to describe its statistical foundations (Jim Crow described the context of these discoveries in a 1999 perspective piece).

Demography and selection

Still, I don't find the lack of a gene theory to be a very satisfactory explanation. There is nothing genetic about Darwin's and Wallace's logic. Both men posed the problem in exclusively demographic terms. Certainly, both assumed that characters are inherited in some way, because without inheritance, natural selection would be impossible. But they were content to refer to the competition between varieties, which itself is quite sufficient as a basis for a theory of selection. The replacement of one variety by another shares a common demographic basis as the replacement of one gene by another.

In other words, Darwin's and Wallace's description of selection emerged from facts about demography, not inheritance. Both Darwin and Wallace make clear that selection depends on the conditions of existence -- it may be abated when resources are abundant, and it may intensify when populations decline. These demographic conditions could have been easily modeled along the lines that both Darwin and Wallace suggested. The essential facts are all there in the 1858 papers: when populations shrink, varieties that gain resources less effectively may disappear, and when populations grow, more fecund varieties will replace less fecund ones. This is the distinction between survival and fertility selection, already present in Darwin and Wallace.

We can imagine an alternative history in which these insights were rapidly developed into a demographic model of selection. Mathematical models of demography were not only available at the time Darwin and Wallace wrote, they were the advancing frontier of social science. Mathematical descriptions of demography became important in the 1800's for the same reason they remain important today: actuarial predictions. In the 1820's, Benjamin Gompertz considered the effects of changing mortality, while the logistic model had been formulated by Pierre Verhulst as early as 1838. Both models presented substantial refinements of Malthus' conception of geometric growth, including the very thing Darwin and Wallace most needed: a description of an equilibrium. For that matter, Euler developed a true age-structured model of population growth in 1760! When we consider that the demographic model of natural selection is entirely pre-Darwinian, the possibility of an earlier development of theoretical population genetics seems quite plausible.

Such speculations are something like steampunk, that narrow corner of fiction that supposes Babbage had really built his Difference Engine No. 2, and imagines what would have happened next. But there is a point to it: Nineteenth-century demography was already well-equipped to incorporate selection. Doing so may at the least have jump-started epidemiology, which could have made much of good actuarial records. Tracking thousands of people was already undertaken by governments. On the other hand, the development of genetics required somebody to track thousands of flies, and that wouldn't happen for a while. Still, a good demographic theory of selection might have been incorporated into developmental biology, giving Mendelism a run for its money.

So why didn't any biologist realize the potential of such modeling for understanding evolution? I can't find any historians of science who have considered this question, but we have some hints. Darwin and Wallace changed the direction of biology, but not its main research approaches. The nascent study of embryology and morphology, what we now would call "evolutionary developmental biology," was not based on demography, and had a radically different conception of possible mathematical descriptions of change. This may also account for the failure of biology to recognize the importance of Mendel's work -- another example of the power of algebra.

Another reason for the tardy mathematical development: Rather than limiting themselves to a simplistic reductionist approach, biological theorists immediately tried to take in the full scope of nature in their evolutionary explanations. Haeckel was well known for this tendency in comparative biology -- he had to subsume every aspect of morphology into his Biogenetic Law. But the problems of demography could be equally baffling, if not reduced into a consideration of a single species at a time. For example, Alfred Lotka (1925:62) quotes this passage from Herbert Spencer's First Principles:

Groups of organisms display this universal tendency towards a balance very obviously. In § 85, every species of plant and animal was shown to be perpetually undergoing a rhythmical variation in number -- now from abundance of food and absence of enemies rising above its average, and then by a consequent scarcity of food and abundance of enemies being depressed below its average. And here we have to observe that there is thus maintained an equilibrium between the sum of those forces which result in the increase of each race, and the sum of those forces which result in its decrease. Either limit of variation is a point at which the one set of forces, before in excess of the other, is counterbalanced by it. And amid these oscillations produced by their conflict, lies that average number of the species at which its expansive tendency is in equilibrium with surrounding repressive tendencies. Nor can it be questioned that this balancing of the preservative and destructive forces which we see going on in every race must necessarily go on. Since increase of numbers cannot but continue until increase of mortality stops it; and decrease of number cannot but continue until it is either arrested by fertility or extinguishes the race entirely (Spencer 1867:502).

Spencer and others were not content with describing what happened to a single population, because the dynamics of one population obviously depend on the populations of other species -- predators, competitors, and prey. An equilibrium between "expansive and repressive" forces required a consideration of those other species. Interestingly, Lotka quoted this passage in the context of providing just such a complicated model -- a system of equations modeling the interactions of an entire community of species.

Demographic modeling would not make an impact on evolutionary theory until after 1900. Much of the revival was due to Lotka, who not only developed a continuous version of the Euler age-structured equation for population growth, but also extended the work of Vito Volterra to account for predator-prey relationships. Verhulst's logistic model was revived in 1920 by Raymond Pearl and Lowell Reed to describe the growth of the U.S. population.

By this time, the first population geneticists, including Haldane, Fisher, and Wright, were ready to think about the demographic foundations of natural selection. Fisher showed how Mendelian genes could explain the variation in quantitative traits. Haldane showed how an advantageous gene would behave in a population. And then, in rapid order, Fisher demonstrated the essential connection of natural selection to demography.

Compound interest

Most descriptions of natural selection begin with Mendelism, and follow Haldane's formulation of the replacement of a deleterious allele by an advantageous one. Certainly there is merit in this approach, but it's not especially Darwinian. Haldane's model is surprisingly complicated in its mathematics -- no doubt to the consternation of many would-be population geneticists. Moreover, its assumption of a static population bears little resemblance to the continuous demographic flux described by Darwin and Wallace.

So I'm going to do something very different. Instead of beginning with Haldane, I'm going to start with Fisher's demographic model. Fisher's model is based on the Euler-Lotka equation, and it is often overlooked by geneticists -- in fact I've never seen it in any population genetics text other than Gillespie's. But it is the foundation of life history theory and led directly to Hamilton's insights about strategy variants, later developed by Price and Maynard Smith. Plus, it takes a form that builds immediately upon the logic of Darwin and Wallace.

The essential insight is one that any nineteenth-century banker would understand: population growth is like compound interest.

A hundred dollars in the bank at four percent annual interest will grow to $104 in a year. In two years, you'll have $108.16. That's the initial $100 times 1.04 (104 percent) for one year, times 1.04 again for the second year.

A simple equation will give us that result: if t is the time in years, r is the rate of interest, and x_0 is the original principal, then after t years the account balance will increase to:

x_t = x_0 * (1 + r)^t

Now, if you will have $104 in a year, how much will you have in your account in six months? Simply, if we allow t to equal one-half (0.5) in the equation above -- for half a year of interest -- we find that the right amount is $101.98.
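The same figures can be reproduced in a few lines of Python. This is only a sketch of the compound interest formula above; the function name `balance` is an illustration.

```python
def balance(principal, rate, years):
    """Balance after `years` at annual interest `rate`: x_0 * (1 + r)^t."""
    return principal * (1 + rate) ** years

print(round(balance(100, 0.04, 1), 2))    # 104.0
print(round(balance(100, 0.04, 2), 2))    # 108.16
print(round(balance(100, 0.04, 0.5), 2))  # 101.98
```

Because the exponent t need not be a whole number, the same function handles the half-year case.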

The amount of interest in the first six months is different from that in the second six months -- and in general, the amount of interest in any period depends not only on the rate of interest but also the amount of principal at that instant. Banks generally simplify matters (to your slight disadvantage) by compounding interest only at long intervals of a month or more.

However, we can write these relations in another form that will make them much more useful to us. In the equation above, we can consider the term (1 + r)^t as two parts: a base (1 + r) and an exponent (t). We may substitute a different exponent and base if we choose. In particular, if we substitute the base e and the exponent kt, then the equation above may be written:

x_t = x_0 * e^(kt)

The exponential base e is exceedingly handy. Transforming our growth equation into an exponential growth equation lets us examine change as a continuous process. What is k? The value of k that will satisfy the equation is k = ln(1 + r). It is often called the constant of proportionality -- it represents not the annual rate of change, but the instantaneous rate of change. For a four percent annual rate of interest, the value k ≅ 0.0392. In other words, a bank could pay our account 4 percent interest compounded annually by giving us the proceeds from 3.92 percent compounded continuously, and pocket the difference. It's not much of a margin, since r exceeds k by such a small amount. In fact, this amount is the interest on the interest earned continuously during the year.

The equation, x_t = x_0 * e^(kt), is a solution to the differential equation

dx/dt = kx

This equation says that the rate of change in x at each instant equals the product of k and x at that instant.
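As a numerical check, the sketch below computes k = ln(1 + r) for four percent interest and then integrates dx/dt = kx in many small steps. The forward-Euler stepping is my choice for illustration, not part of the original argument, and the function names are hypothetical.

```python
import math

r = 0.04             # annual interest rate
k = math.log(1 + r)  # instantaneous rate, k = ln(1 + r)

def exact(x0, t):
    """Closed-form solution x_t = x_0 * e^(kt)."""
    return x0 * math.exp(k * t)

def euler(x0, t, steps=100_000):
    """Approximate the same solution by stepping dx = k * x * dt."""
    dt = t / steps
    x = x0
    for _ in range(steps):
        x += k * x * dt
    return x

print(round(k, 4))              # 0.0392
print(round(exact(100, 1), 2))  # 104.0
print(round(euler(100, 1), 2))  # 104.0
```

Both routes arrive at the same $104 after one year: continuous growth at rate k reproduces annual compounding at rate r.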

Malthusian population growth

Malthus translated this simple logic underlying compound interest to an insight about populations. To do this, he had to ignore all the complexities that would later be pointed out by Darwin and Wallace. True, the annual numbers of births and deaths within natural populations are always changing. Natural resources change, sources of food, enemies, diseases, and all of these cause fluctuations in the birth and death rates. But if we ignore these fluctuations, and assume that the birth and death rates are perfectly constant, then a population should behave just like a bank account. If the annual rate of births (per individual) is higher than the annual rate of deaths (per individual), then the population will grow according to the equations above. This kind of population growth is generally called Malthusian growth.

During the 1950's up to the 1970's, the human population of Earth grew by around 2 percent annually. Since that time the global population growth has been somewhat less, and the United Nations estimates that in the year 2000, the global population grew at an annual rate of 1.14 percent.

Biologists tend to measure time in generations rather than years. Anthropologists and geneticists often assume a generation length of 20 to 25 years, although these values vary in different populations. These times are intended to represent the average age at which people have children, but of course the actual times vary substantially. Why does all this variation matter? Well, for one thing, it's why we want to use a continuous model instead of a model that involves discrete generations. Since continuous means calculus, it's nice to have a reason for the effort!

In the end, we will do a bit better than this for a model of population growth, by directly considering the variation in the age at reproduction. That will take a bit more doing, which will come after a couple more episodes.

At the current annual rate of growth (1.14%), we can estimate the growth per 20-year generation as (1.0114)^20 ≅ 1.254, an increase of 25.4 percent. If this is the rate r per generation, we can estimate the constant of proportionality k as 0.226 per generation.
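The per-generation conversion can be checked with a short Python sketch. It assumes the 20-year generation length used above; the variable names are illustrations only. Taking the logarithm of the rounded figure 1.254 gives k of about 0.226, while carrying full precision gives about 0.2267.

```python
import math

annual_rate = 0.0114   # 1.14 percent per year
generation_years = 20  # assumed generation length

# Growth compounds over the 20 years of one generation.
r_gen = (1 + annual_rate) ** generation_years - 1
k_gen = math.log(1 + r_gen)  # equals 20 * ln(1.0114)

print(round(r_gen, 3))            # 0.254
print(round(k_gen, 4))            # 0.2267
print(round(math.log(1.254), 4))  # 0.2263
```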

Clearly Malthus was right: over the long term, this kind of population growth is not sustainable. Indeed, over the very long term, no rate of population growth can be sustainable. And yet, over evolutionary time, no species that is incapable of long-term growth can survive: the inevitable consequence of an indefinite decline in numbers is extinction.

To examine natural selection, we will need a slightly more complicated model of demography -- one that combines the potential of growth with the fact that growth cannot continue indefinitely. In the next installment, we will see that model, and consider some of its distinctive predictions about the rate of change. These demographic conditions, as Darwin and Wallace saw, provide the context by which one variety may replace another.

References:

Crow JF. 1999. Hardy, Weinberg and language impediments. Genetics 152:821-825.

Darwin C. 1858. On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection. J Proc Linnean Soc Lond Zool 3:46-50.

Darwin C. 1868. The variation of animals and plants under domestication. 1 ed., vol. 1. John Murray, London.

Darwin F, ed. 1887. The life and letters of Charles Darwin, including an autobiographical chapter. vol. 1. London: John Murray.

Pearl R, Reed LJ. 1920. On the rate of growth of the population of the United States since 1790 and its mathematical representation. Proc Nat Acad Sci USA 6:275-288.

Spencer H. 1867. First principles. Williams and Norgate, London.

Wallace AR. 1858. On the tendency of varieties to depart indefinitely from the original type. J Proc Linnean Soc Lond Zool 3:53-62.

Two recent bottleneck studies

References:

Marth, Gabor, et al. 2003. "Sequence variations in the public human genome data reflect a bottlenecked population history." Proceedings of the National Academy of Sciences, USA 100:376--381.

Marth, G. T., E. Czabarka, J. Murvai, and S. T. Sherry. 2004. "The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations." Genetics 166(1):351--372.

Conclusions

These two studies by Gabor Marth and colleagues are attempts to test hypotheses of demographic history from genome-wide surveys of human single nucleotide polymorphism (SNP) data. The power of these data is that they are widely available--they are dispersed throughout the genome and they have been taken from standard panels of populations for the purposes of localizing polymorphisms linked to disease expression. This means that they are probably the single largest source of evidence about human genetic variation at this point. However, they are also rather limited in scope--since they are surveyed for non-demographic purposes, the data present unique problems with ascertainment bias, potential non-independence, and lack of knowledge about their evolutionary dynamics. Marth and colleagues have taken pains to develop modes of analysis that correct for these problems to the extent possible. In other words, they apply certain corrections to the data to make them consistent with the assumptions of the Fisher-Wright population model.

The first of these studies examined the density of SNP candidates in different genomic regions as an indication of the frequency spectrum of mutations in the human genome. The second study went further by examining SNP's in three separate populations, including European, Asian, and African-American samples. Thus, the first study is informative only in a global context, as some kind of average of what may have happened to all populations, while the second study differentiates different populations from each other in terms of the frequency spectrum of mutations.

According to Marth et al. (2003), the SNP data are consistent with a global population bottleneck, dating to around the time of the Upper Paleolithic. This overstates their conclusions somewhat, because the support for the bottleneck came entirely from the assumption of a particular degree of recombination among the sites they examined. This assumption might be accurate, but it is notable that without it there was no support for a bottleneck. And interestingly, the bottleneck model that fit the data best included a net decline in the human population rather than an expansion. In other words, this was not a bottleneck with a subsequent exponential growth of the human population (as we know has been happening recently). This was a bottleneck followed by growth to a smaller level than before the bottleneck. The best-fitting bottleneck had an onset at 1,600 generations (or approximately 40,000 years) ago, and a release at 1,200 generations (30,000 years) ago. No confidence limits for these estimates were provided in the study.

The second study (Marth et al. 2004) provided best-fit models of population history for the three separate population samples. For African-Americans, the best-fit model was a simple expansion of population size, from around 10,000 to around 18,000 some 7,500 generations (187,500 years) ago. For Asians, the best-fit model was a bottleneck from 10,000 to 3,000 individuals at 3,800 generations (95,000 years) ago, with an expansion to 25,000 individuals some 80,000 years ago. For Europeans, the bottleneck from 10,000 to 2,000 began 3,500 generations (87,500 years) ago and ended 3,000 generations (75,000 years) ago with an expansion to 20,000 people. A confidence interval is modeled for the European sample, which illustrates the relationship between the severity of the bottleneck (i.e. how many people there were then) and its duration. Simply put, a longer bottleneck can have a larger population and still fit the data (because of the longer time for an effect), while a more severe bottleneck can be shorter and have the same effect.

There is little point to dissecting these values, since they are simply best-fit numbers under the assumptions in the studies. It is worth pointing out the large differences between the two studies. Notice in particular the discrepancy in the timing of the putative bottlenecks, especially considering that most of the sample used in the first study, taken from public genome datasets, probably represents Americans of European descent. Why they do not have a population history more similar to the Europeans in the second study is unexplained (although the authors do note the discrepancy on page 362 of the second study). It is also interesting that the bottleneck in the second study is so ancient. Presumably if it actually reflects the population dynamics within Europe 80,000 years ago, then it is reflecting the population history of Neandertals! Or maybe the Levantine ancestors of later Europeans were facing population pressure when Neandertals moved south during the Würm glaciation to occupy Kebara and Amud caves? Whatever the case, the numbers are certainly strange.

The reasons for the strangeness of these numbers are what interest me about the papers, and it is to these issues that I devote some more attention.

Parameters

Both studies assume a model for population histories in which a population may either grow or shrink at discrete times in the past. These times divide the entire population history into "epochs." For example, if the population never changed in size, the history would be a "one-epoch" population history, because all of time would be described by a single population size. If the population changed in size (grew or shrank) at a single time in the past, then it has a two-epoch history, reflecting the population size before and after the event. A two-epoch history is a three-parameter model, because three separate values must be known to predict the genetic characteristics of the population: the size before the event, the size afterward, and the time the event happened. Certain values for these three independent parameters may alter the expected diversity and frequency spectrum of mutations in populations with such histories.

A test for a past expansion of population size is, then, a statistical test of whether the best three-parameter (two-epoch) model fits significantly better than the best one-parameter (one-epoch) model. A test for a past bottleneck takes this one step further. A bottleneck is a three-epoch model, with an ancient large population size crashing at one particular time in the past, then at a later time expanding to another, larger size again. This model has five parameters: three for the population sizes in each of the three epochs and two for the times of the population crash and subsequent growth. Testing for a past bottleneck is the test of whether the best five-parameter model fits the data significantly better than both the best three-parameter and the best one-parameter model.
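The bookkeeping behind these epoch models can be sketched as a small Python data structure. This only illustrates how the parameters are counted; it is not the method used in either study, and the class name `EpochHistory` and its fields are hypothetical. The example values are the European best-fit figures quoted above from Marth et al. (2004).

```python
from dataclasses import dataclass

@dataclass
class EpochHistory:
    """Piecewise-constant population history, listed from present to past.

    sizes[i] is the population size during epoch i. times[i] is the
    boundary, in generations ago, between epoch i and the older epoch
    i + 1, so there is always one fewer time than there are sizes.
    """
    sizes: list
    times: list

    def n_parameters(self):
        # One size per epoch plus one time per change: 1, 3, 5, ...
        return len(self.sizes) + len(self.times)

    def size_at(self, generations_ago):
        # Walk back from the present until the right epoch is found.
        for size, boundary in zip(self.sizes, self.times):
            if generations_ago < boundary:
                return size
        return self.sizes[-1]

constant = EpochHistory(sizes=[10_000], times=[])
expansion = EpochHistory(sizes=[18_000, 10_000], times=[7_500])
bottleneck = EpochHistory(sizes=[20_000, 2_000, 10_000], times=[3_000, 3_500])

print(constant.n_parameters(), expansion.n_parameters(), bottleneck.n_parameters())  # 1 3 5
print(bottleneck.size_at(3_200))  # 2000
```

Each added epoch costs two more parameters, one size and one time, which is why the models step from one to three to five parameters.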

You may notice that even a three-epoch (five-parameter) model is probably not very much like the actual behavior of ancient populations. A natural population decreases and increases in size on a generation-by-generation basis. There may have been large-scale changes in the past, but populations do not crash instantly at one time, or grow instantly. Instead, they grow gradually and fitfully, perhaps geometrically or perhaps not. In modeling terms, the demography of a natural population would require as many epochs to describe as there have been generations in its history, or even more.

But genetic data do not preserve evidence of every generation in a population's history. Most of these possible multi-epoch histories are very similar to each other--so similar as to be indistinguishable. And genes are actually very weak discriminators, so that a three-epoch model is about as far as we can expect them to be informative. So the question we can pose is ultimately a very simple one: is the history of the population more similar to a constant population size, a single expansion, or a bottleneck?

What hypothesis are we testing?

But consider for a moment the flip side: the fit of a model can only get better, or at least no worse, as we add parameters. The statistical test asks whether the additional parameters make the fit significantly better, so that we don't automatically accept a more complicated model when a simpler one matches as well as we could expect it to. But in this case, our models represent only demographic processes, and not any other factor that may have affected human genes in the past. What if one of these other factors significantly reduced the fit of the one-epoch model? Clearly the two-epoch model would be a better fit, and might even be a significantly better fit if the unknown factor had an effect similar to that of a past change in population size.

There are good reasons to think that exactly this situation might affect human genetic variation. For example, natural selection on genetic loci can produce a frequency spectrum of mutations similar to the one produced by past population growth. If we have a sample of genes including some that have been under positive selection in the past, then a two-epoch model may fit the distribution of variation much better than any one-epoch model entirely because of this history of selection. The magnitude of the effect of selection depends on the number of genes that have experienced selection and the pattern of that selection, but it wouldn't take very many such genes to make a two-epoch model a better fit.

But a two-epoch demographic model obviously does not describe the effects of selection perfectly. What happens if we add another parameter? Presently, nobody knows the answer to this question. I speculate that it might very easily happen that a three-epoch model would significantly better fit a population with a combination of selective effects on different genes, because some of the genes would appear entirely unaffected by selection (these are the ones that look like their variation survived a "bottleneck") and other genes would be highly affected (these genes would look like variation had been lost during the "bottleneck"). But this is just speculation.

The real issue is that adding parameters can be very misleading if the assumptions underlying the models cannot be rigorously verified. In the case of demographic models about the past, the biggest assumption is selective neutrality. This assumption is necessary for using genes to test demographic hypotheses, because among the evolutionary forces only genetic drift and mutation have effects that are strongly linked to the size of the population. But we know that many genes were not neutral.

Presently, most molecular geneticists do not take this concern seriously. Marth et al. (2004, p. 363) consider the issue as follows:

We must also acknowledge that the current shape of human variation structure is the result of a combination of neutral and nonneutral (selective) forces. The current state of the art in recognizing the effects of selection in variation data has been reviewed recently (Bamshad and Wooding 2003). Positive selection resulting in genetic hitchhiking can mimic the effects of population expansion in that it gives rise to an excess of low-frequency alleles (Kaplan et al. 1989; Braverman et al. 1995). Recent efforts have been aimed at detecting loci that exhibit signatures of positive selection (Cargill et al. 1999; Sunyaev et al. 2000; Akey et al. 2002; Payseur et al. 2002). However, the exact proportion of genes that have been targets of strong positive selection within our evolutionary past is unclear (Bamshad and Wooding 2003). It is also unclear, in general, how far the effects of hitchhiking extend beyond the locus under selection (Wiehe 1998). Given that only a few percent of the human genome represents coding DNA, and that not all genes are expected to be targets of positive selection, we speculate that the distortion due to selective forces on the AFS in our data set of >20,000 randomly selected genomic loci is small when compared to the global effects of drift modulated by long-term demography.

This is basically boilerplate in studies like this one for "we know our assumptions are not entirely accurate, but we think it doesn't matter too much." But does it? When studies vary so widely in their estimated demographic parameters, what reason should we adduce to explain the results?
