demography

Handling exponential growth in demographic models

Exponential growth is a feature of current human populations, and it may represent how the human population behaved during some episodes of its demographic history. However, "exponential" can mean different things to different people, especially to those who are not used to thinking mathematically about growth. So I need to lay out some definitions:

Acceleration's discontents

The June Scientific American (no link available) has an article on page 32 about the "therapeutic value of blogging." That's some relief, after the stories a couple of months ago about blogging being potentially deadly.

And it's no small irony, considering that the article I found on the previous two pages had great potential to give me therapeutic opportunities here.

In the article, titled "Need for speed?", David Biello wrote up some of the human genetics results of the past 6 months, presenting them as a point-counterpoint on our acceleration result.

First, he cites Gregory Cochran, who does as good a job explaining our result in one sentence as I've seen:

"We found very many human genes undergoing selection" ... "We believe that this can be explained by an increase in the strength of selection as people became agriculturalists, a major ecological change, and a vast increase in the number of favorable mutations as agriculture led to increased population size."

In that form, it is hard to see how anyone could disagree. Clearly, agriculture was a major ecological shift for humans, and it imposed new selection pressures associated with diet, disease, social organization and other ecological factors. At the same time, the population grew and more people meant more mutations. That's the story; the rest is detail filled in by anthropology, genomics, and math.

Biello then cites another recent study that partially confirms our results. That study, by Lluis Quintana-Murci and colleagues, found a much smaller number of selected genes (55), but what is important is that every one of these genes has an FST greater than 0.65. In other words, in every one of these cases, an allele that is vanishingly rare in most of the world has reached a frequency over 80 percent in one population. As allele frequencies go, these are extreme differences -- much, much larger than the average genetic difference between populations, characterized by an FST around 0.1. We also found a few such alleles in our survey of selected genes, but the vast majority of genes have not generated such extreme differences in frequency -- mainly because they haven't been around long enough. In other words, the Quintana-Murci study confirms the distribution of positively selected alleles, across the range where it overlaps with other studies, including ours.
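To make the FST comparison concrete, here is a minimal sketch of Wright's FST for a single two-allele site. It assumes just two equally weighted populations, which is a simplification for illustration and not the estimator used in either study; the function name and the example frequencies are hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """FST for one biallelic site in two equally weighted populations.

    Computed as the variance of the allele frequency among populations
    divided by p_bar * (1 - p_bar), where p_bar is the mean frequency.
    """
    p_bar = (p1 + p2) / 2.0
    if p_bar in (0.0, 1.0):
        return 0.0  # allele absent or fixed everywhere: no differentiation
    variance = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2.0
    return variance / (p_bar * (1.0 - p_bar))


# Hypothetical allele at 80 percent in one population and absent in the other.
extreme = fst_two_populations(0.80, 0.0)
# Hypothetical allele with a modest frequency difference between populations.
typical = fst_two_populations(0.66, 0.34)

print(f"extreme case: FST = {extreme:.3f}")  # about 0.667
print(f"typical case: FST = {typical:.3f}")  # about 0.102
```

Under these assumptions, an allele near 80 percent in one population and absent in the other lands just above the 0.65 threshold, while a modest frequency difference gives a value near the genome-wide average of about 0.1.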

Then Biello turns to the doubters. Noah Rosenberg coauthored a study earlier this year that reported polymorphism data from a sample of populations around the world.

"We are a young species," remarks geneticist Noah Rosenberg of the University of Michigan at Ann Arbor, who participated in a comprehensive study of genetic variation that appeared in Nature in February. "Different human populations have not been separated for long enough periods of time to develop their own new alleles."

Now, I never hold quotes in the press against people, because they represent a very small portion of what they may have said to a writer, and there are many opportunities for miscommunication. Still, I have to write about this, because it's about my work! So I'll try to describe the misconceptions illustrated by the article.

I am pretty sure that Rosenberg must know that his statement in the article is false. For one thing, "developing" a new allele is simply mutation, and mutation occurs continuously. All human populations have rare alleles that have originated recently and remain distributed only across small areas. Rosenberg's surveys of gene variation have identified many such alleles.

But more important to the current question, positive selection carries an allele to high frequency very rapidly -- much more quickly than the 50,000-year or longer span of time we are talking about. An allele with a five percent fitness edge can go from zero to fixation in several hundred generations -- in humans, such alleles can make very large frequency changes in a thousand years.
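The speed of such a sweep can be checked with a minimal deterministic sketch. It assumes a simple haploid (genic) selection model in which carriers have relative fitness 1 + s, a hypothetical starting frequency of one copy in 20,000, and a 25-year generation; it ignores the chance loss of a new mutation while it is still rare. None of these values come from the article.

```python
def generations_to_reach(p0: float, target: float, s: float,
                         max_generations: int = 100_000) -> int:
    """Generations for an allele to rise from p0 to target under genic selection."""
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        # Standard one-generation update when carriers have fitness 1 + s.
        p = p * (1.0 + s) / (1.0 + s * p)
    raise ValueError("target frequency not reached within max_generations")


def frequency_after(p0: float, s: float, generations: int) -> float:
    """Allele frequency after a fixed number of generations of genic selection."""
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + s * p)
    return p


s = 0.05                      # five percent fitness edge
start = 1.0 / 20_000          # assumed: one copy among 10,000 diploid people
sweep = generations_to_reach(start, 0.99, s)
print(f"generations to reach 99 percent: {sweep}")

# A thousand years is about 40 generations at an assumed 25 years each.
later = frequency_after(0.10, s, 40)
print(f"from 10 percent, after 40 generations: {later:.2f}")
```

Under these assumptions the sweep to 99 percent takes a few hundred generations, and an allele already at 10 percent climbs to well over 40 percent within a thousand years.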

If we took the quote at face value, Rosenberg would be saying that human evolution is impossible -- and that new selected alleles like lactase persistence and sickle cell simply cannot exist. We may be a young species (although I would argue the point), but that doesn't mean that we have stopped evolving!

Two prominent geneticists quoted in the article suggest that a bottleneck may explain the pattern of human genetic variation. Here also, I have to be cautious interpreting their quotes -- because even though they may seem relevant, they are referring to their own research papers, which don't actually address the question of linkage disequilibrium and positive selection.

Marcus Feldman suggests that a series of bottlenecks is a likely explanation for the pattern of human genetic variation, in particular, the decreasing gradient of genetic diversity with increasing distance from Africa. This is the "serial founder effect" scenario that I have written about before. I criticized Feldman's and other papers on this subject this spring, referring to "the Stanford school of genetic orthodoxy." My basic point is that all of the results are assumed to support the idea of bottlenecks: no one has yet tested the hypothesis. Even simulations that show the credibility of the concept do not test the hypothesis, because they do not examine credible alternatives, either demographic or selective.

More important, bottlenecks during the dispersal from Africa 50,000 years ago cannot possibly explain linkage blocks concentrated in coding genes with a mean age of 5500 years!

Why is there such difficulty understanding natural selection? I find it quite incredible that many of the scientists who would rail against ignoring Darwin in public schools at the same time actively root out Darwin's theory from their graduate students. Still, there it is. One prominent geneticist (I won't give the name) recently asked me, "You don't really think that lactase was selected, do you?" Many really believe that natural selection has stopped and that recent human evolution reflects nothing more than the cumulative effects of bottlenecks.

What is amazing to me is that these same geneticists embrace hypotheses of population history that cannot possibly have happened. The other geneticists quoted in the article, Carlos Bustamante and his graduate student Kirk Lohmueller, wrote a paper earlier this spring arguing that deleterious mutations have reached high frequency in Europeans (more so than in Africans) because of a bottleneck during European history. The press reported this work as "Whites genetically weaker than blacks, study finds." The hypothesis in the paper is that protein-coding sites otherwise conserved in most mammals may differ among humans because of relaxed selection in a bottleneck.

Here's why they're wrong: their bottleneck is impossible. They propose that the European population was a small, isolated population of 5,700 effective individuals from 214,000 years ago up to the Last Glacial Maximum. I suppose I should take some encouragement that they believe Neandertals were European ancestors (because otherwise, where exactly would this small, isolated population of Europeans have lived). But it's still quite impossible -- it implies no gene flow between Africans and Europeans across that entire span. You see, that is the only way that genetic drift can lead to this kind of result -- large differences in frequencies between continents for hundreds of deleterious alleles. It takes a bottleneck of exceptional length, along with complete isolation.
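To get a feel for the scale of drift that such a model implies, here is a rough sketch using the textbook decay of heterozygosity in an isolated population of constant effective size, H_t = H_0 (1 - 1/(2N))^t. The effective size and starting date are the ones quoted above; the date of the Last Glacial Maximum (about 20,000 years ago) and the 25-year generation time are assumptions made here for illustration, not values taken from the paper.

```python
def heterozygosity_retained(effective_size: float, generations: float) -> float:
    """Fraction of heterozygosity left after drift in a closed population."""
    return (1.0 - 1.0 / (2.0 * effective_size)) ** generations


effective_size = 5_700        # from the model described above
start_years_ago = 214_000     # from the model described above
end_years_ago = 20_000        # assumed date for the Last Glacial Maximum
years_per_generation = 25     # assumed generation time

generations = (start_years_ago - end_years_ago) / years_per_generation
retained = heterozygosity_retained(effective_size, generations)

print(f"generations of isolation: {generations:.0f}")
print(f"heterozygosity retained:  {retained:.2f}")  # roughly half
```

Under these assumptions the population sits in complete isolation for several thousand generations and loses about half of its heterozygosity to drift. The calculation only holds with zero gene flow; any migration would slow the divergence.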

In what has become a troubling trend, these details were hidden away in the online supplementary information of the paper. It is no surprise that most people read only the paper's conclusions, without critically evaluating the methods. But when the assumptions are hidden so that it takes an effort to look at them, you can understand that the paper does not receive the kind of scrutiny that it deserves. These are not obscure laboratory techniques; they are the basic evidence on which the conclusions were based.

Now, Bustamante knows that positive selection has been very important in recent human evolution, because he wrote an important paper on the subject in 2005. I wrote about the paper at the time -- it was one of the works that really got us thinking about acceleration in the first place. So why in the world did their more recent paper adopt such a ridiculous model of population history?

In any event, I don't think that either of these studies from earlier this year is relevant to our acceleration results. They address different aspects of genetic variation. However, acceleration may help to explain the high frequencies of some gene variants conserved in other mammals -- the results explained by Lohmueller and colleagues as relaxed selection under a bottleneck.

The acceleration of recent positive selection would predict that many otherwise conserved gene variants may be segregating in humans, because they are the targets of positive selection. These conserved sites are among those most likely to show a strong sign of recent selection, because adaptive changes on them are necessarily rare (we know they're rare, because they haven't happened very often among other species). Most such sites are still conserved in humans -- it's just not possible to change their function in adaptive ways. But the massive ecological changes of recent human history have created the opportunity for adaptive responses that are not present in other mammalian lineages. We shouldn't be surprised to see that some such changes are currently underway.

Now, that's a different interpretation of the same data, and it's a testable hypothesis. Are these conserved sites in regions that show other signs of positive selection? If they are, then acceleration explains the data. I'm looking into it now.

This magic moment

Today, the projected population of the Earth (available here) passed 6,666,666,666.

Around 9 years ago, I tried to put into the first sentence of a paper that the world's population was 6 billion. A reviewer wouldn't let me get away with rounding up, noting that "It's not 6 billion yet!"

Of course, by the time the paper was published, the sentence was true.

Were ancient Africans divided into small, isolated bands?

Last week when I wrote about the study of African mtDNA variation by Behar and colleagues, I focused on the issue of population size. To me, that must be the first parameter that we try to estimate, because the simplest relevant model of population history -- the Wright-Fisher model -- is described by that single parameter: the number of individuals. If we are going to evaluate evidence for population structure, we first must deal with the question of size.
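For readers who have not met it, the Wright-Fisher model can be sketched in a few lines. This is a minimal illustration assuming a neutral two-allele site in a diploid population of constant size, where each generation's gene copies are drawn at random from the previous generation; the function name, population size and seed are hypothetical.

```python
import random


def wright_fisher(n_individuals: int, p0: float, generations: int,
                  seed=None) -> list:
    """Simulate neutral drift; returns the allele frequency in each generation."""
    rng = random.Random(seed)
    copies = 2 * n_individuals            # diploid: two gene copies per person
    count = round(p0 * copies)
    trajectory = [count / copies]
    for _ in range(generations):
        p = count / copies
        # Each copy in the next generation picks a parent copy at random,
        # so the new allele count is a binomial draw with probability p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        trajectory.append(count / copies)
    return trajectory


trajectory = wright_fisher(n_individuals=100, p0=0.5, generations=50, seed=1)
print(f"start {trajectory[0]:.2f}, end {trajectory[-1]:.2f}")
```

Apart from the starting frequency and the number of generations, the only input is the number of individuals, which is why population size has to be settled before asking about structure.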

The claim in the press release is that the African population was divided into separate populations:

Doron Behar, Rambam Medical Center, Haifa, said: "We see strong evidence of ancient population splits beginning as early as 150,000 years ago, probably giving rise to separate populations localized to Eastern and Southern Africa. It was only around 40,000 years ago that they became part of a single pan-African population, reunited after as much as 100,000 years apart."

Is it true? Certainly that describes the model tested in the paper. But is it the right model? Is there evidence to justify that model as opposed to simpler alternatives?

A real population may be structured in many ways -- by age, by caste or class, by space. If we have samples that are taken from different geographic locations, as in this study, it is natural to test hypotheses about structuring across geography. That's what Behar and colleagues did: they tested a hypothesis of panmixia, or random mating across space.

Panmixia is the simplest model -- the null hypothesis -- about population structure. If everyone mates randomly, then there is no geographic structure. The population would be a single, unstructured gene pool. The paper refutes this model, demonstrating that people did not mate randomly across the geography of Africa during a certain period of time.

But the question is: which model do we adopt once we have refuted panmixia?

I rather like isolation-by-distance as a model for human population history. Isolation-by-distance (IBD) assumes that people travel some distance before they reproduce. It's a simple model -- the distance traveled may vary among individuals, but the variance in this value is the only parameter necessary to predict the structure of the population. IBD can explain quite a lot -- why people look like their neighbors, why intermediate populations on the map tend to look intermediate in allele frequencies, and why selected alleles take some time to disperse across space. It is generally consistent with what we know about hunter-gatherer demography. People tend to stay where they are, but a fairly large fraction move to marry into neighboring groups, and a smaller fraction go beyond the neighboring groups to marry further away. So I think this is the null hypothesis once panmixia is refuted. IBD is not a hypothesis of small, isolated bands -- it is a hypothesis of a geographically dispersed population with gene flow.
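One way to see why intermediate populations look intermediate is a one-dimensional stepping-stone model, a discrete stand-in for isolation-by-distance. The sketch below is deterministic and ignores drift and selection; it assumes a row of demes in which a fraction m of each deme's parents come from the two adjacent demes each generation, so the variance of the dispersal distance is m in units of deme spacing. The deme count, migration rate and starting frequencies are hypothetical.

```python
def stepping_stone(frequencies, migration_rate: float, generations: int) -> list:
    """Allele frequencies along a row of demes after neighbor-to-neighbor gene flow."""
    freqs = list(frequencies)
    n = len(freqs)
    for _ in range(generations):
        updated = []
        for i in range(n):
            # Demes at either end have only one neighbor; the missing side
            # is treated as the deme itself (a reflecting boundary).
            left = freqs[i - 1] if i > 0 else freqs[i]
            right = freqs[i + 1] if i < n - 1 else freqs[i]
            updated.append((1.0 - migration_rate) * freqs[i]
                           + migration_rate / 2.0 * (left + right))
        freqs = updated
    return freqs


# Hypothetical start: an allele fixed in the five westernmost of eleven demes.
initial = [1.0] * 5 + [0.0] * 6
cline = stepping_stone(initial, migration_rate=0.2, generations=50)
print(" ".join(f"{p:.2f}" for p in cline))
```

The sharp boundary relaxes into a smooth gradient: every deme ends up between its neighbors in frequency, with no isolated groups anywhere in the model.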

The Genographic Project has done more than any other single project to extend the sampling of human populations. The paper by Behar and colleagues is a testament to that -- they are able to work with a broader and deeper sampling of mitochondrial variation in Africa than has yet been available. This is a credit both to the ambitious goals of the project and to today's genetic technology, which has made it possible to sequence more whole mitochondrial genomes on the project's budget. It is a great example of how spending money can circumvent some theoretical problems.

Still, the Project likely wanted to maximize the effectiveness of its money, so it focused on sequencing only those variants that were underrepresented or rare in previous studies. From the Methods:

Samples were chosen to include the widest possible range of Hg L(xM,N) internal variation on the basis of the previously available sequence analysis of the mtDNA control region and are, therefore, biased toward rare variants. In addition, we attempted to focus on branches (e.g., L0d, L0k), populations (e.g., Khoisan), and geographic regions (e.g., Chad) for which the current data were scant. Last, we preferred to sequence variants that the current literature suggested to be rare or anecdotal in any given geographic region (e.g., L0k in the Near East).

Ummm... wait a minute. This is definitely not what you want to do if you're going to test hypotheses of population history. They have deliberately narrowed their sample in a way that distinguishes Khoisan from other peoples, and have excluded some proportion of variants already known to be common. We can predict, based on the sampling scheme alone, that Khoisan and other people ought to be more distinct than would be expected under a random sampling of each population, and certainly more so than expected under a random sampling of the African continent. This means that if the data were to reject IBD, we would have to examine whether that was because of the population history, or instead because of the sampling scheme.

Do the data reject IBD? Well, we don't actually know from the paper. The study employs an island model, in which Khoisan and all others are assumed to represent either one panmictic population or two isolated ones. They devised a test based on permuting the number of lineages that they inferred to have existed during past time intervals. An island model with isolation of two populations predicts that each will share some gene lineages lacking in the other -- so-called "private" haplotypes. In contrast, two samples taken from a single panmictic population would each have a small proportion of "private" haplotypes, as well as some number of common haplotypes shared by both samples.

So, the study (reasonably) tests the null hypothesis that the African mtDNA samples derive from a single panmictic population going back to the mtDNA coalescent. They estimate the date of this coalescent (based on their mutation rate model) as around 200,000 years ago, so this is a test of panmixia in Africa across this time period. They use a permutation test to evaluate the likelihood that some number of closely related lineages would all be private to the Khoisan population, under the hypothesis that they are randomly drawn from the African population as a whole. The lineages they examine are the ones they infer to have been present in the Khoisan population at various time intervals in the past -- again, based on their model of mutation rate. They can disprove panmixia across times after 100,000 years and before 80,000 years. Before this time, too few coalescent lineages are inferred to have existed to obtain a significant refutation of the test of panmixia. After 40,000 years, there are obvious shared lineages between Khoisan and other samples that could only have been shared by gene flow.
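The logic of a test like this can be sketched as follows. This is not the authors' procedure, which permutes inferred lineage counts within time intervals; it is a simplified illustration that asks how often a clade of closely related lineages would all carry the "Khoisan" label if labels were assigned at random. All counts below are hypothetical.

```python
import random
from math import comb


def exact_all_private(total: int, group_size: int, clade_size: int) -> float:
    """Chance that every lineage in a clade falls in one group under random labels."""
    return comb(group_size, clade_size) / comb(total, clade_size)


def permuted_all_private(total: int, group_size: int, clade_size: int,
                         n_permutations: int = 20_000, seed: int = 0) -> float:
    """Estimate the same probability by shuffling group labels across lineages."""
    rng = random.Random(seed)
    labels = [True] * group_size + [False] * (total - group_size)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        # Treat the first clade_size lineages as the clade of interest.
        if all(labels[:clade_size]):
            hits += 1
    return hits / n_permutations


# Hypothetical interval: 40 lineages in all, 10 of them Khoisan,
# and one clade of 4 lineages found only in Khoisan.
p_exact = exact_all_private(40, 10, 4)
p_perm = permuted_all_private(40, 10, 4)
print(f"exact: {p_exact:.4f}, permutation estimate: {p_perm:.4f}")
```

With few lineages in an interval, even a clade that is entirely private cannot give a small probability, which is why the earliest intervals cannot reject panmixia.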

I worry that there is a bias in this test. The authors applied it only to a period of time earlier than the coalescence times of recent shared lineages, but after the diversification of the ancient lineages that are not shared. In other words, there appeared to be a gap in the coalescence times of shared haplogroups. Usually, you would correct the test for multiple comparisons not only across haplogroups, but also across time periods. Given that we are considering a range of 150,000 years, across which there is evidence for gene flow both early and late in that history, what is the significance of the fact that we see few shared lineages at intermediate times? That will be less significant than the values reported in the paper, but how much less it is difficult to predict.
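As a rough illustration of what a correction across time periods does, here is a Bonferroni adjustment, one simple and conservative choice among several. The paper does not report such a correction, and the number of intervals and the p-values below are hypothetical.

```python
def bonferroni(p_values) -> list:
    """Multiply each p-value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical p-values, one per time interval examined.
raw = [0.40, 0.012, 0.008, 0.30, 0.65]
adjusted = bonferroni(raw)
for before, after in zip(raw, adjusted):
    print(f"raw {before:.3f} -> adjusted {after:.3f}")
```

With five intervals, a raw value of 0.012 no longer clears the conventional 0.05 threshold after adjustment, while 0.008 still does. Because the intervals share lineages and are not independent, the right correction is probably less severe than this, which is part of why the true significance is hard to predict.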

In the end, what do the observations in the paper mean? In the simplest interpretation, either Africans were not random-mating after 100,000 years ago or regional selection differentiated southern and other African mtDNA pools.

Did ancient Africans live in two isolated groups? I wouldn't say that: the authors didn't test that hypothesis.

Did ancient Africans live in small bands scattered across the continent? Well, all ancient humans lived in small bands. The question of whether they were scattered is a question about the population size -- and as I showed last week, the population size during this period of time was not small. So we can imagine a population structure like recent historic hunter-gatherers -- with Africa possibly having something like the population size and structure of indigenous Australians.

What's the bottom line? The results are consistent with isolation-by-distance in ancient Africans. That model, followed by a subsequent global expansion, has been around for a long time. In 1993, Henry Harpending and colleagues called it the "Weak Garden of Eden" model: a geographically structured African population that underwent an expansion and dispersal to other regions. Certainly for the mitochondrial DNA, this seems to be the model that presently best fits the data.

What remains in question is how much of the subsequent spread of mtDNA was also reflected by spread of nuclear DNA haplotypes, and how much was induced by natural selection on mtDNA haplogroups. As I continue to write about population histories, we will meet this issue again.

References:

Behar DM, 14 others, and The Genographic Consortium (consortium again? Whoa). 2008. The dawn of human matrilineal diversity. Am J Hum Genet 82:1-11. doi:10.1016/j.ajhg.2008.04.002
