john hawks weblog

paleoanthropology, genetics and evolution

demography

  • Cultural impedance, demographic growth, effective population size

    Wed, 2009-01-07 01:09 -- John Hawks

    This is a complicated story with many interlocking parts. Telling the whole story may well take me fifty posts. There's a lot of new science hiding in here waiting to get out.

    I'm starting now because of the new paper by Luke Premo and Jean-Jacques Hublin, titled "Culture, population structure, and low genetic diversity in Pleistocene hominins." This paper is not the final word on its topic, nor is it the first word. But it is very much worth reading.

    It makes an excellent point of departure to explain what we know and don't know about the genetics of prehistoric humans. Premo and Hublin propose an interesting model with interaction between culture and natural selection, as an explanation for a 35-year-old problem in human evolution: Our low level of genetic variation.

    Their model may be right. I certainly think there's a kernel of truth in it, shared with a number of other models, as I'll describe below. And it's testable -- a project to which we'll be returning in the next few months.

    Explaining a small effective size

    Humans today have relatively low genetic diversity compared to other hominoids. Chimpanzees, gorillas, and orangutans each harbor more genetic variation than humans worldwide.

    This observation is strange because under a simple genetic model, the amount of genetic diversity in a population should be proportional to the number of individuals. Since there are many more humans in the world than gorillas, chimpanzees, or orangutans, it seems like we ought to have more genetic diversity. But we don't. Strange.

    Or maybe not so strange. Many assumptions are floating under that "under a simple genetic model." My work, and the work of many other geneticists, has been focused on uncovering and examining these hidden assumptions.

    Genetic variation is only indirectly related to demography. Essentially, a population will be genetically diverse because many different alleles survive across generations. This genetic survival is less likely when there are few individuals. It is also less likely when most individuals are close relatives -- that is, when they are inbred. Natural selection can cause inbreeding. Certain kinds of mating behavior can cause inbreeding also.

    One simple explanation for low genetic diversity is that there aren't very many individuals. Few individuals means few chances for an allele to reproduce itself in the population. Rare alleles will therefore be rapidly lost in a small population. But of course, we know that there are a lot of people in the world. That explanation doesn't work.
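    To put rough numbers on how fast a small population sheds variation, here is a minimal sketch using the textbook expectation that heterozygosity decays by a factor of (1 - 1/(2N)) per generation under drift alone. It assumes an idealized, randomly mating diploid population with no mutation, migration, or selection; the starting value of 0.5 and the population sizes are illustrative choices, not estimates for any real group.

```python
def expected_heterozygosity(h0, n, generations):
    """Expected heterozygosity after drift alone in an idealized
    diploid population of constant size n (no mutation or migration)."""
    return h0 * (1.0 - 1.0 / (2.0 * n)) ** generations

# Illustrative values only: start at H = 0.5 and let 100 generations pass.
small = expected_heterozygosity(0.5, 10, 100)       # a tiny herd
large = expected_heterozygosity(0.5, 10_000, 100)   # a large population

print(f"N = 10:     H after 100 generations = {small:.4f}")
print(f"N = 10,000: H after 100 generations = {large:.4f}")
```

    The tiny population loses nearly all of its variation over this span, while the large one keeps almost all of it.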

    Bottlenecks

    The first people to point out that humans were short of genetic variation were John Maynard Smith and John Haigh, in 1974. They looked at the allelic variation of the beta globin gene and determined that it was consistent with a population of only 10,000 individuals. Since there are more than 10,000 people now, they needed some other explanation.

    They proposed a historical scenario, in which humans had been limited to very small numbers in prehistoric times. This scenario is a population bottleneck: a restriction for an unknown and unspecified length of time, followed by a recent expansion to the human population's present large size.

    The bottleneck scenario was revived again and again during the next 20 years. When human mtDNA -- like beta globin -- was found to have relatively low diversity, a bottleneck was the preferred explanation. Since diversity was highest in Africa, many authors proposed that Africa had been the location of this bottleneck population. And so, the Out of Africa hypothesis gained its genetic force.

    Meanwhile, in the last fifteen years, a number of people have set about finding other explanations for human genetic variation. A bottleneck can explain some observations well, but seems inconsistent with others. One of these inconvenient observations -- as Premo and Hublin point out -- is that Pleistocene human groups had low genetic variation, just like living humans. We know this now because of the Neandertal genome work -- not only Neandertals, but also our common ancestor with Neandertals had low genetic variation. This coincidence of three hominid populations, two of which no longer exist, can't be the product of a single out-of-Africa bottleneck.

    So either we need three distinct bottlenecks, or we need something else. That, among other observations (such as the continuity of features in regions of the Old World outside of Africa), causes us to consider mechanisms that can reduce genetic variation without a bottleneck.

    Population structure, inbreeding, and diversity

    The fastest way to induce inbreeding is the same way that animal breeders do it: take one big horde, divide it up into little herds, and force each individual to mate only within her tiny group. After many generations, each of these little herds will be inbred. Each tiny herd will retain only a very small subset of the big horde's alleles. The genetic diversity of each tiny herd will be low.

    Here's a problem: We still have a bunch of these little herds. Sure, each one of them has low genetic diversity. But if we look at all of them, they probably still collectively retain most of the alleles that had been in the big horde. The variation in the total population will be great, even as the variation in the average subpopulation has been reduced. The imbalance between these values -- the total variation and the average subpopulation variation -- is measured by Wright's FST: a ratio measuring the reduction in diversity due to inbreeding.
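    The contrast between total variation and average within-herd variation can be sketched in a few lines. This is only an illustration: it assumes a single locus with two alleles, equally sized herds, and at least some variation in the pooled population, and the allele frequencies below are made up.

```python
def heterozygosity(p):
    """Expected heterozygosity at a two-allele locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst(subpop_freqs):
    """F_ST = (H_T - H_S) / H_T, assuming equally sized subpopulations
    and a pooled population that is still polymorphic (H_T > 0)."""
    p_bar = sum(subpop_freqs) / len(subpop_freqs)
    h_total = heterozygosity(p_bar)
    h_within = sum(heterozygosity(p) for p in subpop_freqs) / len(subpop_freqs)
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies in four little herds.
herds_diverged = [0.1, 0.9, 0.2, 0.8]    # each herd nearly fixed for one allele
herds_similar = [0.45, 0.55, 0.5, 0.5]   # herds barely differ

print(f"diverged herds: F_ST = {fst(herds_diverged):.3f}")
print(f"similar herds:  F_ST = {fst(herds_similar):.3f}")
```

    In both cases the pooled allele frequency is 0.5, so the total variation is the same. Only the within-herd share changes, and that is what the ratio picks up.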

    If one of these little herds expanded and wiped out all the others, it would be just like a population bottleneck. The original genetic variation of the horde would be gone, and only the variation of one single herd would remain. That's the Out of Africa hypothesis.

    The frequent extinction and recolonization model

    Consider the population of E. coli in your gut. There are billions of individuals, but all are descendants of a relatively small number of clones -- maybe only a handful. These clones migrated into your body from other people or animals, which each harbor their own population of billions. The global population of E. coli contains untold numbers of individuals -- upward of 10^20.

    E. coli cannot really maintain so much variation. When you die, a few individuals of your E. coli population might make it into the gut of a lion or bear. But most of them are hosed. Your gut population will become extinct. Maybe a few lucky individuals will escape your body during your life and colonize a new host -- maybe your child, or the neighbor's dog. The mechanism that retains variation is not the billions of individuals in your gut, but instead the few that move into and out of your gut.

    Maruyama and Kimura realized that this mode of subpopulation extinctions might vastly reduce genetic variation. Takahata (1994) examined this as a mechanism for human genetic variation. The logic is that Pleistocene humans lived in small bands, and each small band of hunter-gatherers had a substantial risk of extinction. If these truly died and were replaced by new colonists from neighboring bands, then the genetic variation might be very small, even though the human population was spread across the Old World.

    Together with Elise Eller and John Relethford, I examined this model in a 2004 paper. We looked at the relation of different parameters in the model, and whether realistic values for hunter-gatherers would have a substantial effect on human genetic variation.

    If we want to reduce genetic variation with this model, then two things have to be true. First, groups need to be quite genetically different from each other. That is, they need to be inbred. And second, they really need to go extinct and be replaced.

    Recent hunter-gatherers tend not to simply die when times are tough. They may disappear from an area, but some numbers of them survive to move into other populations. And there are high levels of intermarriage among hunter-gatherer bands, and between hunter-gatherers and their neighbors. The values that are realistic for living hunter-gatherers will reduce genetic variation by a substantial amount -- perhaps by half. But not by a huge amount. We concluded that values in the Pleistocene may have been more extreme than in the present day, depending on the culture of prehistoric foragers.

    Notice the two factors important to the model. The groups need to be inbred. That means that some force must impede gene flow between them. And the groups need to be replaced with some regularity. That means that some mechanism must cause groups to die.

    The diffusion wave model

    Vinayak Eswaran (2002) proposed that the low genetic diversity of humans could be explained by selection. In his explanation, a coadapted gene complex arose within ancient Africans and dispersed through the Old World population within the last 100,000 years. It is economical to suppose that this coadapted gene complex generated some anatomical or behavioral trait of modern humans. Hence, a dispersal of an anatomy or behavior would lead to genetic dispersal.

    Yet, in this model local genes of populations outside of Africa would survive into the present day. The spread of the key phenotype in this model is not a replacement, it is a diffusion.

    The diffusion of a single advantageous gene will have relatively little effect on genetic variation across the genome. A small area near the selected gene may hitchhike to fixation as a result of selection. But most of the genome will be completely unaffected.

    But Eswaran proposed that several genes were required to work together to generate the adaptive phenotype. Hence, the selective advantage would need to push all these genes simultaneously for the adaptive phenotype to spread. Further, Eswaran supposed that individuals might mate assortatively based on the presence of the adaptive phenotype. This assortative mating is a kind of inbreeding, and would tend to impede the flow of genes from local populations into the growing population with the adaptive phenotype.

    In other words, the diffusion wave model can restrict genetic variation. It does so with the same two conditions as the extinction and recolonization model: Some force causes inbreeding within populations, and another force pushes some of those populations to expand while others contract. In this model, assortative mating and epistasis are the factors that promote inbreeding, while natural selection causes demographic imbalance.

    Premo and Hublin's model

    Now, we can consider the new paper by Premo and Hublin. As in the two models above, their model has a force that promotes inbreeding and another force that causes demographic flux.

    The inbreeding force is "culturally mediated migration" -- the idea that cultural differences between populations tend to impede gene flow between them. If the global population were divided into relatively small herds, each possessing a distinct culture, then we might expect these herds to be inbred. Premo and Hublin performed simulations in which the effects of culture on migration rates were allowed to vary. If individuals demand to settle down in groups with nearly identical cultures to the group of their birth, the inbreeding within populations will be very high.

    The demographic force in Premo and Hublin's model is natural selection. They suppose that advantageous mutations arise spontaneously, and that these mutations are sufficient to drive demographic expansion, as long as gene flow is impeded by cultural differences:

    In a panmictic population, a selectively advantageous mutation evolves to fixation with a probability and at a rate that share a simple relationship to population size and the strength of selection. The manner in which a favorable mutation spreads through a structured population is not so simple (25). In a structured population, gene flow between subpopulations is required for an advantageous mutation to spread beyond the boundaries of the group in which it first appears. However, [culturally mediated migration] can inhibit the spread of beneficial mutations by restricting gene flow to short cultural distances. One consequence of cultural isolation is that offspring inherit only those novel, beneficial mutations that spread to fixation within, but not beyond, the culturally defined boundaries of the group into which they are born. Another is that, when migration between groups is rare, the fate of each beneficial mutation—its frequency in the metapopulation—depends upon the rate at which its carrier’s group fissions relative to other groups. Variance in groups’ fission rates depends on how relative individual fitness is partitioned within and between groups. A group-level selective sweep, whereby 1 group (and its daughter and granddaughter groups) fissions more rapidly than other groups, requires low within-group variance and high between-group variance in relative individual fitness (26, 27). As long as these conditions persist, members of the group(s) that has accrued the most favorable mutations will contribute disproportionately more offspring to the metapopulation (28, 29) (Premo and Hublin 34-35).

    It may seem obvious that I would really like this idea -- in fact without knowing about Premo and Hublin's work I was lecturing in November about the demographic effects of selection impeded by cultural differences!

    But as in the case of extinction and recolonization, and the case of the diffusion wave with epistasis, the question is whether realistic parameters for humans will work with the model.

    Premo and Hublin don't answer this question. Their paper explores the interaction of several parameters across their entire range, finding some regions of the parameter space in which culturally mediated migration and selection may combine to exert a strong effect reducing neutral genetic variation. But aside from a general claim that cultural distinction among Pleistocene humans is plausible, they do not attempt to demonstrate the importance of these factors for ancient human groups.

    Given our lack of knowledge about the number of selective events and their timing during human evolution, their caution may be appropriate.

    Still, I think there is a great potential for testing this model as applied to the archaeological and genetic record. Taking the culture areas that appear to have characterized MSA/Middle Paleolithic populations and later, are those areas (and the populations contained within them) suitable for culturally mediated migration as predicted by this model? Given the number of selected mutations on the human lineage, within an order of magnitude, are there enough to generate the demographic flux predicted by the model?

    Despite the lack of attention to real Pleistocene population parameters, Premo and Hublin succeed in putting their model into a very interesting context. They connect the idea to Sewall Wright's shifting balance model, suggesting that an appropriately divided human population might give rise to favorable gene combinations -- small and repeated versions of Eswaran's diffusion wave model. And the spatial aspect of the model lends itself naturally to a comparison with spatial dynamics of group selection, which has been a topic of great theoretical interest in the last few years.

    Premo and Hublin claim that this process will only work in species where cultural factors are significant in mediating gene flow. For a narrow construal of the model -- which depends on culture -- that is of course true. But culture is not the only force that could mediate gene flow in this way. Humans set up similar breeding systems in domesticated animals by imposing artificial barriers to gene flow. And natural barriers to gene flow, such as fitness-reducing epistasis depending on genetic background, might do the same. At the extreme, natural barriers such as lakes or islands would lead to a similar consequence to the extinction and recolonization model.

    Next

    This post has added some additional context to Premo and Hublin's paper, connecting the model to other models that are formally similar in many ways. It is natural now to consider the general model that includes all these as special cases, and develop more specific cases that might have influenced human genetic evolution.

    However, that exercise will take some more background. I started out by writing that this is a complicated problem with many interlocking parts. You can now see the boundaries of the problem. But to take it further, we'll have to consider the quantitative analysis of movement.

    That means differential equations.

    References:

    Premo LS, Hublin J-J. 2009. Culture, population structure, and low genetic diversity in Pleistocene hominins. Proc Nat Acad Sci USA 106:33-37. doi:10.1073/pnas.0809194105

  • Human evolution stopping? Wrong, wrong, wrong.

    Fri, 2008-10-10 15:04 -- John Hawks

    I'm usually pretty measured when I respond to dumb ideas about evolution reported in the press. After all, scientists are often misquoted, or misunderstood by reporters. So, I didn't really think it was worth writing about this story covering a lecture by UCL geneticist Steve Jones. After all, I'm hardly going to attend a faculty talk in London, and there's really no news here -- Jones has been arguing for more than ten years that human evolution has slowed or stopped.

    For example, this 1995 article in the NY Times describes his book, The Language of Genes:

    "Natural selection has to some extent been repealed" in the case of humans, says Dr. Steve Jones, a geneticist at University College London. Most social changes "seem to be conspiring to slow down human evolution," he argues[.]

    His ideas have been publicized for years outside of his books; for example, a 2002 public debate.

    But this latest Steve Jones kerfuffle seems to have impressive reach. It hit Slashdot, for goodness' sake. The Guardian has published an exchange of opinion pieces about it. Bloggers of note have picked it up, almost universally to criticize it as a wrong idea.

    Why it's wrong

    What I haven't yet seen, in all the commentary, is a short and simple refutation for each element of his argument. Let me lay out the components of Jones' argument, as explained in the current article and previous works:

    1. Evolution includes natural selection, mutation, and random change.

    Jones excludes gene flow, one of the usual four mechanisms of evolution -- this allows him later to argue that population mixing is a sign of evolution stopping, when in fact it is evolution.

    2. Older fathers have a higher mutation rate than younger fathers or mothers, and the proportion of older fathers is now much less than in the past.

    This is true, but minor compared to the main factor affecting the introduction of mutations into human populations: the population size. The rate of new mutations in the population is 2Nu, where u is the rate per individual, and N the number of people. The population of the world has increased tenfold since 1700. All other things equal, this means ten times as many mutations -- and twice as many mutations per generation today as in 1960. There is a smaller proportion of older fathers now than in 1700, but the absolute number of older fathers is much, much greater.
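    The 2Nu arithmetic is simple enough to check directly. In this sketch u is treated as a rate per gene copy per generation (the usual reading of 2Nu for diploids), and its value is a hypothetical placeholder; the two population sizes are round illustrative figures chosen only to stand in a ten-to-one ratio. The point is that the ratio of mutational input does not depend on which u you pick.

```python
import math

def new_mutations_per_generation(n_individuals, u):
    """Expected number of new mutations entering a diploid population
    each generation: 2 * N * u."""
    return 2 * n_individuals * u

U = 1e-8                  # hypothetical rate per copy per generation
N_1700 = 650_000_000      # round illustrative figure, not a census value
N_NOW = 6_500_000_000     # ten times larger, matching the tenfold increase

ratio = (new_mutations_per_generation(N_NOW, U)
         / new_mutations_per_generation(N_1700, U))
print(f"mutational input now relative to 1700: {ratio:.1f}x")
```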

    Besides all this, the story of paternal age at birth is not so simple. Over the past twenty years in the U.S., the birth rate to fathers over 35 has been increasing, while the birth rate to fathers under 30 years has been decreasing. Reproduction in men aged 20-35 grew markedly after World War II, but the fraction of births to older fathers has been climbing since 1970. To be sure, the current rate of births to older fathers remains substantially less than before 1940, but this is part of an overall reduction in fertility across all age classes. Over the past 200 years, a reduction in average male fertility has been made up by an increase in infant and juvenile survival, so even though the birth rate to older (and younger) fathers declined, the population size continued to grow.

    3. Mortality of young people has reduced to near-zero.

    Jones acknowledges that this is only true in industrialized economies, so I'll set aside the obvious point (also made by Chris Stringer in this article): mortality is still high among young people in a global context.

    But Jones entirely neglects fertility. Fertility selection depends on the variance in lifetime reproduction (some people have more children than others), as well as the variance in age at reproduction (some people have children earlier than others). Selection does not stop, even if mortality does. He also neglects the high human rate of spontaneous abortion, a continuing source of mortality selection.

    Also, the decrease in mortality means that some mutations that once were deleterious are now neutral. These mutations now will be retained in the population rather than rapidly eliminated, and some of them will increase under genetic drift. In terms of the rate of change in frequency of these previously rare deleterious alleles, this means that the population will henceforth evolve more, not less.

    4. Small isolated populations allow rapid evolution by drift. But today's population is large and highly interconnected.

    There's no denying this, at least if we're talking about the rate of change of allele frequencies within each small population. However, the rate of fixation of neutral alleles is independent of population size. And the global rate of evolution is far slower in a network of partially isolated subpopulations than in a single large population. So this argument depends on what we mean by "evolution." Here's what we have from Jones in this context:

    “Small populations which are isolated can evolve at random as genes are accidentally lost. Worldwide, all populations are becoming connected and the opportunity for random change is dwindling. History is made in bed, but nowadays the beds are getting closer together. We are mixing into a global mass, and the future is brown.”

    Jones' definition of evolution (in argument 1, above) leads inevitably to confusion here. Clearly, this "mixing into a global mass" is actually rapid evolution of human populations, measured in terms of changes in allele frequencies. If Europe becomes "brown" in 500 years, that's a whole lot faster than the 20,000 years it took Europe to become "non-brown." Jones apparently means that sometime in the future, after this current, rapid period of evolutionary change, human evolution will be slow.

    But he's wrong. Mutations will be entering this large population faster than in the smaller global population of the past. This future population will be vastly more variable than any of the small human populations of today. Alleles under selection will be able to move and mix much faster than through the disjointed network of population contacts that existed 200 years ago. Only by one measure will evolution be slower: the rate of change in frequency of neutral mutations. But even that will be faster in the mixed future than the semi-isolated past, if we consider it globally instead of locally.
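    The earlier statement that the rate of fixation of neutral alleles is independent of population size follows from a standard bit of bookkeeping: 2Nu new neutral mutations arise each generation, and each new copy has a 1/(2N) chance of eventually drifting to fixation. The sketch below multiplies the two using exact fractions so the cancellation is visible. It assumes an idealized diploid population of constant size, and the value of u is a hypothetical placeholder.

```python
from fractions import Fraction

def neutral_substitution_rate(n, u):
    """Long-run rate at which neutral mutations fix in an idealized
    diploid population of constant size n."""
    new_mutations = 2 * n * u        # neutral mutations arising per generation
    p_fix = Fraction(1, 2 * n)       # chance a single new copy drifts to fixation
    return new_mutations * p_fix

u = Fraction(1, 100_000_000)         # hypothetical neutral mutation rate
rates = [neutral_substitution_rate(n, u)
         for n in (1_000, 1_000_000, 1_000_000_000)]

for rate in rates:
    print(rate)                      # the same value for every population size
```

    What population size does change is how many mutations are segregating at any one time and how long each takes to fix, not the long-run rate of neutral substitution.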

    Longevity

    Several respondents to Jones' arguments have taken an approach that I think is misguided. Pointing out that human longevity was much lower in the past, they argue that a much lower proportion of children were being born to older fathers.

    Actually, longevity has a very limited application to this argument. The high juvenile and infant mortality rates of the past have no influence on the average age of fathers at reproduction, since fathers are a subset of people who survive juvenile and infant mortality. For example, in the U.S. around 1850, only around 60 percent of all people born would survive to age 20. These deaths greatly reduced the average longevity, but had no effect at all on the proportion of births to older fathers.

    What matters is the fraction of young men (20-30) who survive to be older men (35-50). That fraction was high: roughly 85% of twenty-year-old men in the U.S. in 1850 survived to age 35, and two thirds survived to age 50. Older men were fathering a large fraction of the infants in early America and in pre-industrial populations. This is because once they started, they didn't stop, as long as they had a wife who could have children (and widowers often remarried). This was probably true in all agricultural societies, and likely in modern human hunter-gatherers back to 30,000 years ago or earlier. If we go back further in time, we find a much higher mortality rate among young men, so that fathers over 35 made much less of a contribution. But in historic times, older fatherhood has been very important to the rate of new mutations per individual.

    What's important is that (a) the proportional reduction of older fatherhood is a small effect compared to the increase in mutations due to the growth of the population, (b) part of the decline of birth rates to older men is compensated by a reduction of infant and juvenile mortality, and (c) older fatherhood is now rising, not falling.

    I think Jones ought to pursue a far more interesting interpretation of these facts. European and American men today are increasingly pursuing part of a reproductive strategy that was common in the past, but less common in postwar Europe and America. Today, with lower infant and childhood mortality, the consequences of that strategy are potentially more powerful than in the past.

    Bottom line

    As always, claims about the rate of evolution in the future depend only slightly on empirical observations, and mainly on assumptions. In this case, Steve Jones has defined the "rate" of evolution in a very particular way, to come to the story that he prefers.

    I generally don't mind when prominent people say silly things about evolution. It gives the rest of us a chance to explain why they're wrong, and teach about the mathematical basis of evolution as we do it. In this case, it's sort of sad: Jones is out there making arguments and selling books, but he's clearly trapped in the pre-genome era. The exciting thing about genetics today is the extent to which we can observe human evolution happening!

    There's also an antiquated version of ethnocentrism here: how can we talk about the future of human evolution without considering the intense dynamics in today's developing nations? Relative to Africa and Asia, Europe is now a population sink.

  • Handling exponential growth in demographic models

    Fri, 2008-06-06 10:50 -- John Hawks

    Exponential growth is a feature of current human populations, and may represent how the human population behaved during some episodes of its demographic history. However, "exponential" can mean different things to different people, if you're not used to thinking mathematically about growth. So I need to lay out some definitions:

    1. Linear population growth: The same number of individuals is added in each successive time interval. Hence, population size is a linear function of time. Think of driving your car at a constant velocity. Or, you deposit your paycheck every month into a bank account, without interest.

    2. Geometric population growth: The same proportion of individuals is added in each discrete time interval -- for example, in each generation. Time is not measured continuously. Consider a bank account, compounded annually.

    3. Instantaneous population growth: At one discrete time, the population is considered to transition immediately, without any time passing, from a small to a large size. Suddenly, a benefactor makes a large deposit in your bank account.

    4. Exponential population growth: The population grows by a constant proportion per unit time, measured continuously. Consider a petri dish with a growing colony of E. coli, or a bank account compounded continuously.

    If you drive your car at a constant speed, then in half the time it would take to reach your destination, you will be halfway there.

    But exponential growth does not work this way. Suppose you have a dollar in the bank now, and you invest at a continuous rate equivalent to 5 percent annually. In 100 years, you expect to have $148. If your account grew linearly, you would have $74 in 50 years. But at your exponential growth rate, you will have only $12. In fact, it will take 86 years for your account to reach halfway to its "destination" of $148.
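    The bank-account numbers can be reproduced with a few lines of arithmetic. This sketch takes the example at face value: one dollar, a continuous rate of 0.05 per year, and a 100-year span. The annually compounded version is included only to show how the geometric and exponential definitions differ at the same nominal rate; the function names are mine.

```python
import math

RATE = 0.05      # continuous annual rate from the example
START = 1.0      # one dollar
YEARS = 100

def exponential_balance(t, start=START, rate=RATE):
    """Balance after t years of continuous compounding."""
    return start * math.exp(rate * t)

def geometric_balance(t, start=START, rate=RATE):
    """Balance after t whole years of annual compounding."""
    return start * (1.0 + rate) ** t

final = exponential_balance(YEARS)
midpoint = exponential_balance(YEARS / 2)
years_to_half = math.log((final / 2) / START) / RATE

print(f"balance after 100 years: ${final:.2f}")
print(f"balance after 50 years:  ${midpoint:.2f}")
print(f"years to reach half the final balance: {years_to_half:.1f}")
print(f"annual compounding after 100 years:    ${geometric_balance(YEARS):.2f}")
```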

    Now, what if we approached the question from the opposite direction? Suppose that our account really does grow exponentially, that we really did put in one dollar at the beginning, and we really did end up with $148 after 100 years. But suppose that we also really did have $74 in the account after 50 years. The form of the solution here is obvious: we are dealing with at least two different rates of increase -- one for the early part of the 100-year interval, and a different rate for the later part.

    In fact, there are an infinite number of ways that the rate might change over time to attain this result. Maybe it changed 30 years into the span, or 55 years in. Maybe it changed continuously. Maybe the account shrank at some times and grew at others.
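    The simplest of those possibilities is a single change of rate at the 50-year signpost. As a sketch, the two rates can be backed out from the three balances in the example ($1, $74, $148). This assumes each half of the span grew at its own constant continuous rate, which is only one of the infinitely many histories that fit.

```python
import math

def rate_between(balance_start, balance_end, years):
    """Constant continuous growth rate that carries one balance to the other."""
    return math.log(balance_end / balance_start) / years

early = rate_between(1.0, 74.0, 50)      # first 50 years
late = rate_between(74.0, 148.0, 50)     # last 50 years
single = rate_between(1.0, 148.0, 100)   # one rate for the whole span

print(f"early rate:  {early:.4f} per year")
print(f"late rate:   {late:.4f} per year")
print(f"single rate: {single:.4f} per year")
```

    Under this reading the account grew at well above 5 percent early on and at well below it later, even though the endpoints are identical to the single-rate case.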

    We can only attempt to deal with these unknowns by taking additional samples. What was the account balance after 20 years? After 21? 22? 73? I'll call these observations "signposts" -- because they give us markers along the path taken by the size of the account.

    You get the idea: this bank problem is very much like our problem reconstructing ancient demography in human populations. When we consider genetic variation, what we observe in today's genes was affected not only by the population sizes at the signposts that we observed in the past, but by every point in between.

    Suppose that our bank account was not merely symbolic money, but that the bank put in actual pennies when the amount increased. It's a simple enough matter to examine all 14,800 pennies at the end of the 100 years. We can ask, how many of those pennies will have mint marks dating 20 years into the span? How many will have mint marks dating 73 years in? The answers to those questions depend on the account balances across the entire 100-year span. That is the kind of question that we address about human history when we observe today's genetic variation. How many people today share haplotypes that originated 5000 years ago? What about 35,000 years ago? 143,000?
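    To see how much the mint marks depend on the whole history, here is a small sketch comparing two accounts that both start at $1 and end at $148. One grows at a single constant rate; the other passes through $74 at year 50 and changes rate there. Both trajectories are hypothetical, and the sketch assumes the account never shrinks, so pennies are only ever added.

```python
import math

def single_rate_balance(t):
    """$1 growing to $148 in 100 years at one constant continuous rate."""
    return math.exp(math.log(148) / 100 * t)

def two_rate_balance(t):
    """$1 growing to $74 by year 50, then to $148 by year 100."""
    if t <= 50:
        return math.exp(math.log(74) / 50 * t)
    return 74 * math.exp(math.log(2) / 50 * (t - 50))

def pennies_added_in_year(balance, year):
    """Pennies deposited during the given year under a balance function."""
    return 100 * (balance(year) - balance(year - 1))

for year in (20, 73):
    a = pennies_added_in_year(single_rate_balance, year)
    b = pennies_added_in_year(two_rate_balance, year)
    print(f"year {year}: single-rate {a:.0f} pennies, two-rate {b:.0f} pennies")
```

    The two accounts hold the same number of pennies at the end, but the mix of mint years differs: the two-rate account holds more pennies from year 20 and fewer from year 73.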

    When we make a prediction from evolutionary theory -- for example, the prediction of the age distribution of haplotypes in a population given the assumption of no selection -- then we must assert a model of demographic history. It used to be that you could simply assert a constant population size. But that's no longer any good for human evolution, since our population has obviously grown massively over time.

    If we want our predictions to relate to the real population history, then we ought to use as many signposts as we can find, so that we can constrain our models. For human demographic history, those signposts come from several sources, including the archaeological record, ethnographic comparisons, and increasingly genetic sampling. As I'm going to show, it's really not good enough to just pick numbers out of thin air. The reason is that there are many ways that your model can work against you unless you put in the most accurate numbers you can find.

    How not to handle exponential growth

    A simple exponential model has the benefit of simplicity. But if we don't choose our signposts carefully, a simple model will lead us badly wrong. Here, I'm going to examine the demographic simulations performed by Voight et al. (2006). I'm not picking on this paper in particular -- it actually stands out as a relatively good example of demographic modeling in genetics. This paper has been cited a lot of times, and it is valuable in part because of its detailed analysis of the power of detecting recent selection.

    Some of the power analyses were based on demographic models applied to the data from the Yoruba HapMap sample. Voight et al. (2006) considered only exponential growth models for the Yoruba (as opposed to the Asian and CEU HapMap samples, for which they also considered bottlenecks of various kinds). At the low end, the authors considered a model with no growth at all -- a constant effective population of 11,156 individuals. At the high end, they considered a model in which the population grew exponentially from an ancestral size of 10,018 individuals up to a current size of 1,910,000 individuals, with growth commencing 750 generations in the past. Other models were in between these extremes, although many had earlier onsets of population growth (up to 4000 generations ago). These values are reported in the online correction to the original article.

    At the outset, we can observe that these values are far too low, both for the ancestral and the current populations. The current population size of sub-Saharan Africa is on the order of 650 million individuals. This, of course, disproportionately represents the last few generations of rapid growth. But even in the year 1500, sub-Saharan Africa had a population on the order of 80 million people (Biraben 2003). The effective size of this population would be between 20 and 40 million. Of course, the Yoruba HapMap sample does not represent this population uniformly. The present population of Nigeria is 148 million, and the number of Yoruba within this population is approximately 30 million. Applying the same growth constant, we might estimate that this population had numbered around 5 million in the year 1500. But as we go back in time, we must encompass a wider cone of ancestry, as genes have flowed into the Yoruba from other populations. Hence, an effective 2 million individuals is certainly too small for the present population by a factor of five to ten, and plausibly too small for the population of 500 years ago by a smaller factor.

    The ancestral size is more seriously in error. Certainly, going back to 500,000 years ago or earlier, the long-term effective population size for humans really was on the order of 10,000 individuals. Since autosomal genes coalesce across that span or longer, we need to employ demographic models that incorporate this small ancestral size. However, we now know that this small size did not characterize any of the Late Pleistocene of Africa (as I discussed last month). Instead, the African population had reached an effective 38,000 individuals by 144,000 years ago, and grew after that time. So the initial size used by Voight et al. (2006) is too small by a factor of nearly four.

    But what matters much more is the combination of date and size. That's because the entire period matters to genetic variation, not merely the signposts.

    The models applied by Voight et al. (2006) may be fourfold too small at the beginning of the Late Pleistocene. But what does archaeology tell us about the African population in the early LSA, around 20,000 years ago, when Voight et al. (2006) suggest it had just begun to increase in numbers? Biraben (2003) puts the world population over 5 million individuals by that time. Taking this estimate, the sub-Saharan fraction of the global population at that time may have been substantial, more than a million individuals. That would mean that the Voight et al. (2006) estimate is perhaps only a thirtieth of the true value. Still, Atkinson et al. (2008), surveying mtDNA variation, found that the sub-Saharan population was apparently small compared to southern Asia around 20,000 years ago, with a sub-Saharan effective size less than 100,000 individuals. In that view, the Voight estimate is at least a tenth of the most accurate value.

    But what about the span from 10,000 to 5000 years ago -- the time range corresponding to the highest fraction of ascertained selection in their data? At the end of this time range, 5000 years ago, the best demographic estimates place the sub-Saharan African population around 6 million individuals, or perhaps 1.5 to 3 million effective individuals. The largest exponential growth model applied by Voight et al. (2006) predicts a continuous growth rate of 0.00028 per year during the last 750 generations. That would predict an effective size 5000 years ago of only 470,000 individuals -- perhaps a third to a sixth of the real value.
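
    The arithmetic behind those two numbers is easy to check. Here is a minimal sketch in Python, assuming 25 years per generation (my assumption for this illustration; it is the value that makes a per-year rate of 0.00028 come out of the sizes and the 750 generations quoted above). The function names are just for illustration.

```python
import math

# Parameters of the largest exponential model, as quoted in the text.
ANCESTRAL_SIZE = 10_018
CURRENT_SIZE = 1_910_000
GROWTH_GENERATIONS = 750

# Assumption for this sketch: 25 years per generation.
YEARS_PER_GENERATION = 25


def growth_rate_per_year():
    """Continuous growth rate implied by the ancestral and current sizes."""
    years_of_growth = GROWTH_GENERATIONS * YEARS_PER_GENERATION
    return math.log(CURRENT_SIZE / ANCESTRAL_SIZE) / years_of_growth


def effective_size(years_ago):
    """Effective size the model predicts at a given time before present."""
    years_of_growth = GROWTH_GENERATIONS * YEARS_PER_GENERATION
    if years_ago >= years_of_growth:
        # Before the onset of growth the model holds the size constant.
        return float(ANCESTRAL_SIZE)
    return CURRENT_SIZE * math.exp(-growth_rate_per_year() * years_ago)


if __name__ == "__main__":
    print(f"growth rate per year: {growth_rate_per_year():.5f}")
    for years_ago in (0, 500, 5000, 10_000, 20_000):
        print(f"{years_ago:>6} years ago: {effective_size(years_ago):>12,.0f}")
```

    Under that assumption the rate comes out very close to 0.00028 per year and the size 5000 years ago lands near 470,000, matching the figures above. Every interpolated value between the two endpoints is fixed by the exponential curve, which is exactly why bad signposts propagate through the whole span.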

    In other words, the simulations conducted by Voight et al. (2006) have overestimated the power of genetic drift during the last 144,000 years, and most critically in the period around 20,000 to 5000 years ago. The problem is that the signposts are wrong: replace the demographic assumptions with better ones, and you bring them more into line with reality. In this case, the estimate of current effective size was wrong, but not unreasonably so -- it's possibly within a factor of two. But the early values are wrong by a factor of ten or more, and the errors are compounded by the use of the simple exponential growth model. Replacing the more recent interpolated values with real estimates taken from archaeological and ethnographic models would be more complicated, but would actually remove uncertainty in the model.

    What are the effects of these models on the results of the paper? Figure 4 in the corrected paper shows the comparison of the real Yoruba data to the simulated datasets. In all cases, the simulated datasets have less variation in the critical statistic than the real data, which indicates the presence of widespread selection within the real data. If we incorporated a more accurate demographic model, the variation within the simulated data should reduce yet more, because genetic drift should have been much weaker than in the simulations performed by Voight et al. (2006). This would increase the proportion of inferred selection represented by the data. Likewise, the power to detect selection should increase for lower-frequency selected alleles -- because of the smaller chance that a long haplotype would increase by genetic drift alone.

    Next: Bottlenecks

  • Acceleration's discontents

    Sun, 2008-06-01 09:46 -- John Hawks

    The June Scientific American (no link available) has an article on page 32 about the "therapeutic value of blogging." That's some relief, after the stories a couple of months ago about blogging being potentially deadly.

    And it's no small irony, considering that the article I found on the previous two pages had great potential to give me therapeutic opportunities here.

    In the article, titled "Need for speed?", David Biello wrote up some of the human genetics results of the past 6 months, placing them as a point-counterpoint presentation of our acceleration result.

    First, he cites Gregory Cochran, who does as good a job explaining our result in one sentence as I've seen:

    "We found very many human genes undergoing selection" ... "We believe that this can be explained by an increase in the strength of selection as people became agriculturalists, a major ecological change, and a vast increase in the number of favorable mutations as agriculture led to increased population size."

    In that form, it is hard to see how anyone could disagree. Clearly, agriculture was a major ecological shift for humans, and it imposed new selection pressures associated with diet, disease, social organization and other ecological factors. At the same time, the population grew and more people meant more mutations. That's the story; the rest is detail filled in by anthropology, genomics, and math.

    Biello then cites another recent study that partially confirms our results. That study, by Lluis Quintana-Murci and colleagues, found a much smaller number of selected genes (55), but what is important is that every one of these genes has an FST greater than 0.65. In other words, in every one of these cases, an allele that is vanishingly rare in most of the world has reached a frequency over 80 percent in one population. As allele frequencies go, these are extreme differences -- much, much larger than the average genetic difference between populations, characterized by an FST around 0.1. We also found a few such alleles in our survey of selected genes, but the vast majority of genes have not generated such extreme differences in frequency -- mainly because they haven't been around long enough. In other words, the Quintana-Murci study confirms the distribution of positively selected alleles, across the range where it overlaps with other studies, including ours.
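
    To see how an 80 percent frequency in one population relates to an FST of 0.65, here is a minimal sketch in Python. It assumes the simplest possible setup, which is mine and not the one used in the study: one biallelic site, two equally weighted populations, and FST computed as (HT - HS) / HT from expected heterozygosities. The function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """FST = (HT - HS) / HT for two equally weighted populations.

    Assumption for this sketch: a single biallelic site and equal weights.
    """
    p_mean = (p1 + p2) / 2.0
    h_total = heterozygosity(p_mean)
    if h_total == 0.0:
        # The site is monomorphic overall, so there is no differentiation.
        return 0.0
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # An allele at 80 percent in one population and absent in the other.
    print(f"0.80 vs 0.00: FST = {fst_two_populations(0.80, 0.00):.3f}")
    # A much more modest frequency difference.
    print(f"0.66 vs 0.34: FST = {fst_two_populations(0.66, 0.34):.3f}")
```

    In this toy setup, 80 percent against zero gives an FST of about 0.67, just over the 0.65 threshold, while a difference like 66 percent against 34 percent gives a value near the typical 0.1.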

    Then Biello turns to the doubters. Noah Rosenberg coauthored a study earlier this year that reported polymorphism data from a sample of populations around the world.

    "We are a young species," remarks geneticist Noah Rosenberg of the University of Michigan at Ann Arbor, who participated in a comprehensive study of genetic variation that appeared in Nature in February. "Different human populations have not been separated for long enough periods of time to develop their own new alleles."

    Now, I never hold quotes in the press against people, because they represent a very small portion of what they may have said to a writer, and there are many opportunities for miscommunication. Still, I have to write about this, because it's about my work! So I'll try to describe the misconceptions illustrated by the article.

    I am pretty sure that Rosenberg must know that his statement in the article is false. For one thing, "developing" a new allele is simply mutation, and mutation occurs continuously. All human populations have rare alleles that have originated recently and remain distributed only across small areas. Rosenberg's surveys of gene variation have identified many such alleles.

    But more important to the current question, positive selection carries an allele to high frequency very rapidly -- much more quickly than the 50,000-year or longer span of time we are talking about. An allele with a five percent fitness edge can go from near zero to near fixation in several hundred generations -- in humans, such an allele can make a very large frequency change in a thousand years.
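
    A rough check on that claim, as a minimal sketch in Python. The assumptions are mine: a deterministic model with no drift, a fitness edge of s per copy of the allele (the haploid-style recursion p' = p(1 + s) / (1 + sp)), a starting frequency of 0.001 standing in for "zero", 0.999 standing in for fixation, and 25 years per generation.

```python
def next_frequency(p, s):
    """One generation of selection with relative fitness 1 + s for the allele."""
    return p * (1.0 + s) / (1.0 + s * p)


def generations_to_reach(p_start, p_target, s):
    """Count generations for the allele to climb from p_start to p_target."""
    if not (0.0 < p_start < p_target < 1.0) or s <= 0.0:
        raise ValueError("need 0 < p_start < p_target < 1 and s > 0")
    p = p_start
    generations = 0
    while p < p_target:
        p = next_frequency(p, s)
        generations += 1
    return generations


def frequency_after(p_start, s, generations):
    """Allele frequency after a fixed number of generations of selection."""
    p = p_start
    for _ in range(generations):
        p = next_frequency(p, s)
    return p


if __name__ == "__main__":
    s = 0.05
    print("generations from 0.001 to 0.999:",
          generations_to_reach(0.001, 0.999, s))
    # 40 generations is about 1000 years at an assumed 25 years per generation.
    print(f"0.10 after 40 generations: {frequency_after(0.10, s, 40):.2f}")
```

    Under these assumptions the sweep takes a few hundred generations, and an allele that starts at 10 percent is above 40 percent after 40 generations. A real population adds drift and dominance effects, so this is only the deterministic skeleton of the argument.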

    If we took the quote at face value, Rosenberg would be saying that human evolution is impossible -- and that new selected alleles like lactase persistence and sickle cell simply cannot exist. We may be a young species (although I would argue the point), but that doesn't mean that we have stopped evolving!

    Two prominent geneticists quoted in the article suggest that a bottleneck may explain the pattern of human genetic variation. Here also, I have to be cautious interpreting their quotes -- because even though they may seem relevant, they are referring to their own research papers, which don't actually address the question of linkage disequilibrium and positive selection.

    Marcus Feldman suggests that a series of bottlenecks are a likely explanation for the pattern of human genetic variation, in particular, the decreasing gradient of genetic diversity with increasing distance from Africa. This is the "serial founder effect" scenario that I have written about before. I criticized Feldman's and other papers on this subject this spring, referring to "the Stanford school of genetic orthodoxy." My basic point is that all of the results are assumed to support the idea of bottlenecks: no one has yet tested the hypothesis. Even simulations that show the credibility of the concept do not test the hypothesis, because they do not examine credible alternatives, either demographic or selective.

    More important, bottlenecks during the dispersal from Africa 50,000 years ago cannot possibly explain linkage blocks concentrated in coding genes with a mean age of 5500 years!

    Why is there such difficulty understanding natural selection? I find it quite incredible that many of the scientists who would rail against ignoring Darwin in public schools at the same time actively root out Darwin's theory from their graduate students. Still, there it is. One prominent geneticist (I won't give the name) recently asked me, "You don't really think that lactase was selected, do you?" Many really believe that natural selection has stopped and that recent human evolution reflects nothing more than the cumulative effects of bottlenecks.

    What is amazing to me is that these same geneticists embrace hypotheses of population history that cannot possibly have happened. The other geneticists quoted in the article, Carlos Bustamante and his graduate student Kirk Lohmueller, wrote a paper earlier this spring arguing that deleterious mutations have reached high frequency in Europeans (more so than Africans) because of a bottleneck during European history. The press reported this work as "Whites genetically weaker than blacks, study finds." The hypothesis in the paper is that protein-coding sites otherwise conserved in most mammals may differ among humans because of relaxed selection in a bottleneck.

    Here's why they're wrong: their bottleneck is impossible. They propose that the European population was a small, isolated population of 5,700 effective individuals from 214,000 years ago up to the Last Glacial Maximum. I suppose I should take some encouragement that they believe Neandertals were European ancestors (because otherwise, where exactly would this small, isolated population of Europeans have lived). But it's still quite impossible -- it implies no gene flow between Africans and Europeans across that entire span. You see, that is the only way that genetic drift can lead to this kind of result -- large differences in frequencies between continents for hundreds of deleterious alleles. It takes a bottleneck of exceptional length, along with complete isolation.

    In what has become a troubling trend, these details were hidden away in the online supplementary information of the paper. It is no surprise that most people read only the paper's conclusions, without critically evaluating the methods. But when the assumptions are hidden so that it takes an effort to look at them, you can understand that the paper does not receive the kind of scrutiny that it deserves. These are not obscure laboratory techniques; they are the basic evidence on which the conclusions were based.

    Now, Bustamante knows that positive selection has been very important in recent human evolution, because he wrote an important paper on the subject in 2005. I wrote about the paper at the time -- it was one of the works that really got us thinking about acceleration in the first place. So why in the world did their more recent paper adopt such a ridiculous model of population history?

    In any event, I don't think that either of these studies from earlier this year is relevant to our acceleration results. They address different aspects of genetic variation. However, acceleration may help to explain the high frequencies of some gene variants conserved in other mammals -- the results explained by Lohmueller and colleagues as relaxed selection under a bottleneck.

    The acceleration of recent positive selection would predict that many otherwise conserved gene variants may be segregating in humans, because they are the targets of positive selection. These conserved sites are among those most likely to show a strong sign of recent selection, because adaptive changes on them are necessarily rare (we know they're rare, because they haven't happened very often among other species). Most such sites are still conserved in humans -- it's just not possible to change their function in adaptive ways. But the massive ecological changes of recent human history have created the opportunity for adaptive responses that are not present in other mammalian lineages. We shouldn't be surprised to see that some such changes are currently underway.

    Now, that's a different interpretation of the same data, and it's a testable hypothesis. Are these conserved sites in regions that show other signs of positive selection? If they are, then acceleration explains the data. I'm looking into it now.

  • This magic moment

    Sat, 2008-05-10 00:19 -- John Hawks

    Today, the projected population of the Earth (available here) passed 6,666,666,666.

    Around 9 years ago, I tried to put into the first sentence of a paper that the world's population was 6 billion. A reviewer wouldn't let me get away with rounding up, noting that "It's not 6 billion yet!"

    Of course, by the time the paper was published, the sentence was true.

  • Were ancient Africans divided into small, isolated bands?

    Thu, 2008-05-08 12:05 -- John Hawks

    Last week when I wrote about the study of African mtDNA variation by Behar and colleagues, I focused on the issue of population size. To me, that must be the first parameter that we try to estimate, because the simplest relevant model of population history -- the Wright-Fisher model -- is described by that single parameter: the number of individuals. If we are going to evaluate evidence for population structure, we first must deal with the question of size.

    The claim in the press release is that the African population was divided into separate populations:

    Doron Behar, Rambam Medical Center, Haifa, said: "We see strong evidence of ancient population splits beginning as early as 150,000 years ago, probably giving rise to separate populations localized to Eastern and Southern Africa. It was only around 40,000 years ago that they became part of a single pan-African population, reunited after as much as 100,000 years apart."

    Is it true? Certainly that describes the model tested in the paper. But is it the right model? Is there evidence to justify that model as opposed to simpler alternatives?

    A real population may be structured in many ways -- by age, by caste or class, by space. If we have samples that are taken from different geographic locations, as in this study, it is natural to test hypotheses about structuring across geography. That's what Behar and colleagues did: they tested a hypothesis of panmixia, or random mating across space.

    Panmixia is the simplest model -- the null hypothesis -- about population structure. If everyone mates randomly, then there is no geographic structure. The population would be a single, unstructured gene pool. The paper refutes this model, demonstrating that people did not mate randomly across the geography of Africa during a certain period of time.

    But the question is: which model do we adopt once we have refuted panmixia?

    I rather like isolation-by-distance as a model for human population history. Isolation-by-distance (IBD) assumes that people travel some distance before they reproduce. It's a simple model -- the distance traveled may vary among individuals, but the variance in this value is the only parameter necessary to predict the structure of the population. IBD can explain quite a lot -- why people look like their neighbors, why intermediate populations on the map tend to look intermediate in allele frequencies, and why selected alleles take some time to disperse across space. It is generally consistent with what we know about hunter-gatherer demography. People tend to stay where they are, but a fairly large fraction move to marry into neighboring groups, and a smaller fraction go beyond the neighboring groups to marry further away. So I think this is the null hypothesis once panmixia is refuted. IBD is not a hypothesis of small, isolated bands -- it is a hypothesis of a geographically dispersed population with gene flow.

    The Genographic Project has done more than any other single project to extend the sampling of human populations. The paper by Behar and colleagues is a testament to that -- they are able to work with a broader and deeper sampling of mitochondrial variation in Africa than has yet been available. This is a credit both to the ambitious goals of the project and to today's genetic technology, which has made it possible to sequence more whole mitochondrial genomes on the project's budget. It is a great example of how spending money can circumvent some theoretical problems.

    Still, the Project likely wanted to maximize the effectiveness of its money, so it focused on sequencing only those variants that were underrepresented or rare in previous studies. From the Methods:

    Samples were chosen to include the widest possible range of Hg L(xM,N) internal variation on the basis of the previously available sequence analysis of the mtDNA control region and are, therefore, biased toward rare variants. In addition, we attempted to focus on branches (e.g., L0d, L0k), populations (e.g., Khoisan), and geographic regions (e.g., Chad) for which the current data were scant. Last, we preferred to sequence variants that the current literature suggested to be rare or anecdotal in any given geographic region (e.g., L0k in the Near East).

    Ummm... wait a minute. This is definitely not what you want to do if you're going to test hypotheses of population history. They have deliberately narrowed their sample in a way that distinguishes Khoisan from other peoples, and have excluded some proportion of variants already known to be common. We can predict, based on the sampling scheme alone, that Khoisan and other people ought to be more distinct than would be expected under a random sampling of each population, and certainly more so than expected under a random sampling of the African continent. This means that if the data were to reject IBD, we would have to examine whether that was because of the population history, or instead because of the sampling scheme.

    Do the data reject IBD? Well, we don't actually know from the paper. The study employs an island model, in which Khoisan and all others are assumed to represent either one panmictic population or two isolated ones. They devised a test based on permuting the number of lineages that they inferred to have existed during past time intervals. An island model with isolation of two populations predicts that each will share some gene lineages lacking in the other -- so-called "private" haplotypes. In contrast, two samples taken from a single panmictic population would each have a small proportion of "private" haplotypes, as well as some number of common haplotypes shared by both samples.

    So, the study (reasonably) tests the null hypothesis that the African mtDNA samples derive from a single panmictic population going back to the mtDNA coalescent. They estimate the date of this coalescent (based on their mutation rate model) as around 200,000 years ago, so this is a test of panmixia in Africa across this time period. They use a permutation test to evaluate the likelihood that some number of closely related lineages would all be private to the Khoisan population, under the hypothesis that they are randomly drawn from the African population as a whole. The lineages they examine are the ones they infer to have been present in the Khoisan population at various time intervals in the past -- again, based on their model of mutation rate. They can disprove panmixia across times after 100,000 years and before 80,000 years. Before this time, too few coalescent lineages are inferred to have existed to obtain a significant refutation of the test of panmixia. After 40,000 years, there are obvious shared lineages between Khoisan and other samples that could only have been shared by gene flow.
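
    The logic of that kind of permutation test can be sketched in a few lines of Python. This is not the authors' actual procedure, and the numbers are toy values I made up for illustration. Under panmixia the population labels on lineages are exchangeable, so we can ask how often a clade of k related lineages would end up entirely inside one sample when the labels are shuffled at random.

```python
import random
from math import comb


def exact_prob_all_private(n_focal, n_total, k):
    """Chance that k lineages all carry the focal label under random labeling."""
    return comb(n_focal, k) / comb(n_total, k)


def permutation_prob_all_private(n_focal, n_total, k, n_perm=20_000, seed=1):
    """Estimate the same probability by shuffling population labels."""
    rng = random.Random(seed)
    # True marks a lineage sampled from the focal population.
    labels = [True] * n_focal + [False] * (n_total - n_focal)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Treat the first k positions as the clade of related lineages.
        if all(labels[:k]):
            hits += 1
    return hits / n_perm


if __name__ == "__main__":
    # Toy values (hypothetical): 40 lineages in an interval, 10 of them
    # sampled from the focal population, and a clade of 3 lineages.
    n_focal, n_total, k = 10, 40, 3
    print(f"exact:       {exact_prob_all_private(n_focal, n_total, k):.4f}")
    print(f"permutation: "
          f"{permutation_prob_all_private(n_focal, n_total, k):.4f}")
```

    A small probability here rejects random labeling, which is to say panmixia. Notice what it does not do: it says nothing about which structured alternative (island model, isolation-by-distance, or something else) produced the pattern, and it inherits whatever bias went into choosing the lineages.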

    I worry that there is a bias in this test. The authors applied it only to a period of time earlier than the coalescence times of recent shared lineages, but after the diversification of the ancient lineages that are not shared. In other words, there appeared to be a gap in the coalescence times of shared haplogroups. Usually, you would correct the test for multiple comparisons not only across haplogroups, but also across time periods. Given that we are considering a range of 150,000 years, across which there is evidence for gene flow both early and late in that history, what is the significance of the fact that we see few shared lineages at intermediate times? That will be less significant than the values reported in the paper, but how much less it is difficult to predict.

    In the end, what do the observations in the paper mean? In the simplest interpretation, either Africans were not random-mating after 100,000 years ago or regional selection differentiated southern and other African mtDNA pools.

    Did ancient Africans live in two isolated groups? I wouldn't say that: the authors didn't test that hypothesis.

    Did ancient Africans live in small bands scattered across the continent? Well, all ancient humans lived in small bands. The question of whether they were scattered is a question about the population size -- and as I showed last week, the population size during this period of time was not small. So we can imagine a population structure like recent historic hunter-gatherers -- with Africa possibly having something like the population size and structure of indigenous Australians.

    What's the bottom line? The results are consistent with isolation-by-distance in ancient Africans. That model, followed by a subsequent global expansion, has been around for a long time. In 1993, Henry Harpending and colleagues called it the "Weak Garden of Eden" model: a geographically structured African population that underwent an expansion and dispersal to other regions. Certainly for the mitochondrial DNA, this seems to be the model that presently best fits the data.

    What remains in question is how much of the subsequent spread of mtDNA was also reflected by spread of nuclear DNA haplotypes, and how much was induced by natural selection on mtDNA haplogroups. As I continue to write about population histories, we will meet this issue again.

    References:

    Behar DM, 14 others, and The Genographic Consortium (consortium again? Whoa). 2008. The dawn of human matrilineal diversity. Am J Hum Genet 82:1-11. doi:10.1016/j.ajhg.2008.04.002

    Synopsis: 
    Revisiting a paper that claims an African bottleneck, I examine the subject of population structure
