john hawks weblog

paleoanthropology, genetics and evolution

migrations

  • Denisovan DNA in the islands, and an Australian genome

    Thu, 2011-09-22 18:09 -- John Hawks

    David Reich and colleagues today report on the persistence of Denisova-like ancestry in island Southeast Asia and Australia (citation not yet available). Meanwhile, Morten Rasmussen and colleagues (citation not yet available) report on the whole-genome sequencing of hair from an Aboriginal Australian who lived some 100 years ago.

    The most obvious story: These data utterly destroy the hypothesis of a single out-of-Africa colonization of Southeast Asia by modern humans. Many human geneticists have argued our present pattern of diversity originated in a wave of successive founder effects coming from a single recent African origin. They were wrong.

    Instead, we can turn to a complex model with successive dispersals and episodes of population mixture. This is not a static model of isolation-by-distance; it is a dynamic model in which populations grow and spread across large spans of the Old World, again and again and again. By my count, at least three massive episodes of population dispersal and mixture are necessary in Reich and colleagues' model. A picture of their admixture hypothesis:

    Denisova admixture model from Reich et al. 2011

    This model depicts (a) an early divergence of an African (represented by Yoruba) and Asian/Australasian populations. These mix with first Neandertals and then (for the Australian/New Guinea/Mamanwa populations) with Denisova-like people. Later (b), after the initial habitation of the Philippines by the ancestors of Mamanwa, a population like Andamanese Onge pushes into the islands, mixing with the ancestors of New Guinea and Australian populations. Later still (c), a population ancestral to today's Chinese people mixes with Philippines and other Southeast Asian people.

    As complicated as it looks, even this model must be a vast oversimplification. I don't like or attribute much belief to mixture models like this, as they assume too much about relative population sizes and the timing of mixture. Many recent hunting and gathering populations of Southeast Asia are not included in the current samples, and the Chinese sample is itself the result of very recent demographic events, covering what once may have been a wider diversity of peoples. Depicting Australian and New Guinean populations as monolithic is an artifact of the small sample; these places themselves housed a tremendous diversity of peoples. Nevertheless, the true model won't be simpler than this one; it will involve many more events that the data cannot yet resolve.

    Hints of that complexity emerge from the Aboriginal Australian whole genome. Rasmussen and colleagues show that this individual shares some ancestry with East Asian peoples, but on the whole populations in Europe and East Asia are much more genetically similar to each other than to this genome. The picture from the whole genome is essentially the same as that drawn by the SNP comparisons by Reich and colleagues, but with the potential (in the long run) to actually trace the histories of individual genes. And I think the gene-by-gene account of history will be important, because we already have some evidence that a few Denisovan genes do persist in mainland Asia, even though most are gone.

    To explain why, we can look at the proportion of Denisovan ancestry in different populations as depicted in a map by Reich and colleagues. The pie charts are confusing here, because they report the fraction of ancestry from Denisovans in each population relative to the 5% estimate for New Guinea. So Australians also have 5% in this figure, Timorese have around 2.5%, and Bougainville has more than 4%.

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    At the same time, the mixture model presents an important idea: Once there were people in Southeast Asia who had much more Denisovan ancestry than any populations still remaining today. Both Australian/New Guinea populations and Philippine populations like the Mamanwa have subsequently mixed with new immigrants who lacked any sign of Denisovan ancestry. Prior to this later mixture, the ancestors of those populations must have been more Denisovan -- Reich and colleagues estimate 7%. This is the first evidence that ancestry from archaic people of Eurasia was diluted to a lower value by later population movements. If the population mixture originally happened somewhere in mainland Asia, any traces of Denisovan ancestry in those areas have been diluted almost to nonexistence. But the persistence of some genes would be predicted if natural selection were maintaining them in the face of demographic pressure from elsewhere.
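
    To put rough numbers on that dilution, here is a minimal sketch. It assumes a simple two-way mixture in which the incoming population carried no Denisovan ancestry at all; that is a simplification for illustration, not the model fitted by Reich and colleagues. The 7% and 5% figures are the ones quoted above, and the function name is hypothetical.

```python
def immigrant_fraction(before, after):
    """Fraction of ancestry contributed by a Denisovan-free immigrant
    population, under a simple two-way mixture (an illustrative
    assumption, not the model fitted by Reich and colleagues)."""
    if not 0 < after <= before:
        raise ValueError("expected 0 < after <= before")
    # after = before * (1 - m)  =>  m = 1 - after / before
    return 1 - after / before


m = immigrant_fraction(before=0.07, after=0.05)
print(f"implied immigrant ancestry: {m:.1%}")
```

    Under that assumption, going from 7% to 5% implies that about 29% of the ancestry of the later population came from the newcomers.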

    About the Australian genome, there will be much more interesting analyses to come, I expect. As whole-genome data come to represent more of the variation within human populations, we get a larger store of information about how we came to be variable. Variation traces not only to population movements and demography, but also to natural selection. Australia's population history has been very different from many populations of the Old World, and this genome should give us new perspective on the effects of that demographic history.

    Synopsis: 
    The hypothesis of a single out-of-Africa dispersal is rejected by new data about Denisovan mixture and whole-genome sequencing of an Aboriginal Australian.
  • Double the migrations, double the fun

    Mon, 2009-01-12 00:25 -- John Hawks

    Several news stories have reported on an article by Ugo Perego and colleagues, titled "Distinctive Paleo-Indian Migration Routes from Beringia Marked by Two Rare mtDNA Haplogroups." The Discover blog, 80beats, has a good two-paragraph summary of the results:

    In the study, published in Current Biology [subscription required], a team led by geneticist Antonio Torroni analyzed entire genomic sequences of mitochondrial DNA, the genetic material in cells’ energy-generating units that gets passed from mothers to children…. The researchers focused on the disparate geographic distributions of two rare mitochondrial DNA haplogroups — which are characterized by a distinctive DNA sequence derived from a common maternal ancestor — that still appear in Native Americans [Science News]. Both haplogroups appear to have arisen about 16,000 years ago.

    The researchers found that all the people with the D4h3 haplogroup presently live in South America, while those with the X2a haplogroup live in Canada and the United States, which suggests that the two genetically distinct bands of early humans struck off in different directions around 16,000 years ago.

    I don't have a lot to say about this. Tracking the frequencies and geographic distribution of rare haplotypes poses different issues than doing so for common alleles. Two closely related populations might nevertheless differ in the presence or absence of rare alleles.

    I really just wanted to post with reference to a broader point. If the data don't distinguish between a single migration at one time and multiple migrations at different times, then it's pretty much certain that they won't distinguish between a single migration and multiple migrations at one time.

    The two-simultaneous-migrations model might solve problems so far unaddressed by other models. But it's not obvious that it solves any -- there's no test here, just a discussion of the plausibility of the scenario. Each of these scenarios for New World habitation involves the dispersal of many populations across thousands of years. That means lots of free parameters, even in the simplest of the models. Given that necessary complexity, it seems pretty likely that there's a way for the simplest model to account for the frequencies of two rare alleles. It will take a whole lot more genetic comparisons to really test hypotheses about the founding population.

    References:

    Perego UA and 15 others. Distinctive Paleo-Indian Migration Routes from Beringia Marked by Two Rare mtDNA Haplogroups. Curr Biol 19:1-8. doi:10.1016/j.cub.2008.11.058

  • Quote: Dating the Y

    Sun, 2008-11-02 11:35 -- John Hawks

    Dienekes comments on a new paper that attempts to estimate the age of a Y chromosomal clade:

    I am constantly amazed by how the tremendous amount of effort required to identify, sample, catalogue, process, and genotype great numbers of people from around the world is accompanied by an apparently complete lack of interest in checking the basic premises on which interpretation of this data is based.

    There are too few people who understand the assumptions underlying the computer programs they're using -- which, after all, are intended to be useful in a broad range of species, not just humans. Yet few species have demographic histories anything like humans.

  • Darwin, languages, and genetics

    Wed, 2008-08-27 23:42 -- John Hawks

    How are languages and genes related to each other? Anthropology is an interdisciplinary subject, and this is probably the topic that pushes that envelope the furthest, in terms of calling on the expertise of many different disciplines in the humanities and sciences.

    As an organizing principle, many workers have begun with the hypothesis that languages and genes each form genealogical relationships among populations, and that the coevolution of languages and populations should make these genealogies resemble each other. In other words, French, Spanish, Portuguese and Italian all descend from Latin, and the present-day populations of France, Spain, Portugal, and Italy all descend from the population of the Roman Empire. Hence, the relations of the languages and the relations of the populations are parallel to each other.

    This general idea is older than Darwin’s Origin of Species, but Darwin’s words on the subject have been quoted more often than anyone else’s:

    If we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world (Darwin 1859, 422).

    Like many of Darwin’s words, however, these are generally pulled from the surrounding context without further discussion. The sentence actually serves as an example in Darwin’s defense of the phylogenetic tree as a description of relationships. In the previous paragraph, he points out that the similarities among different species cannot be made to fit any simple series. Instead, a hierarchical, genealogical arrangement can account for many of their similarities and differences. And after this sentence, he describes differences in rate of language change as an analogy for the evolution of organisms:

    If we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world; and if all extinct languages, and all intermediate and slowly changing dialects, had to be included, such an arrangement would, I think, be the only possible one. Yet it might be that some very ancient language had altered little, and had given rise to few new languages, whilst others (owing to the spreading and subsequent isolation and states of civilisation of the several races, descended from a common race) had altered much, and had given rise to many new languages and dialects. The various degrees of difference in the languages from the same stock, would have to be expressed by groups subordinate to groups; but the proper or even only possible arrangement would still be genealogical; and this would be strictly natural, as it would connect together all languages, extinct and modern, by the closest affinities, and would give the filiation and origin of each tongue (Darwin 1859, 422–423).

    Thus, Darwin’s discussion, in which language relationships are an example, raises two separate issues: (1) whether similarities are described by seriation or hierarchy, and (2) whether differences arise at a constant or changing rate. His readers would have been aware of historical linguistics, including the observation that no “Great Chain” of languages could be constructed out of grammatical and phonological changes that are manifestly hierarchical. Likewise, they would be familiar with two ancient languages that had manifested long-term stasis in a small community of speakers.

    These points discussed by Darwin remain active elements of debate about the relationships of recent human languages and genes:

    • How much of linguistic diversity is attributable to the genealogical relations among languages, and how much derives from horizontal modes of transmission, such as the borrowing of words and syntactic patterns?
    • How often do populations undergo language shifts?
    • How much of human genetic variation is attributable to ancient population divergences, and how much to recent gene flow?
    • Are genes today the same as those present in ancient populations, or have they been replaced by selection or other demographic processes?

    Each point considers a way that the genealogy of languages may come to differ from genetic relationships of populations. Within a single generation, there is a very high concordance between language and genes: People inherit their genes from their parents, and they tend to learn the same language as their parents. But only a slight mismatch in each generation may, over the course of many generations, add up to a huge difference in the histories of the two systems. And if those differences are biased in some direction, instead of random noise, then they may not only obscure the real history; they may strongly point to a false one.
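
    The way a slight per-generation mismatch compounds can be sketched with a toy calculation. It assumes an independent, constant chance in every generation that a lineage's language and genes part ways; the 1% rate and the function name are hypothetical, chosen only to show the shape of the effect.

```python
def concordance_after(generations, mismatch_per_generation):
    """Probability that a lineage's language and genes still trace to the
    same source after some number of generations, assuming an independent,
    constant per-generation chance of mismatch (a toy assumption)."""
    return (1 - mismatch_per_generation) ** generations


# Hypothetical 1% chance of mismatch in each generation.
for g in (1, 10, 100, 400):
    print(g, round(concordance_after(g, 0.01), 3))
```

    With a 1% mismatch per generation, concordance falls to roughly 37% after 100 generations and to under 2% after 400. This toy version treats the mismatches as unbiased noise; biased mismatches, as noted above, are worse because they can point to a false history.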

    To return to our example: French, Spanish, Portuguese, and Italian are not the only major Romance languages: there is also Romanian, for example. Romanians are genetically most similar to their neighbors in southeastern Europe, such as Serbs and Greeks—neither Romance speakers. In this case, population movements after the Roman period, such as the migrations of Slavic peoples, transformed the languages spoken in the Balkans without fundamentally altering the genetic similarities. And the persistence of Greek reminds us that the vast expansion of the Roman empire could not supplant some linguistic communities, and did not itself erase some earlier genetic patterns.

    So should we expect the “perfect pedigree” of human populations to resemble the genealogy of languages? It would help if there were factors that tended to reinforce such similarities instead of destroying them. We have to go beyond the simple statistical comparison of language and gene trees, which over enough time really shouldn’t resemble each other very much -- at least, if the deviations in their evolutionary patterns are just noise. Instead, we have to consider how demography shapes genetic and linguistic transfer.

    References


    Darwin C. 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, London.

  • Y chromosome migrations and African pastoralism

    Fri, 2008-08-22 12:12 -- John Hawks

    Sharon Begley covers a recent paper by Joanna Mountain on Y chromosome migrations and African pastoralists:

    The novel mutation arose in eastern Africa about 10,000 years ago and was carried by migration to southern Africa about 2,000 years ago not by Bantu-speakers, in whom the mutation is absent, but in speakers of what’s called the Nilotic language. These unsuspected ancestors first brought herds of animals to southern Africa before the Bantu migration.

    To me, this is one of the most useful applications of genetics to prehistory: finding migrations that have been largely obscured by later movements. But it's tricky, and faces a major problem in the fact that recent selection has also generated demographic forces. Of course, if the migrations were somehow connected to the selection, that would be less of a problem...

  • Were ancient Africans divided into small, isolated bands?

    Thu, 2008-05-08 12:05 -- John Hawks

    Last week when I wrote about the study of African mtDNA variation by Behar and colleagues, I focused on the issue of population size. To me, that must be the first parameter that we try to estimate, because the simplest relevant model of population history -- the Wright-Fisher model -- is described by that single parameter: the number of individuals. If we are going to evaluate evidence for population structure, we first must deal with the question of size.

    The claim in the press release is that the African population was divided into separate populations:

    Doron Behar, Rambam Medical Center, Haifa, said: "We see strong evidence of ancient population splits beginning as early as 150,000 years ago, probably giving rise to separate populations localized to Eastern and Southern Africa. It was only around 40,000 years ago that they became part of a single pan-African population, reunited after as much as 100,000 years apart."

    Is it true? Certainly that describes the model tested in the paper. But is it the right model? Is there evidence to justify that model as opposed to simpler alternatives?

    A real population may be structured in many ways -- by age, by caste or class, by space. If we have samples that are taken from different geographic locations, as in this study, it is natural to test hypotheses about structuring across geography. That's what Behar and colleagues did: they tested a hypothesis of panmixia, or random mating across space.

    Panmixia is the simplest model -- the null hypothesis -- about population structure. If everyone mates randomly, then there is no geographic structure. The population would be a single, unstructured gene pool. The paper refutes this model, demonstrating that people did not mate randomly across the geography of Africa during a certain period of time.

    But the question is: which model do we adopt once we have refuted panmixia?

    I rather like isolation-by-distance as a model for human population history. Isolation-by-distance (IBD) assumes that people travel some distance before they reproduce. It's a simple model -- the distance traveled may vary among individuals, but the variance in this value is the only parameter necessary to predict the structure of the population. IBD can explain quite a lot -- why people look like their neighbors, why intermediate populations on the map tend to look intermediate in allele frequencies, and why selected alleles take some time to disperse across space. It is generally consistent with what we know about hunter-gatherer demography. People tend to stay where they are, but a fairly large fraction move to marry into neighboring groups, and a smaller fraction go beyond the neighboring groups to marry further away. So I think this is the null hypothesis once panmixia is refuted. IBD is not a hypothesis of small, isolated bands -- it is a hypothesis of a geographically dispersed population with gene flow.
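
    One consequence of IBD, that intermediate populations look intermediate, can be illustrated with a deterministic stepping-stone sketch. This is a discrete stand-in for IBD, not a model from any of the papers discussed here. The number of demes, the migration rate, and the fixed allele frequencies at the two ends are all hypothetical choices made for illustration.

```python
def stepping_stone(n_demes=11, m=0.1, generations=2000):
    """Deterministic 1-D stepping-stone sketch: each interior deme receives
    a fraction m of its gene pool from each neighbor per generation.
    The two end demes are held at allele frequency 1.0 and 0.0
    (a toy boundary condition chosen for illustration)."""
    freqs = [0.5] * n_demes
    freqs[0], freqs[-1] = 1.0, 0.0
    for _ in range(generations):
        new = freqs[:]
        for i in range(1, n_demes - 1):
            new[i] = (1 - 2 * m) * freqs[i] + m * (freqs[i - 1] + freqs[i + 1])
        freqs = new
    return freqs


print([round(f, 2) for f in stepping_stone()])
```

    With only neighbor-to-neighbor exchange, the demes settle onto an even gradient from 1.0 at one end to 0.0 at the other. No deme is isolated, yet distant demes differ more than adjacent ones.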

    The Genographic Project has done more than any other single project to extend the sampling of human populations. The paper by Behar and colleagues is a testament to that -- they are able to work with a broader and deeper sampling of mitochondrial variation in Africa than has yet been available. This is a credit both to the ambitious goals of the project and to today's genetic technology, which has made it possible to sequence more whole mitochondrial genomes on the project's budget. It is a great example of how spending money can circumvent some theoretical problems.

    Still, the Project likely wanted to maximize the effectiveness of its money, so it focused on sequencing only those variants that were underrepresented or rare in previous studies. From the Methods:

    Samples were chosen to include the widest possible range of Hg L(xM,N) internal variation on the basis of the previously available sequence analysis of the mtDNA control region and are, therefore, biased toward rare variants. In addition, we attempted to focus on branches (e.g., L0d, L0k), populations (e.g., Khoisan), and geographic regions (e.g., Chad) for which the current data were scant. Last, we preferred to sequence variants that the current literature suggested to be rare or anecdotal in any given geographic region (e.g., L0k in the Near East).

    Ummm... wait a minute. This is definitely not what you want to do if you're going to test hypotheses of population history. They have deliberately narrowed their sample in a way that distinguishes Khoisan from other peoples, and have excluded some proportion of variants already known to be common. We can predict, based on the sampling scheme alone, that Khoisan and other people ought to be more distinct than would be expected under a random sampling of each population, and certainly more so than expected under a random sampling of the African continent. This means that if the data were to reject IBD, we would have to examine whether that was because of the population history, or instead because of the sampling scheme.

    Do the data reject IBD? Well, we don't actually know from the paper. The study employs an island model, in which Khoisan and all others are assumed to represent either one panmictic population or two isolated ones. They devised a test based on permuting the number of lineages that they inferred to have existed during past time intervals. An island model with isolation of two populations predicts that each will share some gene lineages lacking in the other -- so-called "private" haplotypes. In contrast, two samples taken from a single panmictic population would each have a small proportion of "private" haplotypes, as well as some number of common haplotypes shared by both samples.

    So, the study (reasonably) tests the null hypothesis that the African mtDNA samples derive from a single panmictic population going back to the mtDNA coalescent. They estimate the date of this coalescent (based on their mutation rate model) as around 200,000 years ago, so this is a test of panmixia in Africa across this time period. They use a permutation test to evaluate the likelihood that some number of closely related lineages would all be private to the Khoisan population, under the hypothesis that they are randomly drawn from the African population as a whole. The lineages they examine are the ones they infer to have been present in the Khoisan population at various time intervals in the past -- again, based on their model of mutation rate. They can disprove panmixia across times after 100,000 years and before 80,000 years. Before this time, too few coalescent lineages are inferred to have existed to obtain a significant refutation of the test of panmixia. After 40,000 years, there are obvious shared lineages between Khoisan and other samples that could only have been shared by gene flow.
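
    The logic of such a permutation test can be sketched generically. This is not the authors' code or their exact statistic; the sample sizes, the clade, and the function name are hypothetical. The sketch asks how often every lineage in a clade would carry the Khoisan label if labels were shuffled at random among the lineages, which is what panmixia would imply.

```python
import random


def private_clade_pvalue(labels, clade_indices, target, n_perm=10_000, seed=0):
    """Permutation p-value: how often would every lineage in a clade carry
    the `target` population label if labels were assigned at random
    (i.e. under panmixia)?  A generic sketch, not the authors' code."""
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if all(shuffled[i] == target for i in clade_indices):
            hits += 1
    return hits / n_perm


# Hypothetical data: 40 lineages alive in some time slice, 10 sampled from
# Khoisan ("K") and 30 from elsewhere ("O"); a clade of 4 lineages.
labels = ["K"] * 10 + ["O"] * 30
p = private_clade_pvalue(labels, clade_indices=[0, 1, 2, 3], target="K")
print(p)
```

    For these made-up numbers the exact answer is 210/91,390, or about 0.002, and the permutation estimate should land close to it. The same arithmetic shows why the test loses power at earlier times: with only a handful of lineages in a time slice, even a fully private clade is not very surprising.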

    I worry that there is a bias in this test. The authors applied it only to a period of time earlier than the coalescence times of recent shared lineages, but after the diversification of the ancient lineages that are not shared. In other words, there appeared to be a gap in the coalescence times of shared haplogroups. Usually, you would correct the test for multiple comparisons not only across haplogroups, but also across time periods. Given that we are considering a range of 150,000 years, across which there is evidence for gene flow both early and late in that history, what is the significance of the fact that we see few shared lineages at intermediate times? That will be less significant than the values reported in the paper, but how much less is difficult to predict.

    In the end, what do the observations in the paper mean? In the simplest interpretation, either Africans were not random-mating after 100,000 years ago or regional selection differentiated southern and other African mtDNA pools.

    Did ancient Africans live in two isolated groups? I wouldn't say that: the authors didn't test that hypothesis.

    Did ancient Africans live in small bands scattered across the continent? Well, all ancient humans lived in small bands. The question of whether they were scattered is a question about the population size -- and as I showed last week, the population size during this period of time was not small. So we can imagine a population structure like recent historic hunter-gatherers -- with Africa possibly having something like the population size and structure of indigenous Australians.

    What's the bottom line? The results are consistent with isolation-by-distance in ancient Africans. That model, followed by a subsequent global expansion, has been around for a long time. In 1993, Henry Harpending and colleagues called it the "Weak Garden of Eden" model: a geographically structured African population that underwent an expansion and dispersal to other regions. Certainly for the mitochondrial DNA, this seems to be the model that presently best fits the data.

    What remains in question is how much of the subsequent spread of mtDNA was also reflected by spread of nuclear DNA haplotypes, and how much was induced by natural selection on mtDNA haplogroups. As I continue to write about population histories, we will meet this issue again.

    References:

    Behar DM, 14 others, and The Genographic Consortium (consortium again? Whoa). 2008. The dawn of human matrilineal diversity. Am J Hum Genet 82:1-11. doi:10.1016/j.ajhg.2008.04.002

    Synopsis: 
    Revisiting a paper that claims an African bottleneck, I examine the subject of population structure
  • Did humans face extinction 70,000 years ago?

    Fri, 2008-05-02 17:22 -- John Hawks

    That was the headline of many of last week's stories about the paper by Behar and colleagues, drawing upon the Genographic Project African mitochondrial DNA (mtDNA) data. Here's a quote from the National Geographic Society's press release:

    Previous studies have shown that while human populations had been quite small prior to the Late Stone Age, perhaps numbering fewer than 2,000 around 70,000 years ago, the expansion after this time led to the occupation of many previously uninhabited areas, including the world beyond Africa.

    And here's project director Spencer Wells' quote in the same release:

    Dr. Spencer Wells, National Geographic Explorer-in-Residence and Director of the Genographic Project, said: "This new study released today illustrates the extraordinary power of genetics to reveal insights into some of the key events in our species' history. Tiny bands of early humans, forced apart by harsh environmental conditions, coming back from the brink to reunite and populate the world. Truly an epic drama, written in our DNA."

    Well, that certainly sounds dramatic. But is it true?

    The paper itself does not provide any tests of the number of ancient humans indicated by the mtDNA phylogeny. The press release mentions "previous studies" that fix a small initial founding population for Africans, so I went looking through the paper to see which studies they had cited.

    I found this passage, which seems relevant:

    Different approaches were taken in the attempt to estimate the sub-Saharan Homo sapiens population size in different time frames.7

    OK, that seems like what I want -- estimates of population size in different time frames in sub-Saharan Africa. So I looked up reference 7, and found this:

    Hawks,J., Wang,E.T., Cochran,G.M., Harpending,H.C., and Moyzis,R.K. (2007). Recent acceleration of human adaptive evolution. Proc. Natl. Acad. Sci. USA 104, 20753-20758.

    D'oh!

    Now on the one hand, it is very gratifying to be recognized as an expert on the genetic demography of sub-Saharan Africa. I mean, we did work hard on that paper. But on the other hand, it seems like we might do a little better than that paper as an examination of the demographic history of sub-Saharan Africa.

    And the current paper by Behar and colleagues provides exactly the right kind of information to get that more detailed demographic history. So I've put together some notes here on how we can discover whether there was a population bottleneck 70,000 years ago in Africa, using the mtDNA evidence. I'm setting aside for the moment the question of population structure -- the "isolation" story that was also told in the press release for the paper. Population structure and size are not independent of each other, and we will have to consider how they interacted in African prehistory. But the first issue should be size, because our interpretation of size is based on relatively simple aspects of genetic variation (at its simplest, the first moment), while testing hypotheses about population structure requires higher-order comparisons.

    Assumptions

    Any estimate of ancient population sizes requires a number of assumptions. I'm about to be more explicit about these assumptions than any other analysis of ancient population sizes you're likely to have seen.

    Major assumption: Effective population size is relevant to the actual population size.

    I wanted to put that up front, to be perfectly clear that we are going to be dealing with an estimate of the rate of inbreeding, and not a direct signature of the number of individuals. The two may be related to each other, given other assumptions. But human genetics has what you might call a Vizzini problem: that word they keep saying, "effective population size," it does not mean what they think it means. The "effective population size" is often confused uncritically for the true number of individuals in a present-day or ancient population.

    The interpretation of a bottleneck (e.g., of the 2000-individual variety noted above) is based on the inference that a population in the past had a much higher rate of inbreeding than was true more recently. Inbreeding, as we will see below, has characteristic effects on the rate of coalescence of gene lineages. The rate can be converted to an estimate of the effective population size, given other assumptions.

    I'll have a much longer discussion of effective population size in a separate post. But for the time being, please note that our "population size" estimates are actually estimates of "effective population size," which for ancient humans must certainly have been smaller than the actual number of people, and may have been smaller than the actual number of people by an order of magnitude or more.

    So what other assumptions should we keep in mind?

    1. Mutation model and rate

    In this discussion, I will assume the mutation model and rates assumed in the paper by Behar and colleagues (2008). Those may be in error, and the errors will necessarily affect my estimates. But I don't have any particular reason to think the errors are in one direction or another -- in fact, I think that the dates given for lineages in the study look reasonable given the evidence from other sources, and are consistent with other recent studies.

    2. Panmixia

    We'll assume random mating. This assumption is certainly false for this sample of mtDNA, as demonstrated by Behar et al. But we will leave it for additional tests to see how much a more accurate hypothesis of population structure will affect our estimate of effective size.

    3. No selection

    That's an odd thing for you to see me write. It is actually very likely that some of the relevant mtDNA lineages have been selected during their history. We ignore that at our peril, since natural selection (which causes inbreeding) would invalidate the relationship between effective size (a measure of inbreeding) and the actual population size.

    The coalescent

    The coalescent is a statistical description of the genealogical relationships among a sample of genes taken from a population. As I will describe it here, the coalescent applies to a Wright-Fisher population model, a random-mating, discrete-generation population model described by a single parameter, N, the number of individuals. The concept of "effective population size" (Ne) relates the level of inbreeding in an actual population to that expected in a Wright-Fisher population of Ne individuals. Often we consider a diploid population with 2N gene copies (chromosomes), but because we will be considering mitochondria here, I will use the haploid formulation with N gene copies.

    In a Wright-Fisher population, the probability that two randomly chosen gene copies actually descend from a single gene copy in the previous generation is simply 1/N. Looking at the genealogy of the genes from a retrospective point of view, this is the probability that the two lineages will coalesce into a single ancestral lineage, and we can call this a coalescent event.

    Now, let's consider a sample of n gene copies. Any random pair of these copies might share a parent in the preceding generation, with probability 1/N. Collectively, if we ignore the chance that two or more coalescent events occur in the same generation, the probability that n gene copies have only n-1 ancestors in the previous generation is:

    $$\Pr[n \to n-1] = \binom{n}{2}\frac{1}{N} = \frac{n(n-1)}{2N}$$

    Conversely, the probability that the n copies have n distinct ancestors is 1 - Pr[n → n-1].

    Now we can ask, what is the mean number of generations between coalescent events? Writing T_n for the number of generations during which the genealogy has exactly n lineages, the probability of a coalescent event among n copies in generation t -- but not before generation t -- is:

    $$\Pr[T_n = t] = \left(1 - \frac{n(n-1)}{2N}\right)^{t-1}\frac{n(n-1)}{2N}$$

    This probability declines as t increases, and is distributed as a geometric decay, so that the expected time that the genealogy has exactly n ancestral lineages is:

    $$E[T_n] = \frac{2N}{n(n-1)}$$

    The mean time from n to n-2 is the sum of the means from n to n-1 and from n-1 to n-2, and the total time to the coalescent of the entire sample is expected to be:

    $$E[T_{\mathrm{total}}] = \sum_{k=2}^{n}\frac{2N}{k(k-1)} = 2N\left(1 - \frac{1}{n}\right)$$

    This expression converges toward 2N as n increases to be very large. Because of this relation, the coalescence time of the sample serves as an estimator of the size of the Wright-Fisher population with similar genetic variation as the sampled population. The estimate of N thus derived is called the inbreeding effective population size -- "inbreeding" because it depends on the identity-by-descent among the sampled gene copies.
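
    For readers who like to check the arithmetic, here is a minimal Python sketch of the expected waiting times under the haploid Wright-Fisher coalescent described above. The function names and the value N = 10,000 are illustrative assumptions, not anything taken from Behar et al.

```python
from fractions import Fraction

def expected_wait(n, N):
    """Expected generations while exactly n lineages remain: 2N / (n(n-1))."""
    return Fraction(2 * N, n * (n - 1))

def expected_tmrca(n, N):
    """Expected generations until a sample of n coalesces to one ancestor."""
    return sum(expected_wait(k, N) for k in range(2, n + 1))

N = 10_000  # hypothetical haploid population size, for illustration only
for n in (2, 10, 100, 1000):
    # Ratio of the total expected depth to 2N; it approaches 1 as n grows.
    print(n, float(expected_tmrca(n, N) / (2 * N)))
```

    The ratio printed for each sample size equals 1 - 1/n, which is why the total depth approaches 2N generations for large samples, and why the last pair of lineages alone accounts for about half of it.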

    This estimate is a kind of average effective population size, covering the time all the way back to the coalescent of the sample. It is therefore not very informative about the way that population has changed over time. If we want to know about the size of the African population between 70,000 and 100,000 years ago, say, we are going to have to find a way to estimate more precisely.

    Applied to African mtDNA

    If we know that the present-day sample of African mtDNA coalesced to n ancestral lineages 80,000 years ago, and further to n-x ancestral lineages 90,000 years ago, then we can arrive at an estimate of effective population size applying just to that time period.

    The mitochondrial equivalent of the Wright-Fisher population size N is the inbreeding effective number of females, Ne(f). Assuming this number was constant across the interval from t_n to t_{n-x} (both measured in generations), then the female effective size is given by the equation:

    $$N_{e(f)} = \frac{t_{n-x} - t_n}{2\left(\frac{1}{n-x} - \frac{1}{n}\right)}$$

    Helpfully, Behar et al. (2008) provide the number of ancestral lineages remaining at the beginning of various time periods, from 80,000 to 144,000 years ago, in their Table 1. They estimated dates using their model of mutations in the genealogy, so the numbers may be imprecise. But they make it very easy to derive estimates of the effective number of females in the ancient African population. Note that these estimates will have an error factor due to the fact that the coalescent events do not match up exactly with the time intervals given in Table 1. A substitution of the true estimates for the coalescent dates would eliminate this source of error; for this post I am sticking with the dates given in Table 1 instead of the supplementary table, just because the numbers of lineages in these intervals have been pre-tabulated and are simpler for readers to cross-check.

    At 80,000 years ago, the African sample includes 24 ancestral lineages; by 90,000 years ago it includes only 22. That means 2 coalescent events in this 500 generation span, yielding an estimate of female Ne(f) = 66,000.
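
    The estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name is mine, and the 20-year generation time is an assumption implied by treating 10,000 years as 500 generations.

```python
def female_ne(n_start, n_end, years, gen_years=20):
    """Constant-size estimate of Ne(f) from a drop of n_start to n_end
    ancestral lineages over a span of `years`.

    Sums the expected waiting times 2N / (k(k-1)) over the span, which
    telescopes to 2N * (1/n_end - 1/n_start), and solves for N.
    """
    generations = years / gen_years  # assumed 20 years per generation
    return generations / (2 * (1 / n_end - 1 / n_start))

# 24 lineages at 80,000 years ago, 22 lineages at 90,000 years ago
print(round(female_ne(24, 22, 10_000)))
```

    The result is about 66,000, matching the figure in the text.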

    That's a high estimate, certainly very high compared to the notion that the human population was restricted to fewer than 2000 individuals at that time!

    But in a very small population of 2000 individuals, we would expect a very large fraction of the sampled ancestral lineages to have coalesced in a 500-generation span. We would expect 24 ancestral lineages to coalesce into only three or four across that time period. Instead, we find that the African sample had a low rate of lineage coalescence across that span: low inbreeding, and therefore a high effective population size.
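
    The same expected waiting times can be turned around to ask how many lineages we would expect to be left after a given span. The sketch below is a rough expectation-based calculation, not a full probability distribution, and it assumes that a population of 2000 individuals corresponds to roughly 1000 females; both the function name and that figure are my own illustrative choices.

```python
def expected_lineages_left(n_start, generations, N):
    """Solve t = 2N * (1/k - 1/n_start) for k, the approximate number of
    lineages remaining after t generations in a population of size N."""
    return 1 / (generations / (2 * N) + 1 / n_start)

small = expected_lineages_left(24, 500, 1_000)    # assumed 1000 females
large = expected_lineages_left(24, 500, 66_000)   # estimate from the text
print(round(small, 1), round(large, 1))
```

    Under these assumptions the small population leaves about 3.4 lineages after 500 generations, while a female effective size of 66,000 leaves about 22, which is what the tree shows.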

    I picked the 80,000-90,000 year span first because it has relatively few coalescent events. Earlier and later spans have more coalescent events, and therefore the estimate of effective population size should be smaller.

    For example, the period from 90,000 to 100,000 years ago includes 8 coalescent events, from 22 to 14 ancestral lineages, leading to an estimate of Ne(f) = 9600. The period between 70,000 and 80,000 (less accurate because I have to read off Figure 1 instead of a table) appears to have 7 coalescent events (counting the multifurcation at the base of L3 as 6 separate events). That leads to an estimate of Ne(f) = 27,000.

    Which of these estimates is more accurate? We have to consider:

    1. More coalescent events in an interval should lead to a more accurate estimate, because the major component of error comes from the large variance in the distribution of coalescence times. Counting several events tends to average out extreme values. Likewise, the boundaries of these time slices are arbitrary dates in years, not coincident with coalescent events. Each slice includes some amount of time before and after the included coalescent events, causing more inaccuracy for small numbers of events. So the interval including only 2 events should result in a much less reliable estimate of effective size.

    2. On the other hand, the population could really have changed in size. If so, there's nothing impossible about all the estimates being accurate. The estimates provide a testable hypothesis of population history.
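
    The first point can be illustrated with a small simulation. This is a sketch under simplifying assumptions: waiting times are drawn from exponential distributions (the usual continuous-time approximation to the geometric waiting times), the true size N = 20,000 is arbitrary, and the estimator conditions on a fixed number of events rather than on a fixed slice of years.

```python
import random
import statistics

def simulated_estimates(n_start, n_end, N, reps, rng):
    """Simulate the time taken to go from n_start to n_end lineages and
    re-estimate N from each simulated total."""
    scale = 2 * (1 / n_end - 1 / n_start)
    estimates = []
    for _ in range(reps):
        # Waiting time with k lineages has mean 2N / (k(k-1)).
        total = sum(rng.expovariate(k * (k - 1) / (2 * N))
                    for k in range(n_start, n_end, -1))
        estimates.append(total / scale)
    return estimates

def coeff_var(values):
    """Standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values)

rng = random.Random(1)
N = 20_000  # hypothetical true size, for illustration only
cv_two = coeff_var(simulated_estimates(24, 22, N, 5000, rng))
cv_eight = coeff_var(simulated_estimates(22, 14, N, 5000, rng))
print(cv_two, cv_eight)
```

    The estimate based on two events scatters roughly twice as widely, relative to its mean, as the estimate based on eight.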

    I tend to think that these estimates by themselves are insufficient to document the fluctuation of the population across this span -- we would want additional evidence from other genes, preferably corroborated by archaeology. Also, we have to remember that population structure is affecting the inbreeding in a way that we haven't tested yet. So all in all, I would prefer to assume a constant size, and put the most reliable estimate together.

    We can do this by combining the span represented in the paper. Across the time period from 70,000 to 144,000 years ago, the African sample coalesces from 31 to 7 ancestral lineages, a total of 24 coalescent events. If the female effective population size were constant across that time span, this would lead to an estimate of Ne(f) = 17,000. Taking only the span from 70,000 to 100,000 years ago as a subsample yields an estimate of Ne(f) = 19,000 -- in other words, barely larger. The effective sizes seem consistent across the span from 70,000 to 144,000 years. There was no narrow bottleneck at that time, nothing associated with those "megadroughts," nothing associated with the Toba volcanic eruption.
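
    Here is a short Python check of the interval and pooled estimates quoted above, using the lineage counts given in the text. The function name and the 20-year generation time are assumptions for illustration.

```python
def female_ne(n_start, n_end, years, gen_years=20):
    """Constant-size estimate from a drop of n_start to n_end lineages."""
    generations = years / gen_years  # assumed 20 years per generation
    return generations / (2 * (1 / n_end - 1 / n_start))

# (lineages at start, lineages at end, length of span in years)
spans = {
    "90,000-100,000": (22, 14, 10_000),
    "70,000-80,000": (31, 24, 10_000),
    "70,000-100,000": (31, 14, 30_000),
    "70,000-144,000": (31, 7, 74_000),
}
for label, args in spans.items():
    print(label, round(female_ne(*args), -2))
```

    These come out near 9600, 27,000, 19,000 and 17,000 respectively.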

    What does the large African effective size mean?

    Since those estimates are for the female effective size, we can double them to estimate the total inbreeding effective population size for Africa across this time period -- approximately 34,000 individuals.

    This is substantially larger than the long-term effective size as estimated from most autosomal genes. Autosomal genes have a much longer total coalescence time than the mtDNA -- in the Wright-Fisher model, they would have a coalescence time 4 times longer. And they do tend to preserve ancient diversity from much earlier than 200,000 years ago -- an average around 800,000 to 1 million years, and some much longer. These times correspond to an estimate of the long term effective size of Ne = 10,000.

    If the mtDNA is showing a larger effective size -- around three or more times larger than the long-term value for autosomal genes -- then I would hypothesize that the expansion of the African population had already commenced during this time period. This corresponds to the late Middle Stone Age, a time of exceptional flourishing of regional technological variation in Africa as well as the innovation of new tool types, hunting techniques (projectile weapons) and artistic expression. It makes sense for the African population to have been expanding at this time, not contracting or undergoing a severe bottleneck.

    I am currently working on testing this hypothesis with new SNP data from African populations. But from the mtDNA alone we have a partial indication that the expansion had begun by 144,000 years ago. If we look back to the earliest part of the tree, back to the inferred coalescent of the entire mtDNA sample -- a date this study places at 200,000 years ago -- we can estimate female effective size using the same method as above.

    The earliest branches of a gene genealogy should be the longest, because we should wait much longer for a coalescent event among a few sampled lineages than among many. In fact, in a Wright-Fisher population, half the total time depth of a gene genealogy is expected to be taken up by the final coalescent -- the time span during which only two ancestral lineages remain. In the tree presented by Behar et al. (2008), the tree coalesces from three lineages to two at 180,000 years ago. All things being neutral and constant, we would expect the final coalescent to take another 180,000 years -- for a total time depth of 360,000 years. We can observe that that date would accord very well with a female effective size of 18,000 as estimated for later dates in the tree.

    But in this tree, the final coalescent takes only 20,000 years. In fact, the final few coalescent events seem to take substantially less time than would be expected under neutrality.

    We can quantify this: taking the population from 7 ancestral lineages at 144,000 years ago back to only 1 ancestral lineage at 200,000 years ago, we estimate Ne(f) = 1600. This looks like a small ancestral size for the sample -- not at 70,000 years ago, but well over 140,000 years ago. If we consider the effective size estimate back to the next-to-last coalescent, 180,000 years ago, we obtain an estimate of Ne(f) = 2500 -- almost as small, but not quite. This indicates that the apparent increase in inbreeding characterizes not only the final coalescent but the final few. Pending further investigation, I would say this is probably a real increase in inbreeding.
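
    The two early-tree estimates follow from the same calculation. As before, this is a sketch with an assumed 20-year generation time and a function name of my own.

```python
def female_ne(n_start, n_end, years, gen_years=20):
    """Constant-size estimate from a drop of n_start to n_end lineages."""
    generations = years / gen_years  # assumed 20 years per generation
    return generations / (2 * (1 / n_end - 1 / n_start))

to_root = female_ne(7, 1, 200_000 - 144_000)       # 7 lineages down to 1
to_last_pair = female_ne(7, 2, 180_000 - 144_000)  # 7 lineages down to 2
print(round(to_root), round(to_last_pair))
```

    The results are roughly 1600 and 2500, about a tenth of the estimates for the later part of the tree.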

    Is this a bottleneck? I think the null hypothesis is that this small effective size represents the same population size as estimated for the long-term autosomal value. A population effective size of 5000 for the mtDNA is probably not statistically different from a size of 10,000 for the autosomes -- given the small number of coalescent events we are including.

    But maybe this early high level of inbreeding does represent a bottleneck, or possibly the rapid differentiation of a mtDNA variant under selection in Africa after 200,000 years ago.

    How many people would this mean in the African late MSA?

    I'm going to post on effective population size in the next few days, because it is a complicated story. But if you happen to miss those posts, I can give a quick indication of what these numbers mean for the African population. The inbreeding effective size in ancient humans was likely somewhere between a third of the actual number of individuals anywhere down to a tenth or less.

    This means that an estimate of the effective size of 34,000 would likely indicate a true population size of 100,000 up to 300,000 individuals. This population may have occupied all the parts of Africa for which late MSA assemblages exist. The evidence for some population structure presented by Behar et al. (2008) (as well as earlier work by Gonder et al. 2007 and others) supports that model: A large and widely dispersed African population, exchanging genes and ideas.
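
    The conversion is simple division, sketched below. The ratios of one third and one tenth are the rough bounds given above, not measured quantities, and the function name is only illustrative.

```python
def census_range(effective_size, low_ratio=1 / 10, high_ratio=1 / 3):
    """Census sizes implied if Ne is between low_ratio and high_ratio
    of the actual number of individuals."""
    return effective_size / high_ratio, effective_size / low_ratio

low, high = census_range(34_000)
print(round(low), round(high))
```

    This gives a range of roughly 100,000 to 340,000 people, in line with the round numbers in the text.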

    Other phenomena may cause much more inbreeding than expected in a Wright-Fisher population. Genetic draft, or pseudohitchhiking, is one; repeated colonization and extinction of small groups is another. If these factors were important in human evolution, they may have reduced the inbreeding effective size to much less than a tenth of the census population size. As yet, we have no good estimate of such factors, but they should have become less important as populations grew during the Late Pleistocene. So a total population size of 500,000-1 million is not out of the realm of possibility for the late MSA, especially if some currents within this population made more of a contribution to later populations than others.

    The late Acheulean/early MSA population appears likely to have been much smaller, only a third or less the size of that during the later MSA. Human occupation was widespread across Africa during that earlier time, so the expansion may represent either an increase in the density across the same range or the displacement of some populations (or genetic lineages) by others.

    I can't finish without noting that many of the key archaeological innovations of the late MSA are also found outside of Africa, in the European and Levantine Middle Paleolithic. The mtDNA evidence is necessarily telling us about African populations of this period, because the Eurasian mtDNA gene pool is much more recently derived. But the demographic changes occurring in Africa must have influenced neighboring regions as well, both through gene flow and cultural diffusion.

    The most interesting genes are the selected ones. This improved picture of demographic history in Africa greatly aids our ability to evaluate selection in recent African populations. As indicated by these data, there was no severe bottleneck in Africa within the last 100,000 years. The population was quite large (in evolutionary terms) across that entire span. This is reflected by the low level of linkage disequilibrium evident in the Yoruba HapMap and other African population samples. It means that long LD regions in these samples certainly do not derive from recent bottlenecks.

    How could this be wrong?

    Looking back to the underlying assumptions, there are several possible weaknesses in the analysis as it applies to the real prehistoric African population as opposed to an ideal Wright-Fisher population.

    1. Nonrandom sampling strategy. The research article has very selectively drawn mtDNA samples for further analysis by whole-genome sequencing. In particular, the study attempted to increase our knowledge of the diversity within the ancient L1 and L0 lineages. But ideally for this kind of demographic inference we would want a random sample of genes, not a highly selected sample.

    Still, I don't think that presents a significant bias in this case. The entire sample includes several hundred mtDNA copies, so these ancient lineages should have shown up regardless. Sampling diversity in this way would certainly alter our interpretations of recent divergences (by underestimating the true inbreeding rate) but this should have little effect on the ancient parts of the genealogy.

    2. Population structure. Behar et al. (2008) conclude that their Khoisan and other African samples could not have belonged to a single random-mating population across the span from 80,000 to 100,000 years ago, and possibly more recently.

    Generally speaking, population structure tends to increase genetic variation. But when we consider the rate of coalescent events, the story is more complicated. Partial isolation would affect the times of coalescent events in the sample across this span: Coalescent events within each sample would be somewhat more likely, while coalescent events including lineages from both samples would be much less likely.

    We can test the effect of population structure on our estimate of effective size, by considering each sample separately. The Khoisan sample includes very few lineages across the entire span (at most 4), so an estimate of female effective size from this sample will not be very reliable. Based on the reduction from 4 to 2 ancestral lineages from 80,000 to 144,000 years ago, we can estimate Ne(f) = 6400. The non-Khoisan African sample includes many more ancestral lineages across this range: from 20 to 5. This yields an estimate of Ne(f) = 11,000. The sum of these is hardly different than would be estimated from the whole sample under the assumption of random mating, so this aspect of population structure has relatively little influence on the estimate across this span. It remains possible that other aspects of population structure (such as finer geographic structuring) might be inflating the estimate.
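
    The comparison can be sketched as follows. The pooled counts of 24 and 7 lineages are the whole-sample figures for 80,000 and 144,000 years ago quoted earlier; the function name and 20-year generation time are assumptions.

```python
def female_ne(n_start, n_end, years, gen_years=20):
    """Constant-size estimate from a drop of n_start to n_end lineages."""
    generations = years / gen_years  # assumed 20 years per generation
    return generations / (2 * (1 / n_end - 1 / n_start))

span = 144_000 - 80_000
khoisan = female_ne(4, 2, span)
other = female_ne(20, 5, span)
pooled = female_ne(24, 7, span)  # whole sample, treated as random mating
print(round(khoisan), round(other), round(khoisan + other), round(pooled))
```

    The two separate estimates sum to about 17,000, within roughly 10 percent of the pooled figure for the same span.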

    3. Selection. I am less concerned with selection in this portion of the African mtDNA history than I am in later times. Clearly, many mtDNA lineages have recently been increased or reduced in frequency by selection. We see rather large transitions in frequency between ancient and modern samples in some regions, we see health and other phenotypic consequences of mtDNA lineages in living humans, and we see a clear signature of rapid growth for some lineages indicative of selection.

    But across the span from 70,000 to 144,000 years ago, there is no sign of large-scale frequency changes among the sampled lineages. It is possible that some diversifying selection may be maintaining different lineages, but this seems sort of unlikely over 60,000 years or more, also considering the relatively steady rate of coalescent events across the span.

    However, the initial period of the genealogy, between 200,000 and 144,000 years ago, may have seen positive selection on a new mtDNA variant. That hypothesis may be tested by further comparisons with autosomal variability.

    Weak purifying selection on coding sites would tend to reduce the length of early branches compared to later branches, as these later branches would include some weakly selected mutations that would not be expected to survive over the long term. This phenomenon would depress our estimates of effective size early in the tree. This could be tested by comparing the synonymous/nonsynonymous mutation ratio for earlier and later branches.

    4. Mutation model. The estimates here depend on the mutation model employed by Behar et al. That model could be wrong.

    This is by far the largest potential source of error in the estimate, not only because adjustments to the rate of mutations would cause a linear response in the estimates of female effective size, but also because the evident changes in the genealogy over time will change in date as a result. If we increased the mutation rate estimated for coding mtDNA mutations by double, we would be looking at a bottleneck around 2000 individuals ending around 70,000 years ago -- pretty much what the press release had claimed. (It is interesting that you have to double the mutation rate to get to that value!)
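
    To see the linear response, here is a rough sketch that rescales the early part of the tree under a different mutation rate. It assumes that every date inferred from mutation counts scales inversely with the rate, and it keeps the 20-year generation time; the function name and rate factor are illustrative.

```python
def rescaled_estimate(n_start, n_end, start_years, end_years,
                      rate_factor, gen_years=20):
    """Recompute interval dates and Ne(f) if the true mutation rate were
    rate_factor times the assumed one."""
    start = start_years / rate_factor
    end = end_years / rate_factor
    generations = (end - start) / gen_years
    ne = generations / (2 * (1 / n_end - 1 / n_start))
    return start, end, ne

start, end, ne = rescaled_estimate(7, 1, 144_000, 200_000, rate_factor=2)
print(round(start), round(end), round(ne))
```

    With a doubled rate, the small early estimate moves to the span from 72,000 to 100,000 years ago and falls to about 800 females, on the order of the bottleneck described in the press release.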

    Now, the mutation model used in this paper does agree with results from other recent whole-genome analyses of mtDNA. And doubling the mutation rate would have other untenable consequences -- for instance, it would move the South Asian appearance of the modern human mtDNA variability down to less than 30,000 years ago, and the population expansion into the New World down to around 7000 years ago. Still, we might maintain that the later branches have lower effective mutation rates than the early branches.

    We should also keep in mind the possibility of error in the other direction, as well. If our estimate of mutation rate is too high, then effective size estimates are systematically too low. That would point to an earlier population expansion in Africa.

    Conclusion

    As you can see, these data allow a direct test of the hypothesis of a 70,000-year-old bottleneck in Africa, and they refute the hypothesis. The new data allow a powerful model of ancient African population size to be built, one that comes together with archaeological data to give us a really interesting picture of the early evolution of "modern" humans. The model can be tested with new, massive sets of information from single nucleotide polymorphisms, as well as a more detailed chronology of late MSA sites.

    References:

    Behar DM, 14 others, and The Genographic Consortium (consortium again? Whoa). 2008. The dawn of human matrilineal diversity. Am J Hum Genet 82:1-11. doi:10.1016/j.ajhg.2008.04.002

    Synopsis: 
    A study of mtDNA variation gets hyped as evidence of a bottleneck, but actually shows the opposite.
  • Diffusion versus migration in North African prehistory

    Wed, 2007-04-11 11:25 -- John Hawks

    There is a little disagreement in the letters of this week's Science, about mtDNA evidence for migrations from West Asia into North Africa. This is in reference to a paper late last year by Olivieri and colleagues, that argued that both North African and European populations traced their ancestry ultimately to Upper Paleolithic people of the Levant.

    That paper had this abstract:

    Sequencing of 81 entire human mitochondrial DNAs (mtDNAs) belonging to haplogroups M1 and U6 reveals that these predominantly North African clades arose in southwestern Asia and moved together to Africa about 40,000 to 45,000 years ago. Their arrival temporally overlaps with the event(s) that led to the peopling of Europe by modern humans and was most likely the result of the same change in climate conditions that allowed humans to enter the Levant, opening the way to the colonization of both Europe and North Africa. Thus, the early Upper Palaeolithic population(s) carrying M1 and U6 did not return to Africa along the southern coastal route of the "out of Africa" exit, but from the Mediterranean area; and the North African Dabban and European Aurignacian industries derived from a common Levantine source.

    That sets out the hypothesis: a migration from the Levant some 40,000 years ago spread these haplotypes into North Africa, at around the same time as in Europe.

    I took some notes on this paper at the time, because of the real paucity of any comparative information on the "Dabban" industry that is the proposed archaeological correlate of this migration. I'm not going to go into it here; let's say I was skeptical at the time, not least because geneticists have a way of assuming that industries are much more "real" or extensive than the archaeology allows. I still think that the archaeology is weak, but it is certainly possible that some significant population movement happened.

    The current letter and response are interesting, not because they differ in their interpretation of the haplotype distributions (which they do) but because their arguments are almost entirely in terms of archaeological and linguistic comparisons.

    Forster and Romano argue that haplotypes can't really provide evidence for early population movement, because a relatively late migration could have carried much older haplotypes along with it, or they may have entered North Africa by diffusion without any major population movement.

    They argue that archaeological and linguistic evidence favor more recent migration of populations as a major mechanism for the movement of gene lineages:

    Three points lead us to believe that our younger chronology for the back-migration into northern Africa still merits consideration. First, the mtDNA trees reconstructed by Olivieri and colleagues are less than conclusive because they consist of phylogeographically mixed branches, which cause uncertainty in identifying the relevant founder nodes for genetic dating. Second, in our view the fact that the North African mtDNA marker types still correspond so closely with the Afro-Asiatic language zone argues against the existence of that correlation for tens of thousands of years. Third, cave art in the Sahara shows that in Neolithic times (around 5000 B.C.), the population of the Sahara was still of sub-Saharan African ancestry (see figure), whereas "Europoid" figures documenting the arrival of west Eurasians appear later in the cave art record (3).

    In response, Olivieri and colleagues claim that the recent Holocene events, although relatively well documented, are insufficient to explain older haplotype distributions, and that archaeological evidence also supports their point of view.

    The principal problem with great syntheses of languages, genes, and figurines (or pots) is that they lump together different migrational and cultural processes and especially overstretch recent events of the Holocene, thereby downplaying or swamping the genetic signals that point to much earlier events of the Pleistocene (1, 2).

    Personally, I don't know which hypothesis is correct; it seems to me that mtDNA haplotypes are never going to answer this kind of question. The question is about mechanisms of genetic dispersal.

    Both hypotheses more or less agree about the current distribution of haplotypes. I say "more or less" because in fact, both hypotheses are interpreting these distributions post hoc -- they're not really testing hypotheses, they're just offering archaeological arguments in support of migrations they assume the mtDNA is documenting. Remember when I mentioned the "reality" of these archaeological industries? This is what I meant.

    Regardless of what we think about the archaeology, these haplotype distributions still deserve some explanation. How did a 45,000-year-old haplotype spread from its apparent origin in the Levant across North Africa? Did it get there slowly and gradually by diffusion? Or did it come all at once in a long-distance movement of a group of people -- a so-called "folk migration"?

    This question, diffusion versus folk migration, is of course a very old one. It remains central to all these considerations of recent genetic variation.

    A couple of weeks ago, I had the extraordinary privilege of hearing two of the real leaders in paleoanthropology having precisely this argument. How much of recent evolution has been driven by folk migration, and how much by the diffusion of genes into standing populations?

    These letters are a good illustration of the question, drawn out into a particular case. We'll be hearing more about this soon, I think.

    References

    Forster P, Romano V. 2007. Timing of a back-migration into Africa. Science 316:50-53. doi:10.1126/science.316.5821.50

    Olivieri A, and 14 others. 2007. Timing of a back-migration into Africa. Science 316:50-53. doi:10.1126/science.316.5821.50

    Olivieri A, and 14 others. 2006. The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa. Science 314:1767-1770. doi:10.1126/science.1135566

  • The Templeton review

    Fri, 2006-02-03 11:50 -- John Hawks

    The Yearbook of Physical Anthropology has a new review of the genetic evidence for modern human origins by Alan Templeton. The paper is 27 journal pages, and they are full of detail -- especially after the section describing basic coalescent theory.

    I'll be going through this paper in the next few days and highlighting some of the issues it raises. In the meantime, here are some quotes from the Washington University press release:

    "The 'Out of Africa' replacement theory has always been a big controversy," Templeton said. "I set up a null hypothesis and the program rejected that hypothesis using the new data with a probability level of 10 to the minus 17th. In science, you don't get any more conclusive than that. It says that the hypothesis of no interbreeding is so grossly incompatible with the data, that you can reject it."

    ...

    The new data confirm an expansion out of Africa to 700,000 years ago that was detected in the 2002 analysis.

    "Both (the 1.9 million and 700,000 year) expansions coincide with recent paleoclimatic data that indicate periods of very high rainfall in eastern Africa, making what is now the Sahara Desert a savannah," Templeton said. "That makes the timing very amenable for movements of large populations through the area."

    Found via Dienekes, who seems to be one step ahead of me this week!

    References:

    Templeton AR. 2005. Haplotype trees and modern human origins. Yrbk Phys Anthropol 128(S41):33-59.
