Number of New World founders

Jody Hey (Rutgers) has a paper in the current (vol 3, no 6) PLoS Biology providing estimates of the number of founders of the initial New World human population, along with the approximate date of their arrival (thanks to Dienekes for the link). These estimates were based on analysis of nine genetic loci, including the usual suspects (beta-globin, mtDNA, NRY, Xq13.3, ZFX) and some less familiar ones (ATM, APXL, TNFSF5, RRM2P4).

This paper follows a fundamentally good idea: that evaluation of demographic characteristics of ancient populations must depend upon analysis of multiple unlinked genetic loci. Demography is expected to exert consistent effects on every genetic locus. As long as this is true, adding more loci to an analysis should allow a fuller picture of demographic history. In a nutshell, additional data should allow the statistical testing of more and more parameters relating to population history. So trying to elucidate the founding of the New World by using only a single genetic locus (such as mtDNA) can only test models with one or two parameters.

Such examinations have most commonly attempted to determine the time that New World founding populations arrived. This is done in one of two ways. "Founder analysis" considers the most recent common genetic ancestor for a locus in two populations, and uses the time of that genetic ancestor to infer an upper bound on the time the populations were isolated. Alternatively, examination of frequency spectra of mutations at a locus in New World populations may allow an estimate of the initial time of population expansion, which is generally assumed to correspond to the founding of the population. These two techniques have been applied most broadly to evidence from the nonrecombining Y chromosome and mtDNA, with varying results.

Hey puts such research into a broader perspective:

For complex historical subjects such as the colonization of the Americas, there are many ways that models can be constructed, examined, and compared. One approach is to develop a portrait based on a particular kind of data, such as linguistic [6], skeletal [14], or archaeological [15] data, or on DNA sequence data from a particular portion of the human genome such as the mitochondria [4,16-19] or the Y chromosome [9]. Yet each source of data has unique sources of variation. In the case of genetic data there occurs a large stochastic variance of the coalescent history among genes that causes different loci to vary widely in levels of genetic variation and in apparent patterns of relationships among populations [20-22]. This stochastic variance is sometimes overlooked, for example in discussions of the histories of the individual DNA sequence haplotypes [18], and it is easy to underestimate the many possible histories that are consistent with a finding that haplotypes are shared by different populations [23-25] (Hey 2005:e193).
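The stochastic variance Hey describes is easy to see in a toy simulation. The sketch below is not Hey's method. It simply draws the time to the most recent common ancestor (TMRCA) for a handful of independent loci under the standard neutral coalescent, assuming a single constant-size population, no recombination within a locus, and a sample of 20 sequences per locus. The function names, the sample size, and the seed are all illustrative assumptions of mine.

```python
import random
import statistics


def simulate_tmrca(n_samples, rng):
    """Draw one TMRCA under the standard neutral coalescent.

    Time is in coalescent units (2N generations for a diploid autosomal
    locus). While k lineages remain, the waiting time to the next
    coalescence is exponential with rate k*(k-1)/2.
    """
    total = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2
        total += rng.expovariate(rate)
    return total


def expected_tmrca(n_samples):
    """Theoretical mean TMRCA for n samples: 2 * (1 - 1/n) coalescent units."""
    return 2.0 * (1.0 - 1.0 / n_samples)


def tmrca_across_loci(n_loci, n_samples, seed=1):
    """Independent TMRCA draws for unlinked loci sharing one demography."""
    rng = random.Random(seed)
    return [simulate_tmrca(n_samples, rng) for _ in range(n_loci)]


if __name__ == "__main__":
    # Nine loci, echoing the number analyzed in the paper (assumed sample size 20).
    draws = tmrca_across_loci(n_loci=9, n_samples=20)
    for i, t in enumerate(draws, start=1):
        print(f"locus {i}: TMRCA = {t:.2f} coalescent units")
    print(f"min = {min(draws):.2f}, max = {max(draws):.2f}, "
          f"mean = {statistics.mean(draws):.2f}, "
          f"expected = {expected_tmrca(20):.2f}")
```

Even with identical demography at every locus, the individual draws often differ by a factor of two or more, which is why a single locus says so little on its own.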

The most emphasized point in the article (as reflected by the title) is the finding that the current population of the New World may have been founded by an effective number of fewer than 100 individuals:

In contrast to the Asian population, the New World population parameter (theta2) is much smaller, and suggests a recent New World effective population size of less than 1,000 (Table 3). However, given the estimate of the effective size of the founding New World population (about 70; Table 4), the overall picture is of a nearly 10-fold growth in the New World effective size since t (ibid).

The conclusion is a simple scenario:

Taken together, the analyses in this study suggest a recent founding of the New World Amerind-speaking peoples by a small population of effective size near 70, followed by population growth in the New World....The analyses reveal very broad distributions for migration parameters, and although the peak locations suggest that gene flow has been fairly high (2Nm values greater than 1; see Table 3), the estimated probabilities of migration rates having been zero are also high (Figure 3G and 3H). Also, because Eskimo-Aleut and Na Dene speakers were not included in this study, the question of separate migrations for these groups has not been addressed [3] (ibid).

But if you think a different scenario for the founding of New World populations is more likely, don't lose heart. There is much in this paper to reveal the limitations of genetic information in testing hypotheses of New World origins.

Hey's discussion notes the limits on his method, including the assumptions that the method makes. It is worth looking critically through the results to see these limitations in action. For example, although archaeological evidence shows that humans arrived in the New World earlier than 12,000 years ago, the maximum likelihood value for this time according to Hey's estimates is only around 6350 years ago. The confidence interval on this date is not given (nor is it obvious from the shape of the likelihood distribution) but would appear to include a range from less than 2000 years ago to over 20,000 years ago.

And these estimates are for one particular assumption of the shape of population size changes in New World populations. If no change is assumed, the maximum likelihood estimate of founding time is earlier than 40,000 years ago. This assumption of constant population size is almost certainly wrong. But the problem is, how can we justify one particular assumption about the style of population growth? In fact, it is precisely this kind of assumption that we would like to derive as a parameter estimate from the data. In this paper, Hey arrives at estimates by assuming that very ancient times of origin could not possibly be correct. This assumption may be validated by archaeological evidence, but that makes it no less arbitrary from a genetic perspective. Indeed, if genetics actually provided a more ancient date as an estimate, there might be good reasons to believe it. For example, there is no conceptual barrier to the hypothesis that the founding New World population actually was isolated from the ancestral Asian population at some relatively early date (e.g., in Beringia), only later to enter the Americas. In such a case, genetics would indicate an ancient population split, while archaeology would show evidence of a recent entry.
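To get a feel for how much the assumed growth shape and the founding time depend on each other, here is a back-of-the-envelope sketch. It assumes pure exponential growth and a 25-year generation time (both my assumptions, not values from the paper) and asks what per-generation growth rate would produce the roughly 10-fold increase in effective size that Hey reports, for several candidate founding dates.

```python
import math


def growth_rate_per_generation(fold_increase, years, generation_years=25.0):
    """Exponential growth rate r per generation such that
    N(t) = N(0) * exp(r * generations) reaches the given fold increase.

    generation_years is an assumed value, used only for illustration.
    """
    generations = years / generation_years
    return math.log(fold_increase) / generations


if __name__ == "__main__":
    # 6350 is the maximum likelihood date discussed above; the other
    # dates are illustrative alternatives within the range mentioned.
    for years in (2000, 6350, 12000, 20000):
        r = growth_rate_per_generation(10, years)
        print(f"founding {years:>6} years ago -> r = {r:.5f} per generation")
```

The same 10-fold growth is compatible with very different rates depending on the date, which is one way to see why the growth assumption and the date estimate cannot be evaluated separately.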

What about the estimate of 70 founders? This estimate comes from an estimate of the effective size of the ancestral Asian population (placed at around 9000 individuals) and the proportion of the ancient Asian population that split to found the New World population (placed at less than 0.01). Did one out of a hundred ancient Asians move into Beringia and further to the New World? Quite possibly, who knows? Were there only 9000 people in Asia when the founding happened -- at a time estimated by these genetic data at 6350 years ago? Certainly not.
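The arithmetic behind the number is worth spelling out. The sketch below only restates the relationship as I have described it (founders equal the ancestral effective size times the founding fraction). It is not Hey's estimation procedure, which fits these parameters jointly, and the function names are hypothetical.

```python
def implied_founding_fraction(founder_ne, ancestral_ne):
    """Fraction of the ancestral effective population that became founders."""
    return founder_ne / ancestral_ne


def founders_from_fraction(ancestral_ne, fraction):
    """Effective number of founders implied by an ancestral size and fraction."""
    return ancestral_ne * fraction


if __name__ == "__main__":
    ancestral_ne = 9000  # ancestral Asian effective size, as cited in the text
    founder_ne = 70      # founding effective size, as cited in the text

    fraction = implied_founding_fraction(founder_ne, ancestral_ne)
    print(f"implied founding fraction: {fraction:.4f}")

    # If the ancestral population were an order of magnitude larger,
    # the same fraction would imply ten times as many founders.
    larger = founders_from_fraction(10 * ancestral_ne, fraction)
    print(f"founders if ancestral size were {10 * ancestral_ne}: {larger:.0f}")
```

The founder estimate scales directly with the ancestral size, so any doubt about the 9000 figure carries straight through to the 70.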

Consider the ancient Asian population at the time that people first moved into Beringia, perhaps as early as 25,000 years ago or earlier. The evidence about this population applies most strictly to the ancestors of the samples used in the study, which are mostly drawn from China, present-day Siberia, and Korea. Even at the early date of 25,000 years ago, the presence of people in northern Siberia is sufficient to demonstrate much larger populations further to the south, in China. The presence of such larger populations is corroborated by the evidence for widespread colonization out of Southeast Asia into island Melanesia, as well as the colonization of the Japanese islands as far as Okinawa by shortly after 20,000 years ago. We do not know how many Asians there were at this time, but there were almost certainly many times more than 9,000 -- I would guess more than an order of magnitude more.

We could play games with numbers, assuming that some proportionality between effective and census population numbers existed. Or we could assume that all the parameters but the number of founders were precisely known, and attempt to find the range of variation permissible in that one unknown parameter. But considering all the sources of error, there is no way that any such estimate could have validity. I think the short answer is the most correct in this case: this method tells us essentially nothing about the founding population of the New World.

Would more data from different loci help?

Obviously, from a purely statistical perspective, more loci mean more power, and should allow more population parameters to be resolved.

But the problems with the current data are not easily addressed by adding more loci or samples. No estimate will have meaning in terms of real population numbers until the relationship between effective size and census size is satisfactorily worked out. In the context of the founding and subsequent growth of a major continental population, this is a major problem. More loci will not explain why these nine give a date much younger than any time at which the initial population of the New World could possibly have been founded. This young date must reflect a low level of divergence between American and Asian populations for some of the loci used here, but which ones? And why so low? Is there selection on some of these loci?

On the subject of selection, there are clear reasons to think that some of the loci used here may have been under selection in the global population, if not in the New World population. Frequent readers will note that I find selection almost everywhere I look. I tend to be very cautious, because demographic modeling is very sensitive to the assumption of neutrality -- the fact is that natural selection is far more powerful than genetic drift, and can easily throw off the results of an analysis. Hey says this: "Regarding natural selection, the study was limited to loci that had not individually been reported to show evidence of directional or balancing selection," which is a bit misleading, since beta-globin is well known as the primary example of balancing selection in humans, while mtDNA and the NRY violate most tests of neutrality. Although it is unstated, the assumption here is that these loci have not experienced selection within New World populations, which may or may not be true, but is at least problematic. Hey performed an HKA test for selection on eight of the loci, and found a near-significant result (p=0.054). Considering that the different loci may exhibit selection in opposite directions (balancing vs. directional), this is not a vote of confidence in the data. The key question is whether evidence of selection globally necessarily affects an analysis of New World populations alone, or whether there actually has been selection on one or more of these loci in those New World populations. Since some of the model parameters apply to the demography of the ancestral Asian population of origin, I don't think we know the answer to either question.

So I think these issues will need to be answered before genetics will give a clear answer about New World origins.

UPDATE (5/25/05): I wrote the post in a bit of a hurry, and upon reflection I thought a couple of things could be added. My point isn't that genetics hasn't told us anything about New World origins. In fact, I think that founder analysis has added some significant constraints on the date that at least some of the founding populations left Asia. Nor do I think that this is a conflict between archaeology and genetics. The two fields do not currently present alternative hypotheses of origins; instead, they both provide evidence that may test models of the founding population. My feeling is that the archaeology right now provides evidence that is much stronger than anything provided by genetics -- strong enough that it absolutely excludes all but a relatively narrow range of hypotheses.

Consider what archaeology and common sense alone tell us about the founding populations. They must have first arrived earlier than 12,000 years ago, possibly substantially earlier, but certainly not earlier than 50,000 years ago and probably much more recently. They may already have been separated for a substantial time from contemporary Siberians, since the geographic extent of Beringia may have put a lot of distance between them. They must have been a relatively small population compared to contemporary Asians. This migration was not the voortrekkers crossing the Vaal; it was a relatively small population of hunter-gatherers dispersing into a vast new continental land mass. This means that the population must have begun small and expanded greatly, probably exponentially. There may have been more than one dispersal, with more than one population source.

So given these constraints, what has the genetic analysis in this study added? Does the relatively recent estimate of the date of origin mean that people arrived on the more recent end of the possible range of values? No, because the recent number depends on specific, unverified assumptions, because it lacks any confidence limits, and because it gives an estimate that makes no archaeological sense in any event. Does the estimate of 70 founders add to our knowledge that the founding population was probably small? No, because again there are no confidence limits, and more critically we have no idea what the estimate means in terms of real numbers of people.

In other words, we may know what these numbers are consistent with: they are consistent with some possible hypotheses of origins and additionally a wide range of hypotheses that have been strongly refuted by archaeology. We have no idea what archaeologically plausible hypotheses the numbers are inconsistent with. So they haven't tested anything. Maybe they are the best estimates possible. The demographic model appears sound; the problem is that the data do not allow more precise estimates of the many parameters. Personally, I am skeptical that adding more loci will help very much, since each locus adds the potential of unrecognized problems extraneous to the demography. If this is the future, then genetics is unlikely to provide as much resolution on this problem as archaeology already has.


Hey J. 2005. On the number of New World founders: a population genetic portrait of the peopling of the Americas. PLoS Biol 3:e193.