Playing games with dates

Two papers in the current (May 13, 2005) Science and an accompanying commentary focus on the mtDNA evidence relating to human dispersals into South and Southeast Asia. One paper, by Vincent Macaulay (University of Glasgow) and colleagues, provides mtDNA sequences from aboriginal populations of the Malay peninsula.

Here's the abstract:

A recent dispersal of modern humans out of Africa is now widely accepted, but the routes taken across Eurasia are still disputed. We show that mitochondrial DNA variation in isolated "relict" populations in southeast Asia supports the view that there was only a single dispersal from Africa, most likely via a southern coastal route, through India and onward into southeast Asia and Australasia. There was an early offshoot, leading ultimately to the settlement of the Near East and Europe, but the main dispersal from India to Australia 65,000 years ago was rapid, most likely taking only a few thousand years (Macaulay et al. 2005:1034).

The second paper, by Kumarasamy Thangaraj and colleagues, covers the mtDNA variation of Andaman Islanders. The abstract is less informative; here's the conclusion:

Our data indicate that two ancient maternal lineages, M31 and M32 in the Onge and the Great Andamanese, have evolved in the Andaman Islands independently from other South and Southeast Asian populations. These lineages have likely been isolated since the initial penetration of the northern coastal areas of the Indian Ocean by anatomically modern humans, in their out-of-Africa migration 50 to 70 thousand years ago. In contrast, the Nicobarese show a close genetic relation with populations in Southeast Asia, suggesting their recent arrival from the east during the past 18 thousand years (Thangaraj et al. 2005:996).

Nicholas Wade has an article about the paper in the New York Times. Here's a great exchange:

There is no evidence of modern humans outside Africa earlier than 50,000 years ago, said Dr. Richard Klein, an archaeologist at Stanford. Also, if something happened 65,000 years ago to allow people to leave Africa, as Dr. Macaulay's team suggests, there should surely be some record of that in the archaeological record in Africa, Dr. Klein said. Yet signs of modern human behavior do not appear in Africa until 50,000 years ago, the transition between the Middle and Later Stone Ages, he said.
"If they want to push such an idea, find me a 65,000-year-old site with evidence of human occupation outside of Africa," Dr. Klein said.

Of course, there is no chance whatsoever that a 65,000 year genetic date is significantly different from 50,000 years. Both the current papers follow a long and dishonorable tradition of not providing any confidence interval for their date estimates. Both papers do provide standard errors -- without explanation, they report different standard errors for the same clades -- but standard errors do not say anything about the real uncertainty in the age estimates. It is not all that easy to figure out what the full range of uncertainty in the estimates may be, since it owes not only to the distribution of uncertainty in coalescence times (which is asymmetrical and skewed toward the high end) but also to uncertainty coming from assumptions like the human-chimpanzee divergence time and the adequacy of the sampling scheme. Based on the standard errors alone (ranging around 7,000 years for the clade ages related to the "dispersal"), the 63,000 year date is not significantly different from 50,000 years. The true range of uncertainty is probably far greater.
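To make that arithmetic concrete, here is a minimal sketch in Python. It assumes a normal approximation and treats 50,000 years as a fixed point with no error of its own; both are my simplifications rather than anything in the papers, and the function names are hypothetical. Because the real distribution is skewed toward older ages and the calibration adds further uncertainty, this is close to a best case for the genetic date.

```python
def z_score(estimate, reference, se):
    """Number of standard errors separating an estimate from a reference value."""
    return (estimate - reference) / se


def normal_interval(estimate, se, z=1.96):
    """Approximate 95% interval, ASSUMING normally distributed error."""
    return (estimate - z * se, estimate + z * se)


# Figures taken from the text: a 63,000 year clade age, a standard error
# of roughly 7,000 years, and the 50,000 year archaeological date.
GENETIC_DATE = 63_000
STANDARD_ERROR = 7_000
ARCHAEOLOGICAL_DATE = 50_000

z = z_score(GENETIC_DATE, ARCHAEOLOGICAL_DATE, STANDARD_ERROR)
low, high = normal_interval(GENETIC_DATE, STANDARD_ERROR)

print(f"difference in standard errors: {z:.2f}")
print(f"approximate 95% interval: {low:,.0f} to {high:,.0f} years")
print("50,000 years inside interval:", low <= ARCHAEOLOGICAL_DATE <= high)
```

Under these assumptions the two dates are less than two standard errors apart, and the interval still reaches down past 50,000 years.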

Now, why wouldn't a reader of the papers know anything about this range of uncertainty? Not only do the papers not report confidence intervals in the text, but also the entire presentation of the data is relegated to the supplementary information online, which for both papers is substantially longer than the text. These are not just data tables, but relatively full literature reviews (as full as they get for these papers) and methods sections. This is a disturbing new trend for Science: reporting only results in the journal, and putting the information necessary to evaluate the results into a secondary source. What if you are asked by a reporter to comment on an article, and they send you an embargoed draft? You don't know enough about the paper even from the full text to evaluate it.

I've been thinking today about "media packaging" of research results, and this strikes me as a pretty stark example. Two papers on a single theme, packaged together with a commentary. Both of the papers make relatively cautious (although not cautious enough in my estimation) interpretations; the commentary is more daring. Media reports focus on the issue raised in the commentary, quoting other scientists who haven't seen enough of the research to be informedly critical. Good science reporters know enough to be skeptical; look where the preceding exchange goes:

Geneticists counter that many of the coastline sites occupied by the first emigrants would now lie under water, because the sea level has risen more than 200 feet since the last Ice Age. Dr. Klein expressed reservations about that argument, noting that people would not wait for the slowly rising sea levels to overwhelm them but would build new sites farther inland.
Dr. Macaulay said genetic dates had improved in recent years, now that it is affordable to decode the whole ring of mitochondrial DNA, and not just a small segment.
But he said he agreed "that archaeological dates are much firmer than the genetic ones" and that it was possible his 65,000-year date for the African exodus was too old.

So in other words, there's no result here. But this only applies to the young end of the range of dates for possible "Out of Africa" migrations -- the end that Richard Klein has been so active in examining. There is no word at all about the older end of the time range in any of the articles, commentaries, or press reports. But just as there is no chance these dates are significantly different from 50,000 years, there is likewise no chance they are significantly different from 80,000 years, or probably even 100,000 years. Let's cover the scenario for the initial Out-of-Africa colonization:

The very similar ages of haplogroups M, N, and R indicate that they were part of the same colonization process [see (23)]. This most likely involved the exodus of a founding group of several hundred individuals (27) from East Africa, some time after the appearance of haplogroup L3 85,000 years ago, followed by a period of mutation and drift during which haplogroups M, N, and R evolved and the ancestral L3 was lost. Although the details of this period remain to be elucidated, the next stage is much clearer. The presence in each region of the same three founder haplogroups, but differentiated into distinct subhaplogroups, indicates that there was a rapid coastal dispersal from 65,000 years ago around the Indian Ocean littoral and on to Australasia (Macaulay et al. 2005:1036).

Thus, the initial timing of this putative migration is bounded on the lower end by the 65,000 year dates, and on the upper end by the 85,000 year estimate for haplogroup L3. The standard error on this estimate as reported in the supplementary information is 8,400 years, which means that the true date could easily be 20,000 or more years older than the estimate. So ancestry from the Skhul and Qafzeh populations is not excluded by these analyses, either. But the paper does not even raise this issue. More strikingly, the commentary puts the two facts in adjacent sentences without adding them together:

Early humans even ventured out of Africa briefly, as indicated by the 90,000-year-old Skhul and Qafzeh fossils [HN9] found in Israel. The next event clearly visible in the mitochondrial evolutionary tree is an expansion signature of so-called L2 and L3 mtDNA types in Africa about 85,000 years ago, which now represent more than two-thirds of female lineages throughout most of Africa. The reason for this remarkable expansion is unclear, but it led directly to the only successful migration out of Africa, and is genetically dated by mtDNA to have occurred some time between 55,000 and 85,000 years ago (Forster and Matsumura 2005:965).
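Adding the two facts together takes only a few lines. This sketch uses the 85,000 year L3 estimate, its reported standard error of 8,400 years, and the 90,000 year age quoted for Skhul and Qafzeh. The normal approximation and the variable names are my assumptions, and the fossil age is treated as exact, which it is not.

```python
# Figures from the paper, its supplementary information, and the commentary.
L3_AGE = 85_000            # estimated age of haplogroup L3, in years
L3_SE = 8_400              # reported standard error on that estimate
SKHUL_QAFZEH_AGE = 90_000  # fossil age quoted in the commentary

# How many standard errors separate the fossils from the L3 estimate?
z = (SKHUL_QAFZEH_AGE - L3_AGE) / L3_SE

# Approximate 95% bounds on the L3 age, ASSUMING normally distributed error.
lower = L3_AGE - 1.96 * L3_SE
upper = L3_AGE + 1.96 * L3_SE

print(f"Skhul/Qafzeh is {z:.2f} standard errors older than the L3 estimate")
print(f"approximate 95% interval for L3: {lower:,.0f} to {upper:,.0f} years")
print("fossil age inside interval:", lower <= SKHUL_QAFZEH_AGE <= upper)
```

Even on this simplified reckoning the fossils sit well under one standard error from the L3 date, and the upper bound runs past 100,000 years.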

Ignoring this one, the paper leaves us with these options:

Three possible hypotheses can be distinguished using these data. If modern non-Africans are descendants of populations that dispersed along both northern and southern routes, then mtDNA lineages belonging to relict populations (including Orang Asli, Papuans, and Aboriginal Australians) should diverge from founder types that are distinct from those leading to the main continental Eurasian groups. If there were just a single dispersal, then all non-African populations should diverge from the same set of founders, which would coalesce to 45,000 to 50,000 years ago if the Levantine corridor model were correct, or 60,000 to 75,000 years ago if they were all the result of the proposed earlier single southern route (4). At this time, a northern passage was most likely blocked by desert and semi-desert (26) (Macaulay et al. 2005:1035, citations therein).

Okay, hmm...let me get this straight: modern humans had uber-technology to float across the Red Sea, kill mammoths, and outcompete every archaic human in every ecology they had occupied for a half million years or more, but they couldn't manage to move in 10,000 years across a semi-desert? And let's not forget the "modern" humans that get thrown under the bus in this scenario -- Skhul, Qafzeh, Liujiang -- either they don't qualify as "really" modern, or they've been misdated. Oh, and, there is the slight problem that no other locus provides any evidence of this pattern of population movement -- even the Y chromosome -- and many are not consistent with it.

There is a strategy to deal with these evidentiary problems:

Firm archaeological age estimates are more recent [more ancient dates are simply disregarded in this paper] -- 50,000 years for Australia and ~45,000 years for southeast Asia -- but early evidence may have been lost to sea level rises. Moreover, human populations may then have diffused from the coast into the continental interiors more gradually, leaving a greater archaeological signature on the landscape as they grew in size (Macaulay et al. 2005:1036).

This is always possible, but it can't be a good sign when your hypothesis depends on the same logic as the aquatic ape theory.

A short word about the bottleneck

From the commentary:

One intriguing question is the number of women who originally emigrated out of Africa. Only one is required, theoretically. Such a single female founder would have had to carry the African L3 mtDNA type, and her descendants would have carried those mtDNA types (M, N, and R) that populate Eurasia today. Macaulay et al. use population modeling to obtain a rough upper estimate of the number of women who left Africa 60,000 years ago. From their model, they calculate this number to be about 600. Using published conversion factors, we can translate this estimate into a number between 500 and 2000 actual women. The authors' preferred estimate is several hundred female founders. All such estimations are influenced by the choice of parameters and by statistical uncertainty; hence, it is understood that the true number could have been considerably larger or smaller. Improved estimates will involve computer simulations based on informed scenarios using additional genetic loci (Forster and Matsumura 2005:966).

Gee, are there any other genetic loci that have been examined with this issue in mind? Do any of them agree with a bottleneck reducing human population size to "between 500 and 2000 actual women"? Considering that the answers to these questions are, "yes, many have been examined" and "no, most of them don't agree with that number," what does the full pattern of genetic data say about mtDNA variation?

Earth to Science: what about selection?

Along with the failure to provide confidence limits on estimates, both papers and commentary join another long and dishonorable tradition of completely neglecting the possibility that mtDNA has been affected by natural selection.

There are several ways that selection could affect the interpretations of these papers. My own inclination is to think that an episode of positive selection on human mtDNA explains its recent coalescence date and the appearance of a rapid dispersal out of Africa. This would be the pattern expected if an advantageous allele appeared within the African population and spread from there through a global human population. The strength of this explanation is that it accounts for why mtDNA looks so different from most autosomal genes in its pattern of variation (cf. Templeton 2002; Wall and Przeworski 2000).

Now, you don't have to buy into this hypothesis of positive selection to understand that some kind of selection may have severely weakened the ability of human mtDNA to accurately portray ancient population movements. Purifying selection alone would affect these estimates, particularly since they are based on coding region sequences. At the least, small isolated populations may be observed to have a higher effective rate of mutations because of an increased effect of genetic drift against weak purifying selection. At the worst, different environments affecting human groups in the past may have had differential selective effects, with unpredictable effects on the mtDNA phylogeny.

Is this a serious problem? On the one hand, even the maximum degree of purifying selection affecting nonsynonymous substitutions probably affects the apparent diversity of the coding region of the global mtDNA by a factor of two or less. So on the surface, although this might be a fairly big problem, it is somewhat limited in its possible impact. On the other hand, this kind of selection almost certainly occurred. Selected sites in the coding region of the mtDNA are increasingly recognized and known to be common within many human populations. Within just the last week, there has been a new announcement of a common mtDNA variant (haplogroup U) related to cancer risk, and one survey associating mtDNA genotypes with performance in elite athletes (Niemi and Majamaa 2005). Mitochondrial dysfunctions (not all caused by mtDNA genes) are known to increase the risk of Alzheimer's, Parkinson's, ALS, and other neurodegenerative disorders (Zhu et al. 2004). Purifying selection has been an important force on the global distribution of mtDNA (Wise et al. 1998). The presence of mutational variants with such a high selective cost may suggest countervailing selective advantages to these mutations that have not yet been discovered -- in other words, suggesting that not only purifying but also balancing selection may be affecting the frequencies of these mtDNA alleles. So the problem is likely serious, and its full extent is not yet known.

A basic cautionary attitude would indicate that it is no longer tenable to assert that the history of the mtDNA of a population is the same as the history of the population. There are just too many unaccounted variables to believe that methods that assume complete neutrality for mtDNA are giving accurate dates for population movements, expansions, or other events. My favorite quote on the issue is from Razib at Gene Expression:

I am not totally discounting all elements of the narrative pressed forward above, but, genes serve as flexible instructions to shape and mold a human's phenotype, the lineages are all their own, and the concordance of the gene lineages with "individual" lineages, let alone populations, is I think an often tenditiously assumed axiom in many of these research papers. The authors above make the identity of genes:individuals, groups of genes:groups of people. Working back over 2,000 generations with such assumptions is I think a somewhat sketchy proposition unless your variables are controlled for (eg; at least the Andamans are islands, which are noted for fostering relatively genetically isolated people. For example, Sardinia is situated in the rather populous Mediterranean, but it often is an outlier in Principal Component Analysis diagrams of European genetics). Or, if your facts are so crystal clear, the narrative so compelling, the predictions so spot on, than the model is simply self-evidently true. But at this point I think that Recent-Out-of-Africa has depleted all the parsimony capital it had saved up, at least from where I stand.

Selection on mtDNA is not a moribund backwater of research; it is being pursued by groups studying some of the highest-profile diseases. We don't know yet how selection may have affected the full pattern of human variability, but we know enough to know that the answer isn't zero. So why have human geneticists studying global mtDNA variability completely ignored the issue? And what will it take for them to hear the message over the buzz of their own voices?

References:

Forster P and Matsumura S. 2005. Did early humans go north or south? Science 308:965-966.

Macaulay V, et al. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308:1034-1036.

Niemi A-K and Majamaa K. 2005. Mitochondrial DNA and ACTN3 genotypes in Finnish elite endurance and sprint athletes. Eur J Hum Genet advance online publication.

Templeton AR. 2002. Out of Africa again and again. Nature 416:45-51.

Thangaraj K, et al. 2005. Reconstructing the origin of Andaman Islanders. Science 308:996.

Wall JD and Przeworski M. 2000. When did the human population size start increasing? Genetics 155:1865-1874.

Wise CA, Sraml M, and Easteal S. 1998. Departure from neutrality at the mitochondrial NADH dehydrogenase subunit 2 gene in humans, but not in chimpanzees. Genetics 148:409-421.

Zhu X, Smith MA, Perry G, and Aliev G. 2004. Mitochondrial failures in Alzheimer's disease. Am J Alzheimer's Dis Other Dement 19:345-352.