The plaque record of human migrations

I let this story pass by last week, but I can't resist the version with the excellent headline:

'Ancestral eve' was mother of all tooth decay
A New York University College of Dentistry (NYUCD) research team has found the first oral bacterial evidence supporting the dispersal of modern Homo sapiens out of Africa to Asia.
The team, led by Page Caufield, a professor of cariology and comprehensive care at NYUCD, discovered that Streptococcus mutans, a bacterium associated with dental caries, has evolved along with its human hosts in a clear line that can be traced back to a single common ancestor who lived in Africa between 100,000 and 200,000 years ago.

I've seen a few people comment on this one, but somehow nobody ever bothers to read the original research. Here it is, in the Journal of Bacteriology.

Let's just say that the press accounts vastly oversimplify the observations.

I think it makes a nice case study for the difficulty of linking pathogen evolution with human population movements. There is no question that humans and their pathogens have coevolved over time. They respond to our evolutionary changes, and we have responded to theirs. But coevolution is a much more complicated relationship than one-to-one correspondence.

To start with, S. mutans is an oral bacterium that is a major cause of dental caries. The incidence of caries has increased vastly over the course of the last 10,000 years, predominantly after the adoption of agricultural diets. Hence, S. mutans is an unlikely candidate for neutral evolution. At a minimum, its ecology radically changed in recent human populations. Samples in the study were taken from caries-active individuals, meaning that the strains included are all active pathogens.

The paper notes that S. mutans is "generally transmitted vertically, mother to child." This generalization underlies the argument that it may mark the population history of its hosts. But the actual transmission ecology of oral Streptococcus strains seems to be more complex. Individuals are hosts to multiple strains. Early acquisition of S. mutans strains depends on diet, particularly sugar consumption. Children with early dental caries harbor more strains of S. mutans than children without caries (Alaluusua et al. 1996).

This study in particular analyzed not all S. mutans variants, but a particular set of strains that include a "cryptic" plasmid. Plasmid transmission has been a focus of epidemiological research in Streptococcus because it is a major mechanism for the horizontal transmission of virulence and drug resistance traits. The plasmid-containing strain was studied here because it contained vastly more genetic variation than the non-plasmid carrying strains of S. mutans. The current paper notes that

Earlier work (Caufield et al. 1982) showed that plasmid-containing S. mutans strains are often inherited familially -- that is, relatives of children carrying plasmid-positive strains have a higher chance of carrying plasmid-positive strains themselves. But that study found a relatively high proportion of cases where children had a plasmid-positive strain but no tested family members carried such a strain. In those cases, the plasmid-positive strain must have come from somewhere other than the tested relatives.

So, to review the ecology and transmission of S. mutans: people carry multiple strains of the bacterium; people with active caries carry more strains than other people; strains tend to be transmitted within families, but sometimes are transmitted horizontally in some other (unknown) pattern; the disease ecology radically changed in human populations with agricultural subsistence, and again in recent Westernized populations, due to dietary changes. The plasmid-carrying strain, which includes a large fraction of genes that are variably present or absent in other strains, has much higher genetic variation.

This doesn't sound like a pathogen that would make a promising candidate for tracing human population movements. Instead, it looks like a great opportunity to study the adaptive evolution of this pathogen in different human populations, where different strains of S. mutans carry different genes in their genomes.

The date

Still, even if S. mutans seems an unlikely candidate to reflect the long-term pattern of human evolution, we might find that the S. mutans phylogeny resembles the genealogy of human mtDNA or some other genetic system. If the worldwide sample of S. mutans had a pattern of relationships that looked like human population relationships, then we might accept the hypothesis that the pathogen had coevolved with human populations in ways that reflect their long-term relationships, instead of the adaptive ecology of human mouths.

Two comparisons are possibly informative: the dates associated with the divergences of plasmid-carrying S. mutans strains, and the branching pattern of those strains. If the dates match the dates of origin of human populations, as assessed by mtDNA or some other markers, then that would tend to confirm the hypothesis that the S. mutans strains originated in concert with human populations.

Of these, the greatest importance should be ascribed to the dates. If the S. mutans strains originated on the same time scale as human population relationships, then it might tell us something about population dynamics even if the strains were affected by selection.

But there are no dates in the paper. There are no data that can be used to estimate the rate of mutational changes in the genes sampled.

There is no need to argue about neutrality versus selection -- there is simply no possibility of dating the relationships of the strains. Remember, the other, non-plasmid-bearing strains have almost no variation at all! The substantial variation in the plasmid-bearing strains is something of an exception, as far as S. mutans strains are concerned. This is not a question of dates matching those of human mtDNA or any other gene. There is no date.
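To make the dating problem concrete, here is a minimal sketch of the usual strict molecular clock arithmetic, assuming that two lineages accumulate substitutions at a constant rate after splitting from their common ancestor. The function name and every number below are hypothetical, chosen only for illustration. None of them come from Caufield et al. (2007), which reports no rate at all.

```python
def divergence_time(distance, rate):
    """Estimate years since a common ancestor under a strict molecular clock.

    distance: pairwise genetic distance, in substitutions per site
    rate: substitutions per site per year along one lineage
          (this must come from independent evidence)
    """
    if rate is None:
        raise ValueError("no substitution rate: the divergence cannot be dated")
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Both lineages accumulate changes, so distance = 2 * rate * time.
    return distance / (2.0 * rate)


# Hypothetical example: one observed distance, three assumed rates.
d = 0.02
for r in (1e-7, 1e-6, 1e-5):
    print(f"rate={r:g}  estimated years={divergence_time(d, r):,.0f}")
```

With the same hypothetical distance, the three assumed rates give estimates of roughly 100,000, 10,000 and 1,000 years. The observed variation alone says nothing about time until a rate is supplied.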

The phylogeny

OK, so the dates don't confirm the hypothesis, because there are no dates.

The paper presents two maximum likelihood (ML) phylogenies of S. mutans variants. The first represents relationships for the 5.6-kb plasmid based on the sequence variation of the plasmid's hypervariable region (HVR); the second for the intergenic spacer region of the rRNA gene (IGSR). In addition, the genotypes of two genes (mutacin I and II) are considered in the phylogenies.

The HVR sequence allows the construction of a well-resolved cladogram. The paper presents an ML phylogeny of the strains, which the paper says is essentially the same as the cladogram based on maximum parsimony. Here is the figure depicting the phylogeny along with the caption from Caufield et al. (2007):

HVR phylogeny from S. mutans, Caufield et al. 2007

Caption from Caufield et al. 2007: FIG. 2. Unrooted maximum-likelihood phylogeny of the cryptic 5.6-kb plasmid as inferred from HVR sequences (see Material and Methods for details of analysis). Taxa with names in blue represent strains with mutacin II, and those in black represent strains with mutacin I; taxa and branches in red represent strains and ancestral lineages with serotype e. One of two possible reconstructions is depicted for serotype e; the alternative possibility is that serotype e was independently derived for CH5A. The pairs of numbers on branches are bootstrap values: the first number is from a likelihood bootstrap analysis, and the second is from a weighted-parsimony bootstrap (1,000 replicates each). Hyphens and branches without numbers indicate bootstrap values that were below 50%. The branch lengths are proportional to the numbers of substitutions/site, as reconstructed using the HKY85+G likelihood model. Abbreviations for ethnicity of the human host: AF, African; AA, African American; CA, Caucasian American; CH, Chinese; JP, Japanese; BR, Brazilian; AM, Amazon Indian; SW, Swedish Caucasian; HI, Hispanic.

This phylogeny of plasmids from S. mutans does not correspond well to human populations. Notice that the cluster at the top of the phylogeny includes strains from China, Japan, African Americans, American whites and American Indians. The basal node branches include African Americans, Africans, Guatemala (LM7), Japan, and Sweden. This is not a correspondence to human population history. It looks like the rapid diversification (possibly with reticulation) of plasmids through different populations.

That is the conclusion of Caufield et al. (2007), also:

That is, outside the terminal nodes containing identical sequenced strains, the HVR did not seem to support discrete clades for geographically/racially similar hosts. For example, strains from Asian and African individuals were dispersed throughout the tree, showing no clear cluster of S. mutans strains from similar racial/geographic groups.

The other phylogeny in the paper is an ML phylogeny of the IGSR region, which is not part of the plasmid, and hence may relate to the strains' origins in a different way. Here is the phylogeny presented in the paper:

IGSR phylogeny from S. mutans, Caufield et al. 2007

Caption from Caufield et al. 2007: FIG. 3. Maximum-likelihood phylogeny of IGSR sequences from strains with plasmids, rooted with IGSR Streptococcus ratti CCUG 27642 (see Materials and Methods for details of the analysis). Taxa and branches in blue represent strains with mutacin II; taxon names in red represent strains with serotype e. The triplets of numbers on branches are bootstrap values: the first two numbers are from weighted-parsimony analysis including or excluding, respectively, serotype and mutacin characters (2,000 and 1,000 bootstrap replications); the third is from a likelihood bootstrap (952 replications). When the serotype and mutacin characters were included and each was weighted the same as the set of DNA characters (i.e., the three data partitions were weighted equally), HI24 grouped with the AF199 cluster with a parsimony bootstrap value of 58%. Hyphens and branches without numbers indicate bootstrap values that were below 50%. The branch lengths are proportional to the numbers of substitutions/site as reconstructed using the HKY85+G likelihood model.

The paper offers this interpretation of the IGSR phylogeny:

Although the poor resolution limits our ability to draw decisive conclusions, there are some interesting features of the IGSR tree that contrast with the plasmid HVR tree. First, association of the strains with mutacin II is continuous along an evolutionary lineage, and associations with serotype e have evolved multiple times. Although it is formally possible that multiple independent changes to the mutacin II genotype occurred, a single-gain-single-loss scenario is most parsimonious, given the ML tree. Second, relationships among the taxa are different. For example, whereas the JP9-4 IGSR is identical to that of CH638 and CH639 but is in a distinct clade from the CA96 cluster (Fig. 3), the plasmid HVR of JP9-4 is most closely related to the CA96 cluster but is phylogenetically distinct from the CH638 and CH639 HVRs (Fig. 2). Also, the AA140 IGSR cluster is essentially identical to the SW114 IGSR (Fig. 3), but the plasmids of these strains are at opposite ends of the tree (Fig. 2).
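The parsimony reasoning in that passage can be illustrated with a small sketch of Fitch's algorithm, which counts the minimum number of state changes a character needs on a given rooted binary tree. The tree and the strain labels A through H are hypothetical and are not the published IGSR tree. Note that Fitch parsimony counts unordered changes, so it does not by itself distinguish a gain from a loss.

```python
def fitch(tree, states):
    """Return (candidate root states, minimum number of changes).

    tree: a leaf name (str) or a 2-tuple of subtrees (rooted, binary)
    states: dict mapping each leaf name to its character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed.
        return shared, left_cost + right_cost
    # No common state: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical balanced tree of eight strains.
toy_tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# Mutacin II confined to one lineage versus scattered across the tree.
clumped = {s: "I" for s in "ABCDEFGH"}
clumped.update({"C": "II", "D": "II"})
scattered = {s: ("II" if s in "ACEG" else "I") for s in "ABCDEFGH"}

print(fitch(toy_tree, clumped)[1])
print(fitch(toy_tree, scattered)[1])
```

On this toy tree the clumped pattern needs one change and the scattered pattern needs four, which is the sense in which a character that is continuous along a lineage is the more parsimonious reading.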

Based on the IGSR phylogeny, there is no particular correspondence between the S. mutans relationships and the relationships among human populations. The basal node includes African, African American, and Guatemala strains. The population affinity of the Guatemala strain is not reported (the sample was obtained in 1973) -- it is conceivably African in origin, although one might assume it is Maya or European. The next node includes China, Sweden, Brazil, Hispanic, and African American samples.

This is not a tree with branches corresponding to dispersal from Africa, or other kinds of human population relationships. There is no way to reconcile this phylogeny with a simple branching model of human population origins. The simplest explanation is that the S. mutans strains can disperse and adapt.
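One way to state this check explicitly is to ask whether the strains from each host population form their own clade on a rooted tree. The sketch below does that for nested-tuple trees. The two trees and the labels (a population code, an underscore, and a number) are hypothetical conventions made up for illustration, not the strains or the topology in Caufield et al. (2007).

```python
def leaves(tree):
    """Return the set of leaf labels under a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every clade (every node) in the tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def population(label):
    # Hypothetical labelling convention: "AF_1" means population AF, strain 1.
    return label.split("_")[0]


def monophyly_by_population(tree):
    """Map each population code to True if its strains form exactly one clade."""
    groups = {}
    for label in leaves(tree):
        groups.setdefault(population(label), set()).add(label)
    clade_sets = set(clades(tree))
    # A population with a single strain is trivially monophyletic.
    return {pop: frozenset(members) in clade_sets
            for pop, members in groups.items()}


# Hypothetical tree in which populations are interleaved.
mixed_tree = ((("AF_1", "AA_1"), ("CH_1", "SW_1")),
              (("CH_2", "JP_1"), ("AF_2", "JP_2")))

# Hypothetical tree in which each population is its own branch.
clean_tree = ((("AF_1", "AF_2"), ("CH_1", "CH_2")), ("JP_1", "JP_2"))

print(monophyly_by_population(mixed_tree))
print(monophyly_by_population(clean_tree))
```

In the interleaved toy tree, none of the populations with more than one strain forms a clade. In the clean toy tree, every population does. A tree that tracked human population history would be expected to look more like the second case than the first.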

Anything left?

The paper suggests that the lack of variation of the S. mutans strains in "Caucasians" is possibly due to their population history -- with a limit to the variation imposed by a bottleneck in the founding of that population. But with no dates, there is no telling whether the lack of S. mutans variation is attributable to the founding of "modern" populations in Europe, or the Neolithic (diet change) or later selection in the pathogen population. The lack of variation in the non-plasmid-bearing S. mutans strains was previously suggested to be indicative of selection associated with the development of agriculture, and possible recent transfer into humans from another host species (Ogretme et al. 2006). Without any clear idea of the rate of genetic change, there is no reason to doubt that hypothesis.

The paper emphasizes the presence of an "Asian clade" in the IGSR phylogeny. This includes most of the samples from China and Japan, but not all of them. Nor does it include the American Indian strains or the Guatemala or Brazil strains (although the population affinity of these strains is not specified). If I had to suggest a hypothesis for these relations, it would also relate to recent subsistence changes rather than ancient population history.


To sum up, this paper does not fit the description of an independent verification of a human dispersal from Africa. It does provide a valuable starting point for evaluating the coevolution of humans and their Streptococcus biota.

And to be fair, that is all the paper claims to provide. The hype about human dispersals all came from the press accounts, based in no small part on the press release. Here is a key passage:

"It is relatively easy to trace the evolution of S. mutans, since it reproduces through simple cell division," says Caufield, who gathered over 600 samples of the bacterium on six continents over the past two decades. His final analysis focused on over 60 strains of S. mutans collected from Chinese and Japanese; Africans; African-Americans and Hispanics in the United States; Caucasians in the United States, Sweden, and Australia; and Amazon Indians in Brazil and Guyana.
"By tracing the DNA lineages of these strains," Caufield said, "We have constructed an evolutionary family tree with its roots in Africa and its main branch extending to Asia. A second branch, extending from Asia back to Europe, traces the migration of a small group of Asians who founded at least one group of modern-day Caucasians."

The paper doesn't claim such a relationship. It suggests that the limited variation of plasmid-bearing S. mutans in the Caucasian samples and the clustering of Asian IGSR sequences might be explained by population history. But it doesn't claim that the S. mutans data support that history, and abundant evidence in the paper suggests that population movements and origins cannot explain many aspects of the S. mutans tree. Most notably, this includes the evolution of the plasmid itself, which appears to be subject to widespread horizontal transfer. That looks like adaptive evolution.

The part of the press release that discusses the "100,000 to 200,000 years ago" date is hugely misleading. There is nothing in this study that indicates such a date, or indeed any date at all.

So, the coevolution story certainly is the relevant aspect of this paper, but that doesn't say much about human origins. It says a lot about the adaptive ecology of human oral biota.


Caufield PW, Saxena D, Fitch D, Li Y. 2007. Population structure of plasmid-containing strains of Streptococcus mutans, a member of the human indigenous biota. J Bacteriol 189:1238-1243. doi:10.1128/JB.01183-06

Alaluusua S, and 7 others. 1996. Oral colonization by more than one clonal type of mutans streptococcus in children with nursing-bottle dental caries. Arch Oral Biol 41:167-173. doi:10.1016/0003-9969(95)00111-5

Caufield PW, Wannebuehler YM, Hansen JB. 1982. Familial clustering of the Streptococcus mutans cryptic plasmid strain in a dental clinic population. Infect Immun 38:785-787.

Ogretme MS, Do SGT, Clark D, Wade WG, Beighton D. 2006. Multilocus sequencing typing (MLST) of Streptococcus mutans. Caries Res 40:303-358. doi:10.1159/000093189