Reviewing the clock, and phylogenomics

After reading yesterday's penguin post, one of my readers thought I'd given up the ghost on the molecular clock.

But notice the bottom line of that message: those ancient penguins didn't tell us anything new about the rate of mitochondrial change over tens of thousands of years. The rate over that time period is pretty much what you would expect from comparing humans, or comparing Neandertals. Considering that the generation-to-generation rate of mitochondrial DNA mutations is maybe an order of magnitude higher, I'd say that consistency is pretty impressive.

Much more important, when it comes to comparing humans and chimps, we've now gone billions of base pairs beyond the mitochondrial DNA alone. We have drafts of the complete genomes of humans, chimpanzees, and macaques, and working drafts for gorillas, orangutans, and a handful of other primates. We can reconstruct the phylogenetic relationships of those species better than ever, along with the times that they diverged from each other, and even something about the number of individuals and structure of their ancient populations.

For the past five years, almost every study including more than a single gene has agreed on one central fact: humans and chimpanzees last exchanged genes less than 6 million years ago. Most of them place the date much younger: on average, less than four and a half million years ago.

Still, these kinds of comparisons can be quite complicated, and many -- maybe most -- of my paleoanthropology colleagues would prefer to remain ignorant of the details.

I can kind of sympathize. If somebody is willing to say it could be 6 million years, well, that doesn't sound so different from seven. And Sahelanthropus is only seven. What's the problem anyway?

I've got to say, though, that attitude reflects a fundamental lack of seriousness about the data. It's as if I said about Lucy, "Hey, it's just a pelvis, right? What's the big deal?"

Well, it's the evidence, that's what. A. afarensis is a large and substantial sample with dozens of shared homologous features with humans and other hominins. If the genetics told us that humans and chimpanzees diverged less than 2 million years ago, that would be a substantial conflict. Either that estimate would be wrong, or much of what we thought we knew about the pattern of hominin evolution would be.

We are in fact at that point in genetics. If the human-chimpanzee divergence really were much older than 5 million years ago, then much of what we think we know about population genetics of primates must be wrong.

I understand that many of my readers might welcome that suggestion. I, on the other hand, am having a hard time figuring out just how I'm supposed to make the divergence date much older than the current best estimates. In the 1990s, it was fashionable to just say that the clock was wrong, because our estimates of mutation rate were wrong, and leave it at that. People even did silly things like provide "confidence intervals" based on different assumptions about the human-orangutan divergence. If it was 12 million years ago, you'd get one (low) answer; if it was 16 million years ago, you'd get another (high) answer. Report the low and high ends, and there's your "confidence" interval. Human-chimpanzee divergence: 4 to 6 million years.

It was a joke, but that's where things stood.
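To see why that "interval" was no confidence interval at all, here is a minimal Python sketch of the proportional clock scaling it rested on. It assumes a strict clock and ignores ancestral polymorphism entirely; the divergence values and the function name `scaled_divergence_time` are hypothetical, chosen only for illustration. The two "ends" come entirely from the two assumed calibration ages, not from any sampling uncertainty in the data.

```python
# Naive molecular-clock scaling (a sketch). Assumes a strict clock:
# divergence accumulates linearly with time, and all of it happened
# after the species split (no ancestral polymorphism).

def scaled_divergence_time(d_target: float, d_calib: float, t_calib: float) -> float:
    """Date a split by scaling against a calibrated divergence.

    d_target: sequence divergence between the two species of interest
    d_calib:  divergence for the calibration pair (e.g. human-orangutan)
    t_calib:  assumed age of the calibration split, in millions of years
    """
    rate = d_calib / t_calib  # divergence per million years
    return d_target / rate


# Hypothetical divergence values, for illustration only.
D_HUMAN_CHIMP = 0.012
D_HUMAN_ORANG = 0.036

for t_orang in (12.0, 16.0):
    t_hc = scaled_divergence_time(D_HUMAN_CHIMP, D_HUMAN_ORANG, t_orang)
    print(f"calibration {t_orang:.0f} Myr -> human-chimp {t_hc:.1f} Myr")
```

Whatever divergence values you plug in, the "low" and "high" answers differ by exactly the ratio of the two assumed calibrations.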

Nowadays, we know an awful lot more about the relations of these populations. I'm going to point everybody to a recent review paper by Adam Siepel in Genome Research (it was released the same week as the Ardipithecus papers). It's a very good review of the recent literature on the human-chimpanzee divergence and, by implication, the human-gorilla and other primate divergences. It is not about building a phylogenetic tree; it's about how we use sequence data from many genes to put together a phylogenomic tree, one that incorporates the divergences of populations along with their inbreeding and selection characteristics.

The time we estimate for a population divergence depends on the size of the ancestral population, as well as the pattern of selection within it. These factors also affect the sorting of gene variants of the ancestors into the descendant populations. As Siepel points out, these effects have led to two different methods of examining the demography and divergence times of ancient species:

Two simple, but ingenious, approaches were proposed early on, both of which exploited the fact that, with sparse sampling across the genome, the loci under study were likely to be unlinked, and their genealogies could be assumed to be statistically independent. The first method, by Takahata (1986), derived information about ancestral population sizes from the variance in the estimated divergence times for pairs of orthologous sequences. The second, by Wu (1991) (see also Hudson 1983a; Nei 1987), made use of the variance in tree topologies estimated from three or more orthologous sequences. Takahata's method essentially estimated [population divergence time] and [effective size] from the variance in estimates of [genetic divergence time] at multiple loci (in the notation above), while Wu's method estimated [effective size] from the relative frequency of topological inconsistency in reconstructed gene trees.
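Takahata's idea can be illustrated with a stripped-down moment estimator in Python. This is not his published method, which works from sequence differences and accounts for mutational noise; it is a sketch assuming we could read each unlinked locus's divergence time directly. Under a neutral, constant-size ancestor, each locus's gene divergence time is the split time τ plus an exponential wait with mean 2N generations, so the spread across loci reveals N, and the mean minus that spread reveals τ. All parameter values and function names here are hypothetical.

```python
import random
import statistics


def simulate_pairwise_times(tau_myr, n_anc, gen_years, n_loci, rng):
    """Gene divergence time (Myr) at each unlinked locus: the split time
    plus an exponential wait for the two lineages to coalesce in the
    ancestral population (mean 2N generations for a diploid population)."""
    mean_wait_myr = 2 * n_anc * gen_years / 1e6
    return [tau_myr + rng.expovariate(1 / mean_wait_myr) for _ in range(n_loci)]


def moment_estimates(times_myr, gen_years):
    """For an exponential wait, mean = tau + 2Ng and SD = 2Ng, so the SD
    across loci recovers the ancestral size and mean - SD recovers tau."""
    mean = statistics.fmean(times_myr)
    sd = statistics.stdev(times_myr)
    n_anc = sd * 1e6 / (2 * gen_years)
    return mean - sd, n_anc


# Made-up "true" values: split 4 Myr ago, ancestral N = 50,000, 20-year generations.
rng = random.Random(1)
times = simulate_pairwise_times(tau_myr=4.0, n_anc=50_000, gen_years=20,
                                n_loci=20_000, rng=rng)
tau_hat, n_hat = moment_estimates(times, gen_years=20)
print(f"tau ~ {tau_hat:.2f} Myr, ancestral N ~ {n_hat:,.0f}")
```

The key design point is that a single locus cannot separate τ from the ancestral wait; only the variance across many independent loci can.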

Those topological inconsistencies began to show up during the 1980s and 1990s, when people published sequences that favored human-gorilla or chimpanzee-gorilla clades. These were genes in which humans really were more closely related to gorillas, because the human-chimpanzee (chuman) ancestral population was large enough to retain two divergent alleles for the two million or so years that chumans existed.
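Wu's topological approach has a compact form for three species. In the standard coalescent treatment, the human and chimpanzee lineages fail to coalesce during a chuman interval of T generations with probability exp(−T/2N); when that happens, each of the three possible pairings is equally likely to join first, and two of the three give a discordant gene tree. The Python sketch below assumes a constant-size, neutral ancestral population, no recombination within loci, and no gene flow. The two-million-year interval comes from the paragraph above; the 20-year generation time, the discordance fractions, and the function names are illustrative assumptions.

```python
import math


def discordance_prob(n_anc: float, internal_gens: float) -> float:
    """Chance that a locus shows a human-gorilla or chimp-gorilla gene tree.
    Human and chimp lineages fail to coalesce during the internal branch
    with probability exp(-T / 2N); after that, each of the three pairings
    is equally likely to coalesce first, and two of three are discordant."""
    return (2 / 3) * math.exp(-internal_gens / (2 * n_anc))


def ancestral_size(p_discordant: float, internal_gens: float) -> float:
    """Invert discordance_prob to get N from an observed discordance fraction."""
    if not 0 < p_discordant < 2 / 3:
        raise ValueError("discordance fraction must lie strictly between 0 and 2/3")
    return -internal_gens / (2 * math.log(1.5 * p_discordant))


# Assumed: a chuman lineage lasting ~2 million years, 20-year generations.
T_GENS = 2_000_000 / 20
for p in (0.05, 0.15, 0.25):  # hypothetical discordance fractions
    print(f"{p:.0%} discordant loci -> ancestral N ~ {ancestral_size(p, T_GENS):,.0f}")
```

More discordant loci imply a larger ancestral population, which is why those "gorilla genes" are hard to square with a tiny chuman ancestor.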

Siepel goes on to review the literature of the last seven or eight years that uses variants of these two approaches. The Nature chimp-human hybridization paper by Patterson and colleagues (2006, which I reviewed here) is central to the discussion, as people have reacted to that paper and the major issue it raised.

Reading the review, one cannot help but notice the low age estimates that come up again and again. Most of them are under 4.5 million years. Patterson and colleagues had one of the highest recent estimates, putting the speciation at less than 5.4 million years. That's because they assume a smaller effective size in the ancestral lineages, which pushes the date higher. The larger the ancestral population, the more of the genetic divergence accumulated before the populations split, and the younger the resulting estimate of the split will be.
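The trade-off between ancestral size and split time is simple arithmetic under an idealized model (strict clock, neutral, constant-size ancestor): if the average gene divergence time is fixed by the data, the split is that figure minus the expected 2N-generation wait in the ancestral population. This Python sketch uses a hypothetical 6.5-million-year average gene divergence and a 20-year generation time, both assumptions for illustration only.

```python
def split_time_myr(gene_time_myr: float, n_anc: float, gen_years: float = 20) -> float:
    """Population split time implied by an average gene divergence time,
    after subtracting the expected 2N-generation coalescent wait in the
    ancestral population (strict clock, constant-size neutral ancestor)."""
    return gene_time_myr - 2 * n_anc * gen_years / 1e6


GENE_TIME = 6.5  # hypothetical average human-chimp gene divergence, Myr
for n in (10_000, 30_000, 50_000, 70_000):
    print(f"ancestral N = {n:>6,} -> split at {split_time_myr(GENE_TIME, n):.1f} Myr")
```

Setting N near zero (the extreme bottleneck) returns the full gene divergence time as the split date; that is the only way this arithmetic yields the oldest possible date.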

To make the date older, you need to assume there was no demography at all, which means an extreme chuman bottleneck. But that would be inconsistent with the evidence of incomplete lineage sorting: those gorilla genes that we share. And it would take some magical rate discontinuities among genetic loci to produce the amount of interlocus variability that we observe.

The review mentions some recent work suggesting that background selection may have reduced nucleotide diversity in the ancestral species, work that aims to explain why the human X chromosome is even more similar to the chimpanzee X than the autosomes are. Taken to an extreme, background selection or massive hitchhiking could raise the divergence estimate a bit, but neither overcomes the issue of incomplete lineage sorting.

You could push the human-orangutan divergence higher, or the human-macaque divergence, both of which help to calibrate the mutation rate. But that's not going to make 4 million years into 8 million, not unless orangutans diverged from us in the Oligocene.
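The Oligocene remark follows from proportional scaling: under a strict clock, every derived date scales linearly with the calibration age, so doubling the human-chimpanzee estimate means doubling the human-orangutan calibration. Here is a minimal Python sketch that checks this, taking the 12- and 16-million-year calibrations from the 1990s example earlier as assumed inputs. The epoch boundaries follow the International Chronostratigraphic Chart; the function names are hypothetical.

```python
# Epoch boundaries in Ma (International Chronostratigraphic Chart).
EPOCHS = [
    ("Pliocene", 2.58, 5.333),
    ("Miocene", 5.333, 23.03),
    ("Oligocene", 23.03, 33.9),
    ("Eocene", 33.9, 56.0),
]


def required_calibration(current_estimate: float, target_estimate: float,
                         current_calibration: float) -> float:
    """Under proportional clock scaling, dates scale linearly with the
    calibration age, so the calibration must grow by the same factor."""
    return current_calibration * target_estimate / current_estimate


def epoch_of(age_ma: float) -> str:
    """Name the epoch containing an age (Ma), or flag it as outside the table."""
    for name, young, old in EPOCHS:
        if young <= age_ma < old:
            return name
    return "outside table"


for calib in (12.0, 16.0):  # assumed current human-orangutan calibrations, Myr
    needed = required_calibration(4.0, 8.0, calib)
    print(f"{calib:.0f} Myr calibration -> need {needed:.0f} Myr ({epoch_of(needed)})")
```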

You could propose a massive slowdown in mutations in the chuman lineage. But why? How? Like I said earlier, you'd have to change something pretty fundamental about our understanding of primate genetics.

No, it's very hard to see how these dates are going to get much older. What I'm saying is that you can't just wave them away; these are serious estimates and I don't see any simple way to get a better one.

Now, the question is, do the geneticists insufficiently appreciate the hominins? Do they just not care about the havoc this wreaks in paleoanthropology-land?

In fact, Siepel addresses this issue. The review mentions that Patterson and colleagues (2006) offered their hybridization idea in part to explain the early "hominin", Sahelanthropus. With the revelation of Ardipithecus' postcranial anatomy, I don't think we need to resort to chuman hybrids.

I think it's more parsimonious to imagine a widespread population of chumans, a large-bodied, basically Ardipithecus-like primate, structured into regional populations in much the way that today's chimpanzees and gorillas are. This population was numerous and stable, and it gave rise over time to many more arboreally adapted branches -- first the gorillas and later the chimpanzees. The remainders, as it were, became the hominins.

There are various hangups with this scenario that make me hesitate. I do take Orrorin seriously, for example: it is hard to accommodate a 6-million-year-old hominin under the large-population recent-divergence hypothesis.

And on the genetic side, the substitution rate in the nuclear genome is affected by positive selection, background selection, duplications and unequal crossing over. It's quite possible that some odd demographic scenario might reduce the genetic divergence date yet further, or increase it to some extent.

What's encouraging is that today's dense genetic data and fast modeling give us the chance to test these scenarios. We can model selection and demography directly and compare the results to observed genetic patterns.
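As a toy example of that kind of test, the Python sketch below simulates gene trees under a simple three-species coalescent model for several candidate ancestral sizes and keeps the one whose simulated discordance best matches an observed fraction. It models demography only, not selection. The observed fraction, the candidate sizes, the 100,000-generation interval, and the function names are all hypothetical; real analyses use likelihood or approximate Bayesian methods over far richer models.

```python
import random


def simulate_discordance(n_anc, internal_gens, n_loci, rng):
    """Fraction of simulated loci whose gene tree does not group human
    with chimp. Per locus: wait for the human and chimp lineages to
    coalesce in the chuman population (exponential, mean 2N generations);
    if they have not coalesced before the gorilla split, the first pair
    to coalesce among the three lineages is random (2 of 3 discordant)."""
    discordant = 0
    for _ in range(n_loci):
        wait = rng.expovariate(1 / (2 * n_anc))
        if wait > internal_gens and rng.random() < 2 / 3:
            discordant += 1
    return discordant / n_loci


def best_fit_size(observed, candidates, internal_gens, n_loci=5_000, seed=0):
    """Pick the candidate N whose simulated discordance is closest to the
    observed fraction (a crude rejection-style fit, for illustration)."""
    rng = random.Random(seed)
    scores = {n: abs(simulate_discordance(n, internal_gens, n_loci, rng) - observed)
              for n in candidates}
    return min(scores, key=scores.get)


OBSERVED = 0.25   # hypothetical fraction of discordant loci
T_GENS = 100_000  # assumed: ~2 Myr internal branch, 20-year generations
print(best_fit_size(OBSERVED, [10_000, 30_000, 50_000, 70_000, 90_000], T_GENS))
```

Swapping in selection or population structure would mean changing only the simulator, while the comparison step stays the same.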

OK, it's bedtime. More on this later...

References:

Siepel A. 2009. Phylogenomics of primates and their ancestral populations. Genome Res 19:1929-1941. doi:10.1101/gr.084228.108