Data supplements driving me crazy

I'm about to pull out my hair reading "supplementary information" for papers.

Two recent papers (by Mike Hammer's group and David Reich's group) attempt estimates of the diversity level of the X chromosome versus the autosomes. As discussed on Gene Expression this week, the two papers came to completely opposite results.

In the olden days, ten years ago, I would simply put the two papers side by side and find the discrepancies. But nooooo, we can't do that any more. Now, all the relevant parameters from one of the papers (you guessed it, the one published by the Nature Publishing Group) are hidden away in a supplement.

You'd think that might not be so bad, since I have the supplement. But I have to keep tracking the cross references to the paper to find out where the methods apply. It's a pain in the neck. Nobody else ever seems to complain. But that's because they simply don't read the papers! AAARGGGH!

So what's the discrepancy in this case? I'm still working through these darned things.

My first impression is that the two papers use different methods to estimate the mutation rate on the X chromosome. It was Reich's group, after all, who claimed that the human-chimp divergence was followed by extended hybridization, a process that took over 4 million years in their estimation. The evidence was the X chromosome.

So, for their current paper, Keinan and colleagues (2008) try to correct for the recent divergence of human and chimpanzee X chromosomes. Simple enough -- rescale all X chromosome mutation events by some ratio proportional to the human-chimp divergence discrepancies. In this case, they attempt to rescale to the human-macaque divergence. Since that divergence happened in the Oligocene, the discrepancies among chromosomes should be slight compared to the overall divergence. I'd feel better if they actually tested this idea.

Meanwhile, Mike Hammer and colleagues scaled X chromosome diversity to the human-orangutan divergence. They claimed that this gave the same results as the human-chimpanzee divergence. Which, if true, would obviously give a different outcome than the procedure followed by Keinan and colleagues, which was predicated on the idea that the human-chimpanzee X divergence is the wrong number to use.

The human-chimpanzee divergence discrepancy, if it exists to the extent claimed by Patterson et al. (2006), is probably enough to explain the discrepancies in the results of these two papers, and clearly in the correct direction. By assuming a recent divergence date for the human-chimp X chromosome comparison, Keinan et al. have in effect assumed a higher mutation rate for the X than the raw human-chimp X divergence would imply. That means that the X variation in humans represents relatively less time, and therefore lower genealogical diversity and a lower effective size, than estimated by Hammer et al.
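
To see why the choice of outgroup matters, here is a toy calculation. The diversity and divergence values are made up for illustration (they are not from either paper), and the function name is mine. The only point is that dividing X diversity by a divergence that is depressed on the X pushes the scaled X/autosome ratio up, compared with scaling to a deeper outgroup where the X and the autosomes differ less.

```python
def scaled_x_to_a_ratio(pi_x, pi_a, div_x, div_a):
    """X/autosome diversity ratio after dividing each by its outgroup divergence."""
    return (pi_x / div_x) / (pi_a / div_a)


# Hypothetical per-site values, NOT taken from Keinan et al. or Hammer et al.
pi_x, pi_a = 0.0006, 0.0010                 # diversity within humans

# Close outgroup: X divergence noticeably reduced relative to autosomes.
chimp_div_x, chimp_div_a = 0.0100, 0.0125

# Deep outgroup: X and autosomal divergence assumed to differ less.
macaque_div_x, macaque_div_a = 0.054, 0.060

ratio_chimp = scaled_x_to_a_ratio(pi_x, pi_a, chimp_div_x, chimp_div_a)
ratio_macaque = scaled_x_to_a_ratio(pi_x, pi_a, macaque_div_x, macaque_div_a)

print(f"scaled to close outgroup: {ratio_chimp:.3f}")
print(f"scaled to deep outgroup:  {ratio_macaque:.3f}")
```

Same raw diversity, two different answers, and the only thing that changed is the divisor. Whether the real discrepancy is this size depends on the actual divergence values, which is exactly what I have to dig out of the supplement.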

But I don't think that's the end of the story. In fact, I think there are quite a few strange aspects of the results of both papers. Even though both papers explain their results in terms of demography, I don't think that avenue is very promising. The kinds of demographic changes that happened in the Late Pleistocene just don't look very much like those coming out of these papers. More on that later...

The Keinan et al. paper shows some substantial differences in the derived/ancestral ratio between populations, and large discrepancies in X diversity across different regions of the X. Large discrepancies would be expected between small regions due to the intrinsic variability of the coalescent process. But these large discrepancies exist between regions 3 centimorgans in length -- large enough regions that there ought to be less dispersion among them. The Asian and European samples have a strong deficit of derived alleles at frequencies lower than 30 percent, but the African sample has a slight excess.

We'll apply some more simpleminded analysis to these data and see if anything interesting pops out. As they say, garbage in, garbage out -- but when the garbage consistently looks like banana peels, you can guess there's a monkey somewhere.

UPDATE (2008/12/21): More craziness -- this article from New Scientist includes a quote from David Reich:

However, the chance of finding archaeological evidence for these migrants is slim. "You're looking for a population that was there only a short period of time, perhaps only 10 generations, so the physical impact of that population in that environment wouldn't be enough to detect," Reich says.

Surely he's not talking about a bottleneck 10 generations long, which by the estimate in the paper would mean an effective size of around 50 individuals. Surely not. No. It's just a quote in an article.

Oh, heck. I think the point of all these recent papers that use "inbreeding ratio" instead of effective size and time as bottleneck parameters is to hide these kinds of crazy numbers from peer review. We've got people out there who are talking about biblical models of human migration, like Noah-and-the-Flood level bottlenecks.
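
Here is a rough sketch of why a single bottleneck "intensity" number hides so much. It uses the textbook drift approximation for a diploid autosomal locus, F = 1 - (1 - 1/(2N))^t, which is my assumption and not necessarily the parameterization used in the paper. The starting point of 50 individuals for 10 generations is just the combination mentioned above, and the function names are mine.

```python
def bottleneck_intensity(n_e, generations):
    """Drift accumulated at constant size n_e: F = 1 - (1 - 1/(2*n_e))**t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** generations


def implied_size(intensity, generations):
    """Effective size that yields the given intensity over the given duration."""
    per_generation = 1.0 - (1.0 - intensity) ** (1.0 / generations)
    return 1.0 / (2.0 * per_generation)


# Intensity implied by 50 individuals for 10 generations (autosomal assumption).
f = bottleneck_intensity(50, 10)
print(f"intensity F = {f:.4f}")

# The same intensity fits very different size/duration combinations.
for t in (10, 100, 1000):
    print(f"{t:5d} generations -> effective size about {implied_size(f, t):.0f}")
```

Report only the intensity, and nobody has to say out loud which of those size-and-time combinations they actually believe.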

And archaeology makes no difference. All those archaeological sites you've got? Well, the people who left them aren't the ones who founded the world's population. Our actual ancestors made no impact on the environment that we can detect today. They were invisible.

And hey, if results contradict each other? No worries. It's not like this is a refutationist science, after all:

Their analysis also challenges a study published earlier this year, which found that all humans descend from fewer numbers of males than females. The researchers suggested that polygyny, where few men procreate with many women, accounts for this result.
"It's possible, in principle, that both are true in some level," says Reich.
Polygyny that occurred over the last million years of human evolution could have left an imprint in our genomes, says Michael Hammer, a geneticist at the University of Arizona, who led that study.
Reich and Keinan, on the other hand, focused their analysis on the period when anatomically modern humans left Africa.
"We'll have to figure out this issue in future work," Reich says.

GAAAAAAAAHHHHHH! And you thought I was silly to be driven crazy by these papers! "It's possible, in principle, that both are true in some level."