Neandertal genome project

By now you've probably read something about the Neandertal genome project. But have you seen the press kit from 454 Life Sciences?

Due to such sample contamination, the task of sequencing the Neandertal genome is much more extensive than the task of sequencing the human genome. 454 Life Sciences' Genome Sequencer 20 System makes such an endeavor feasible by allowing approximately a quarter of a million single DNA strands from small amounts of bone to be sequenced in only about five hours by a single machine. The DNA sequences determined by the Genome Sequencer 20 System are 100-200 base pairs in length, which coincides neatly with the length of ancient DNA fragments.

Of course, there's nothing really new here; they're just saying what their goals for the project are. The only concrete result is this:

Approximately 99% of the Homo sapiens genome is identical to the chimpanzee genome, our closest living relative. It is estimated that the Neandertal shares 96% of the 1% difference with Homo sapiens. The Neandertal shares the remaining 4% of the difference with the chimpanzee.

That has been part of the public talks about the sequencing, as well as the press conference. If we assume a 7-million-year genetic divergence of humans and chimpanzees, and that roughly half of the human-chimpanzee difference arose on the human lineage, this would place the human-Neandertal genetic divergence at about 560,000 years ago. They don't mention the obvious: most human genes have variation that is a lot older than 560,000 years, so Neandertals will be within the human range of variation for most genes.
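The 560,000-year figure is consistent with a simple back-of-envelope calculation. The sketch below is one way to get there, not necessarily how the estimate was derived. It assumes a constant substitution rate and that the human-chimpanzee differences are split evenly between the two lineages; the function and variable names are made up for illustration.

```python
# Inputs taken from the text: the 7-million-year divergence is a working
# assumption, and the 4% share comes from the press kit.
HUMAN_CHIMP_DIVERGENCE_YEARS = 7_000_000
SHARE_WITH_CHIMP = 0.04  # fraction of human-chimp differences where Neandertal matches chimp


def neandertal_divergence_years(divergence_years, share_with_chimp):
    """Rough human-Neandertal divergence time under a constant-rate assumption."""
    # Assumption: half of the human-chimp differences arose on the human lineage.
    human_lineage_fraction = 0.5
    # Sites where the Neandertal matches the chimpanzee are changes that arose
    # on the human lineage after the human-Neandertal split.
    fraction_of_human_branch = share_with_chimp / human_lineage_fraction
    return fraction_of_human_branch * divergence_years


estimate = neandertal_divergence_years(HUMAN_CHIMP_DIVERGENCE_YEARS, SHARE_WITH_CHIMP)
print(round(estimate))  # 560000
```

The 4% of all differences becomes 8% of the human branch, and 8% of 7 million years is 560,000 years.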

The hope of the project is to identify the set of genes that came under selection too long ago to show up in tests for recent selection, but more recently than the average human-Neandertal genetic divergence. In other words, genes for which the Neandertal allele no longer exists in living people.

But there's another use we might consider, raised by Bruce Lahn in a New York Times article by Nicholas Wade:

A longstanding dispute among archaeologists is whether the modern humans who first entered Europe 45,000 years ago, ultimately from Africa, interbred with the Neanderthals or forced them into extinction. Interbreeding could have been genetically advantageous to the incoming humans, says Bruce Lahn, a geneticist at the University of Chicago, because the Neanderthals were well adapted to the cold European climate -- the last ice age had another 35,000 years to run -- and to local diseases.
Evidence from the human genome suggests some interbreeding with an archaic species, Dr. Lahn said, which could have been Neanderthals or other early humans.

Of course, the Neandertal genome would be very important for confirming possible genetic contributions from Neandertals.

The most interesting thing, I think, will be the variation in Neandertal genes. On that question, the project faces a pretty extreme challenge:

Over the next two years, the Neandertal sequencing team will reconstruct a draft of the 3 billion bases that made up the genome of Neandertals. For their work, they will use samples from several Neandertal individuals, including the type of specimen found in 1856 in Neander Valley and a particularly well-preserved Neandertal from Croatia. The Max Planck Society's decision to fund the project is based on an analysis of approximately one million base pairs of nuclear Neandertal DNA from a 45,000-year-old Croatian fossil, sequenced by 454 Life Sciences.

OK, here's the thing: they have to try to assemble a sequence from a diploid individual. This is really challenging, because wherever the individual is heterozygous, there will be ambiguity about which SNP alleles are linked on one chromosome and which belong together on the other.
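To get a feel for how quickly that ambiguity grows, here is a minimal sketch that lists every pair of haplotypes consistent with a set of unphased genotypes. The function name and the toy genotypes are hypothetical, and real ancient-DNA assembly has many further complications (short reads, damage, contamination) that this ignores.

```python
from itertools import product


def possible_haplotype_pairs(genotypes):
    """Return all haplotype pairs consistent with unphased diploid genotypes.

    genotypes: list of (allele, allele) tuples, one per site,
    with single-character alleles, e.g. [("A", "G"), ("C", "C")].
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs = set()
    # Try every orientation of every heterozygous site.
    for flips in product([False, True], repeat=len(het_sites)):
        flip_at = dict(zip(het_sites, flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        # A frozenset treats (hap1, hap2) and (hap2, hap1) as the same pair.
        pairs.add(frozenset(("".join(hap1), "".join(hap2))))
    return pairs


# Two heterozygous sites and one homozygous site.
toy = [("A", "G"), ("C", "C"), ("T", "C")]
for pair in possible_haplotype_pairs(toy):
    print(sorted(pair))
```

With n heterozygous sites there are 2^(n-1) distinct possibilities, so the toy example gives two: ACT with GCC, or ACC with GCT. An individual that is homozygous at every site leaves only one.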

Now, they faced the same problem in the mammoth Mc1r paper, which they resolved by sampling additional mammoth individuals and finding homozygotes. That may be the reason for mentioning the "several Neandertal individuals" in the press release. But it's not obvious that even those several individuals will be enough to reconstruct most genes unambiguously.

And then there's the contamination problem. Well, that one I'll save for another day.