Mozart and mammoth metagenomic manipulation

OK, I just think the Mozart skull DNA extraction is creepy. Not because identifying dead skulls is creepy in itself -- hey, I like forensic anthropology a lot more than the random person on the street.

No, I think it's creepy because of the mammoths. I got ahold of the mammoth DNA paper by Poinar and colleagues a couple of weeks ago; it's on Science Express.

Can I just say, Science Express is super-lame? I mean, a subscription wall inside a subscription wall!

The paper, on the other hand, is decidedly not lame. Here is the abstract:

We sequenced 28 million base pairs of DNA in a metagenomics approach using a woolly mammoth (Mammuthus primigenius) sample from Siberia. Thanks to exceptional sample preservation and use of a novel emulsion polymerase chain reaction and pyrosequencing technique, 13 million base pairs (45.4%) of the sequencing reads were identified as mammoth DNA. Sequence identity between our data and African elephant (Loxodonta africana) was 98.55%, consistent with a paleontologically based divergence date of 5 to 6 million years. The sample includes a surprisingly small diversity of environmental DNAs. The high percentage of endogenous DNA recoverable from this single mammoth would allow for completion of its genome, unleashing the field of paleogenomics.

Of course, they were helped a lot by the unique preservation in the sample, which was found in optimal cold conditions at the shore of Lake Taimyr. That probably cut down substantially on extraneous microbial and fungal DNA.

But the metagenomic approach makes these kinds of contaminants mostly irrelevant. In metagenomics, researchers sequence every last piece of DNA in a sample, and then figure out what all the pieces are by comparing them to genome databases. What you get is illustrated by this pie chart:

Proportion of DNA sequence from different sources in the mammoth sample of Poinar et al. (2006).

There are two beautiful things about this graph. One is that, although there happens to be a lot of mammoth DNA in the sample (about 45 percent of the sequencing reads, going by the abstract), there doesn't have to be. The fact is, it doesn't really matter how much of the original stuff is there or how much junk there is; if there is any minimal level of DNA preservation from the original beast, you are going to be able to find it.

The other beautiful thing is that the ability to recognize sequence is determined not by your own work on a fossil, but by the completeness of genome databases. This means that unknown sequences just sitting on your computer after an extraction gradually, inexorably, will be identified when science gets around to sequencing the organisms they came from. The 18.42 percent "unidentified" in the graph will slowly shrink over time. Now, almost none of that will be mammoth-relevant information, but it's still pretty cool.
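To make the database point concrete, here is a toy sketch in Python. It is not the method used by Poinar and colleagues: real pipelines use alignment tools that tolerate mismatches, while this one uses exact substring matching as a stand-in. The reads, the reference strings, and the names `classify_reads`, `database_2006`, and `database_later` are all made up for illustration.

```python
from collections import Counter


def classify_reads(reads, database):
    """Label each read with the first source whose reference contains it.

    Exact substring matching is a simplification; it stands in for a real
    alignment search against a genome database.
    """
    labels = []
    for read in reads:
        label = "unidentified"
        for source, reference in database.items():
            if read in reference:
                label = source
                break
        labels.append(label)
    return Counter(labels)


# Hypothetical reads pulled from one extraction.
reads = ["ACGTAC", "GGCATT", "TTAGGC", "CCCGGA"]

# Hypothetical database contents at the time of the extraction.
database_2006 = {
    "elephant": "TTACGTACGG",
    "bacteria": "AAGGCATTAA",
}

# The same database after one more organism has been sequenced.
database_later = dict(database_2006, fungus="GGTTAGGCTT")

counts_2006 = classify_reads(reads, database_2006)
counts_later = classify_reads(reads, database_later)

print(counts_2006)
print(counts_later)
```

The reads never change between the two runs. Only the database grows, and one read moves out of the "unidentified" slice.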

There are two problems. One is, if the DNA preservation is poor, you are going to have to grind through an awfully large amount of bone to get any kind of good genome coverage. In this case, a small sample of mammoth bone was sufficient to sequence 13 million base pairs of mammoth DNA. But there might or might not be anything interesting in those 13 million base pairs. It is certainly possible to sequence more from more samples, and that is the point: if preservation were not as good as in this particular sample, you would have to mill major mammoth mandible to get a full genome sequence.
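A back-of-the-envelope sketch of that arithmetic follows. The 13 million base pairs and the 45.4 percent figure come from the abstract; the genome size of 4 billion base pairs is a round number assumed here purely for illustration, and the 1 percent endogenous fraction is a hypothetical poorly preserved sample. The fraction of the genome touched at least once uses the standard Poisson approximation, which assumes reads land at random.

```python
import math

MAMMOTH_BASES = 13e6          # from the abstract
ENDOGENOUS_FRACTION = 0.454   # from the abstract
ASSUMED_GENOME_SIZE = 4e9     # assumption: round figure, not from the paper


def expected_coverage(target_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return target_bases / genome_size


def fraction_covered(coverage):
    """Poisson estimate of the genome fraction hit at least once."""
    return 1.0 - math.exp(-coverage)


def total_bases_needed(target_coverage, genome_size, endogenous_fraction):
    """Total bases to sequence (junk included) to reach a target coverage."""
    return target_coverage * genome_size / endogenous_fraction


coverage = expected_coverage(MAMMOTH_BASES, ASSUMED_GENOME_SIZE)
good_sample = total_bases_needed(1.0, ASSUMED_GENOME_SIZE, ENDOGENOUS_FRACTION)
poor_sample = total_bases_needed(1.0, ASSUMED_GENOME_SIZE, 0.01)

print(f"coverage so far: {coverage:.5f}x")
print(f"genome fraction touched: {fraction_covered(coverage):.4%}")
print(f"bases for 1x, well preserved: {good_sample:.2e}")
print(f"bases for 1x, 1% endogenous:  {poor_sample:.2e}")
```

Under these assumptions, the sequencing effort scales inversely with the endogenous fraction, so a sample with 1 percent mammoth DNA needs roughly 45 times as much sequencing as this one for the same coverage.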

For mammoths, I don't see that as much of a problem. Remember the Explorers' Club, after all. I imagine a large woodchipper in some DNA lab standing ready to chomp the frosty mammoth meat.

For hominids, that will be a bit more troubling. Will we be willing to put an entire skull in the blender for a complete Neandertal genome? Or if Neandertals are well-enough preserved and we are willing to settle for less-than-full genome coverage, what about more ancient or more marginally preserved fossils, like an Atapuerca femur? Does a genome have more scientific value than a fossil object itself, if we can preserve its anatomical detail with microCT or other techniques?

Then there's the other problem: degradation. How good is the sequence? Even in the exceptionally well-preserved mammoth sample, there was substantial evidence for degradation of sequence, with around twice the number of expected C -> T transitions compared to elephant and a third or so more G -> A transitions. That's an awful lot of potential noise for anyone looking at gene function and evolution. I'm guessing what will have to be done is to simply ignore certain classes of mutations that are likely to derive from postdepositional diagenesis (that is, DNA rot). Even so, some diagenetic changes will remain hard to identify.
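Here is a minimal sketch of what "ignoring certain classes of mutations" could look like, assuming the ancient read is already aligned base for base against a reference. The sequences and the function name `count_differences` are hypothetical, and this is my guess at the idea rather than anything described in the paper.

```python
# Substitution classes (reference base, ancient base) treated as suspect.
DAMAGE_PRONE = {("C", "T"), ("G", "A")}


def count_differences(reference, ancient):
    """Count mismatches, separating damage-prone classes from the rest.

    Returns (kept, ignored): mismatches retained for analysis and
    mismatches set aside as possible DNA rot.
    """
    if len(reference) != len(ancient):
        raise ValueError("sequences must be aligned to the same length")
    kept = 0
    ignored = 0
    for ref_base, anc_base in zip(reference, ancient):
        if ref_base == anc_base:
            continue
        if (ref_base, anc_base) in DAMAGE_PRONE:
            ignored += 1
        else:
            kept += 1
    return kept, ignored


# Hypothetical aligned sequences: one C->T, one C->A, one G->A.
reference = "ACGTCCGA"
ancient = "ATGTCAAA"

kept, ignored = count_differences(reference, ancient)
print(kept, ignored)
```

The cost is plain: any real C -> T or G -> A difference between mammoth and elephant gets thrown out along with the rot.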

The best approach may be to simply grind up more bone, making sure that each genome section is covered by multiple copies. The multiple copies allow for error correction, since it is relatively unlikely that any single diagenetic change will occur in multiple copies of a gene. The really, really good news is that given enough sample, we are very likely to get accurate genome sequences from ancient humans.
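The error-correction idea can be sketched as a majority vote at each position across overlapping copies. This assumes the reads are already aligned and the same length, which real data would not hand you for free; the reads and the function name `consensus` are invented for the example.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the most common base at each position across aligned reads.

    Ties go to whichever base Counter lists first, so a real tool would
    need a better rule (or more coverage) for ambiguous positions.
    """
    result = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Hypothetical copies of one region; two carry a single damaged base each.
reads = [
    "ACGTCC",
    "ATGTCC",  # C->T at the second position
    "ACGTCT",  # C->T at the last position
    "ACGTCC",
]

print(consensus(reads))
```

Because each damaged base shows up in only one copy, the other copies outvote it. The scheme fails when the same position is damaged in most copies, which is why deeper coverage helps.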

But the whole thing raises a fairly hairy problem concerning fossil humans. It's like that commercial with the owl and the Tootsie Pop -- how many samples does it take to get the genome? CHOMP!

So what about Mozart?

Something we can do to a Neandertal, we can certainly do to bones from any historical figure. The Mozart genome, the King Tut genome, the Lincoln genome, the John Wilkes Booth genome -- we can have them all!

Today, you can have your Y chromosome sent away to find out if you are a descendant of Genghis Khan. Tomorrow, you'll be able to compare every one of your genes to Mozart's. In all likelihood, some genetic variants will be associated with musical talent. The obvious next Austrian TV special will be the Mozart genotypes for any music-related genes. The less obvious step will be screening your young Juilliard candidate for genetic similarity to Mozart.

There's no way Mozart can cash in on the process. But what about living celebrities, or athletes? Subscribe to iGenes and you can find out whether your kid's genes might give him the chops for the NBA (with proper work and training, of course) or whether he should start hitting the links instead.

That's what I find creepy. And there are an awful lot of composers buried in well-known locations that could be dug up for genetic comparisons.

References:

Poinar HN et al. 2006. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science (online early) doi:10.1126/science.1123360.