Cave bear genomics


A new article in Science Express, the online early-release section of Science, by James Noonan (Lawrence Berkeley National Laboratory) and colleagues describes the recovery of nuclear DNA sequences from cave bear remains. Here's the abstract:

Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to amplification of mitochondrial sequences. We describe metagenomic libraries constructed using unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8% and 1.1% of clones contain cave bear inserts, yielding 26,861 base pairs of cave bear genome sequence. Comparison of cave bear and modern bear sequences revealed the evolutionary relationship of these lineages. The metagenomic approach employed here establishes the feasibility of ancient DNA genome sequencing programs.

In a previous post, I covered the recent announcement that this group, a collaboration between the Max Planck Institute for Evolutionary Anthropology and Lawrence Berkeley National Laboratory, plans to recover genomic sequences from Neandertals. The cave bear paper gives a clear hint about how it will be done.

A Nature news article covers the bear research:

The standard practice for sequencing genes involves making numerous copies of the initial sample through a process called a polymerase chain reaction, or PCR. Subjecting ancient DNA to this does not produce good results because PCR picks up and duplicates the sequences of modern animals more efficiently. This means that bits of contaminating DNA often drown out samples from the prehistoric animal.
"The prevailing idea was that this was impossible," says James Noonan of the Lawrence Berkeley National Laboratory in California, who is lead author of the paper that appears in Science this week.
To overcome this challenge, Noonan and his colleagues decided to skip the replicating step and directly sequence the tiny amount of DNA extracted from two Austrian cave-bear bones that are more than 40,000 years old. To make sure each portion of DNA was really from the bears rather than a contaminating source, they compared each sequence produced with the genome of the dog, a modern relative of the bear.
The technologies needed to examine such tiny amounts of DNA directly, along with the reference genome from the dog, have become available to scientists only recently.
The team determined that nearly 6% of the sequences analysed from one of their animal samples belonged to ancient bear: an unexpectedly large amount. The rest of the DNA probably came from soil microbes or the palaeontologists handling the bones, the team says.


The technique they are using, called metagenomics, is borrowed from environmental science. The principle is that you take a sample of organic material and look for evidence of the organisms within it by separating out all the DNA and cloning it.

This is in contrast to PCR, where you look for a specific piece of DNA from one location in the genome by designing primers that will amplify that piece preferentially. With metagenomics, you don't start out knowing what you are looking for.

Metagenomics is useful to environmental scientists, drug researchers, and others because it allows the study of DNA from organisms that cannot be cultured in the laboratory. You take DNA from the sample and insert the fragments into bacteria using a cloning vector, so that each bacterial colony carries one fragment; the collection of colonies is a "metagenomic library." This library consists of DNA fragments from whatever organisms were in the sample, possibly including hundreds of species. If you've heard of the idea of creating a "bar code" of DNA that could identify organisms taken from ocean water or soil samples, this is the science behind that idea. You don't know in advance which organisms you're extracting DNA from, and you'd like a standardized way to identify them.

For the cave bears, the lab extracted DNA from the sample and cloned it into a metagenomic library containing bacterial DNA, fungal DNA, human DNA, and some cave bear DNA. Then the lab sequenced the cloned fragments to find out what they are. The ones that look bear-like are assumed to be endogenous. And because the clones can be propagated in bacteria indefinitely, the library is an effectively limitless source of cave bear genetic material.

Of course, in the case of the bears, the lab has little worry that living bears in the laboratory have handled and contaminated the remains (although I have seen cases in labs where such strange contaminations have happened...). For Neandertals, the possibility of human contamination is ever-present. Skipping the PCR step is very important in limiting contamination (since modern DNA amplifies much more readily than ancient DNA), but it does not come close to eliminating the problem. The two cave bear extracts preserve a substantial amount of human sequence: in one case, a third as much human contaminant as original cave bear sequence. It will be very hard to exclude this kind of contamination from a Neandertal extract, since Neandertal DNA would be expected to share most of its sequence with living humans even without any contamination.
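The point about modern DNA amplifying more readily can be made concrete with a toy model. This sketch assumes idealized exponential PCR with no plateau, and the template counts and per-cycle efficiencies are hypothetical numbers I picked for illustration, not measurements from the bear or Neandertal work. Even when the contaminant starts as a small minority, a modest per-cycle advantage lets it take over the product.

```python
def pcr_copies(templates: float, efficiency: float, cycles: int) -> float:
    """Expected copies after `cycles` rounds of idealized PCR.

    Each template is duplicated with probability `efficiency` per cycle,
    so copies grow by a factor of (1 + efficiency) per cycle (no plateau).
    """
    return templates * (1.0 + efficiency) ** cycles


def contaminant_share(ancient: float, modern: float,
                      ancient_eff: float, modern_eff: float, cycles: int) -> float:
    """Fraction of the final PCR product derived from modern contaminant DNA."""
    a = pcr_copies(ancient, ancient_eff, cycles)
    m = pcr_copies(modern, modern_eff, cycles)
    return m / (a + m)


if __name__ == "__main__":
    # Hypothetical: 100 damaged ancient templates amplifying at 60% per cycle
    # versus only 10 intact modern templates amplifying at 95% per cycle.
    for n in (0, 10, 20, 30, 40):
        share = contaminant_share(100, 10, 0.60, 0.95, n)
        print(f"{n:2d} cycles: {share:.1%} of product is contaminant")
```

With these made-up numbers the contaminant goes from about 9% of the starting templates to well over 99% of the product after 40 cycles. Cloning without amplification leaves the molecules at their starting proportions instead.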

Why did they compare with dogs? Because there is a dog genome project, but not a bear one. This is a computational comparison, not a wet-lab one. For Neandertals, the comparison will work the same way: hunting through the human genome to find segments that correspond to the Neandertal extracts.
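The sorting step can be sketched in a few lines. This is only a stand-in: real analyses align each clone against reference genomes with tools such as BLAST, whereas this sketch scores each clone by the fraction of its k-mers (short words of length k) that appear in each reference and assigns it to the best match above a threshold. The function names, `k=8`, and the 0.5 threshold are my own illustrative assumptions, and the "dog", "human", and "bacterium" references are toy random sequences, not real genomes.

```python
import random
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k in seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_clone(clone_seq: str, references: Dict[str, str],
                   k: int = 8, min_shared: float = 0.5) -> Optional[str]:
    """Name of the reference sharing the largest fraction of the clone's
    k-mers, or None if no reference reaches min_shared."""
    clone_kmers = kmers(clone_seq, k)
    if not clone_kmers:
        return None  # clone shorter than k
    best_name, best_score = None, 0.0
    for name, ref_seq in references.items():
        # Recomputed per call for simplicity; a real tool would index references once.
        score = len(clone_kmers & kmers(ref_seq, k)) / len(clone_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_shared else None


def random_seq(rng: random.Random, length: int) -> str:
    """Random DNA string, used here only to build toy references."""
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(42)
    refs = {name: random_seq(rng, 500) for name in ("dog", "human", "bacterium")}
    print(classify_clone(refs["dog"][100:180], refs))  # dog
    print(classify_clone("N" * 40, refs))              # None
```

A clone that matches nothing well stays unassigned, which is roughly the fate of the many microbial clones that have no close relative in the reference set.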

Looking for Neandertal genomic DNA

This is new stuff, to a point, but not all that new. The original recovery of Neandertal mtDNA in 1997 relied on bacterial cloning of the amplified fragments, so that individual molecules could be sequenced and compared. This history indicates that Pääbo's lab has not trusted raw PCR products from Neandertal-aged remains from the beginning, and certainly for good reason considering the very high chance of preferential amplification of contaminants.

But the metagenomics approach adds a new twist. If you aren't looking specifically for one genomic region when you extract DNA from the sample and clone it, then the results are going to be a scatter from across the genome. In this respect, Neandertal genomics may really be like Forrest Gump's box of chocolates: you never know what you're going to get. With a sufficiently large sample, you could in principle find any region of the genome. But it's not obvious how much bone extract a sufficiently large sample would require. For the bears, around 1 megabase of sequence was analyzed from each of the two libraries, yielding around 27 kilobases of cave bear DNA in total. With more effort, a larger quantity might be obtained, but of course this would require the destruction of larger samples of bone.
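How large is "sufficiently large"? One rough way to think about it is the standard Lander-Waterman (Poisson) approximation for randomly scattered fragments: if the endogenous sequence adds up to c times the genome length, the expected fraction of the genome hit at least once is 1 - e^(-c). This sketch assumes fragments land uniformly at random, treats the fraction of bear clones as if it were the fraction of sequenced bases, and uses a genome size of about 2.4 Gb (roughly dog-sized) as a hypothetical stand-in for the bear. None of these numbers come from the paper.

```python
import math

GENOME_BP = 2.4e9  # hypothetical genome size, roughly dog-sized


def fraction_covered(endogenous_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Expected fraction of the genome covered at least once (Poisson model)."""
    coverage = endogenous_bp / genome_bp
    return 1.0 - math.exp(-coverage)


def raw_sequence_needed(target_fraction: float, endogenous_fraction: float,
                        genome_bp: float = GENOME_BP) -> float:
    """Raw library sequence (bp) needed for random endogenous fragments to be
    expected to cover target_fraction of the genome, when only
    endogenous_fraction of the sequenced library comes from the target organism."""
    coverage = -math.log(1.0 - target_fraction)
    return coverage * genome_bp / endogenous_fraction


if __name__ == "__main__":
    print(f"27 kb covers ~{fraction_covered(27_000):.1e} of the genome")
    for endo in (0.058, 0.011):  # endogenous fractions like the two bear libraries
        raw = raw_sequence_needed(0.5, endo)
        print(f"{endo:.1%} endogenous: ~{raw / 1e9:.0f} Gb raw for 50% coverage")
```

Under these assumptions, covering half the genome even once would take roughly 29 Gb of raw sequence from the better library and about 151 Gb from the poorer one, against the ~1 Mb per library actually analyzed.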

Twenty-seven kilobases is a potentially interesting amount. It is large enough to give a good chance of finding genetic variants in the Neandertal sequence. Humans differ at around 1 nucleotide in every thousand, so 27 kb would be expected to hold roughly 27 potential differences.
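The arithmetic behind that estimate is simple, and a Poisson model also gives the chance of seeing at least a given number of differences. The Poisson model is an assumption (it treats differences as independent and evenly spread), and the 1-in-1,000 rate is the rough human figure from the text applied to a hypothetical Neandertal yield the same size as the bear yield, not a measured Neandertal value.

```python
import math


def expected_differences(length_bp: int, diff_per_bp: float) -> float:
    """Expected number of differing sites in a stretch of length_bp."""
    return length_bp * diff_per_bp


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    below_k = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    mean = expected_differences(26_861, 1 / 1000)  # bear-sized yield, human-like rate
    print(f"expected differences: {mean:.1f}")
    print(f"P(at least 15 differences): {prob_at_least(15, mean):.3f}")
```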

But if only one out of a thousand base pairs differs between humans, DNA degradation over time might produce more changes than evolution did. There is some evidence from ancient mtDNA sequences of diagenetic damage to the preserved sequences resulting in sequence changes. These are known to be diagenetic because some of them apparently occur at predictable hotspots, but the rate of this damage is not yet known, and it appears to differ between specimens. Nuclear DNA may be more stable than mitochondrial DNA, because it is packaged by proteins into a firmer structure, but I wouldn't make any bets on it. Either way, diagenetic change has the potential to produce many more apparent differences than the actual rate of evolutionary divergence, which will make the genetic differences very hard to interpret.
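To see why this matters, suppose damage and real divergence are independent, rare events at each site. Then the expected share of observed differences that are genuine is just the divergence rate divided by the sum of the two rates. The damage rates below are hypothetical, since the real rate is unknown and varies between specimens.

```python
def fraction_real(divergence_per_bp: float, damage_per_bp: float) -> float:
    """Expected share of observed differences reflecting true divergence.

    Assumes damage and divergence are independent and rare enough that they
    almost never hit the same site.
    """
    total = divergence_per_bp + damage_per_bp
    return divergence_per_bp / total if total > 0 else 0.0


if __name__ == "__main__":
    divergence = 1 / 1000  # rough human figure from the text
    for damage in (0.0, 0.5 / 1000, 1 / 1000, 5 / 1000):  # hypothetical damage rates
        share = fraction_real(divergence, damage)
        print(f"damage {damage * 1000:.1f} per kb -> {share:.0%} of differences real")
```

Once damage matches the true rate, half of the observed differences are artifacts; at five times the true rate, only about one in six is real.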

Noonan et al. (2005:3) observe this problem in the cave bear sequence:

The substitution rate we estimated for cave bear is higher than that in any other bear lineage. On the basis of results from PCR-amplified ancient mitochondrial DNAs, cytosines in ancient DNA can undergo deamination to uracil, which results in an excess of G to A and C to T (GC-AT) transitions (22). The inflated substitution rate in cave bear is likely due to an excess of such events, since many of the substitutions assigned to the cave bear lineage are GC-AT transitions (Fig. 3A). These presumably damage-induced substitutions complicate phylogenetic reconstruction and the identification of functional sequence differences between extinct and modern species.


They argue that the diagenetic changes may be excluded if they occur in a subset of the clones, as they apparently do in this case. They merely leave out the clones with high rates of GC-AT transitions, and their results look more normal. This helps to reduce the problem, if the changes are concentrated in certain clones, but it cannot eliminate it.
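The filtering idea can be sketched as follows. Cytosine deamination shows up as C to T on one strand, or as G to A when the damaged strand is read as its complement, so the sketch counts what fraction of each clone's substitutions (relative to a reference) are of those two kinds and drops clones where they dominate. The `Clone` structure, the 0.6 cutoff, and the minimum of three substitutions are hypothetical choices of mine; the paper's own criterion may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Substitutions consistent with cytosine deamination: (reference base, ancient base).
DAMAGE_LIKE = {("C", "T"), ("G", "A")}


@dataclass
class Clone:
    name: str
    substitutions: List[Tuple[str, str]] = field(default_factory=list)


def gc_at_fraction(clone: Clone) -> float:
    """Fraction of a clone's substitutions that are GC->AT transitions."""
    if not clone.substitutions:
        return 0.0
    hits = sum(1 for sub in clone.substitutions if sub in DAMAGE_LIKE)
    return hits / len(clone.substitutions)


def filter_damaged(clones: List[Clone], max_fraction: float = 0.6,
                   min_subs: int = 3) -> List[Clone]:
    """Drop clones whose substitutions are dominated by GC->AT transitions.

    Clones with fewer than min_subs substitutions are kept, since there is
    too little evidence to judge them either way.
    """
    return [c for c in clones
            if len(c.substitutions) < min_subs or gc_at_fraction(c) <= max_fraction]


if __name__ == "__main__":
    clones = [
        Clone("clone_1", [("C", "T"), ("G", "A"), ("C", "T"), ("C", "T")]),
        Clone("clone_2", [("A", "G"), ("C", "T"), ("T", "C")]),
        Clone("clone_3", [("G", "A")]),
    ]
    print([c.name for c in filter_damaged(clones)])  # ['clone_2', 'clone_3']
```

This only works if damage is concentrated in some clones; damage spread thinly across every clone would pass the filter untouched.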

This might be easier if we knew we were looking for particular variants at certain genomic locations. For example, if the lab went looking for the FoxP2 gene, they could expect to find variation at the two amino-acid-changing substitutions that have occurred on the human lineage since it split from the chimpanzee lineage. The odds of diagenetic changes at these positions would be relatively low compared to the known odds of finding a genetic substitution there. But the metagenomic approach may not give the opportunity to focus on changes that are known to be likely polymorphisms. We may have to just take what we can get.
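The trade-off between targeting known sites and scanning whatever turns up can be put in rough numbers. This sketch compares expected damage-induced differences at two pre-chosen sites (like the two FoxP2 positions) against a scan of a whole recovered stretch, and also estimates the chance that randomly scattered fragments cover one specific site at all. The damage rate and the 2.4 Gb genome size are hypothetical assumptions, and the coverage estimate is a simple Poisson approximation.

```python
import math


def expected_damage_hits(sites_examined: float, damage_per_bp: float) -> float:
    """Expected number of damage-induced differences among the examined sites."""
    return sites_examined * damage_per_bp


def prob_site_covered(endogenous_bp: float, genome_bp: float) -> float:
    """Poisson chance that randomly scattered fragments cover one given site."""
    return 1.0 - math.exp(-endogenous_bp / genome_bp)


if __name__ == "__main__":
    damage = 2 / 1000  # hypothetical damage rate per base
    print("two targeted sites:", expected_damage_hits(2, damage))
    print("26,861 bp scanned:", expected_damage_hits(26_861, damage))
    print("chance a random library covers one given site:",
          f"{prob_site_covered(26_861, 2.4e9):.1e}")
```

With these assumptions, looking at two known sites risks only a few thousandths of a spurious difference, versus dozens across a full scan; but a random library with a bear-sized yield has only on the order of a one-in-a-hundred-thousand chance of covering any particular site.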

In any event, it should be interesting to see these results come out. I am afraid that we will see phylograms showing the relationship of some Neandertals to living human populations. That would be a mistake, since living people are not related as branches on a tree, and there is no necessary reason to suppose that Neandertals were either. But I guess it's my job to point that out when the time comes.


Noonan JP, et al. 2005. Genomic sequencing of Pleistocene cave bears. Science Express. doi: 10.1126/science.1113485.