john hawks weblog

paleoanthropology, genetics and evolution

ancient DNA

  • The sea shall give up her dead

    Thu, 2013-05-09 13:56 -- John Hawks

    I really like this ScienceNOW account by Traci Watson of new work that has uncovered ancient DNA in deep-seafloor contexts: "Ancient DNA Found Hidden Below Sea Floor". The article covers two studies, one looking at 11,400-year-old DNA from the abyssal plain and another comparing more ancient and more recent Black Sea seafloor samples. The latter study may help to redate the last time the Black Sea basin was flooded from the Mediterranean:

    One type of marine fungus, for example, first appeared in the sediments roughly 9600 years ago—exactly when some forms of freshwater plankton and a freshwater mussel vanish, the team reports this week in the Proceedings of the National Academy of Sciences. That suggests that marine waters started to invade the lake roughly 600 years earlier than thought. The team also found DNA from a form of marine alga in 9300-year-old sediments, though the alga doesn’t show up in the fossil record until 2500 years ago, says molecular paleoecologist Marco Coolen of the Woods Hole Oceanographic Institution in Massachusetts and an author of the Black Sea paper.

    What a neat project it will be, to explore seafloor DNA for unexpected inclusions. There's a good reason to fund much more work here, given that the 11,400-year horizon where this is already practical is so near the Younger Dryas. We need a fleet of tiny autonomous vessels to find the interesting stuff -- we can call them, "Glomar Venters"!

  • Denisova at high coverage

    Thu, 2012-08-30 15:25 -- John Hawks

    Science today has released the new paper on the Denisova high-coverage genome by Matthias Meyer and colleagues from Svante Pääbo's group [1]. There is a lot of material in the supplements of the new paper, and it will take some time to work through the implications.

    The basics are quite simple: The paper confirms the initial interpretation of the genome by David Reich and colleagues [2] in most respects. The mixture with a whole-genome sample from Papua New Guinea is estimated at 6% Denisovan ancestry. Confirming the later paper by Reich and colleagues [3], the new analysis finds no significant evidence of Denisovan ancestry in a mainland south Chinese (Dai) individual, and can exclude it down to a very small fraction:

    However, in contrast to a recent study proposing more allele sharing between Denisova and populations from southern China, such as the Dai, than with populations from northern China, such as the Han (17), we find less Denisovan allele sharing with the Dai than with the Han (although non-significantly so, Z = –0.9) (Fig. 4B) (table S25). Further analysis shows that if Denisovans contributed any DNA to the Dai, it represents less than 0.1% of their genomes today (table S26).

    That is a mystery to be explained. How did Asians end up lacking any evidence of Denisovan ancestry, when the peoples of Sahul (Australia and New Guinea) have six percent? It's nutty! The early modern humans who were the ancestors of present Sahulian peoples surely came from Asia, and they surely mixed with Denisovans there somewhere, right? But today there's no sign that present Asian peoples descended from those early Asian peoples.

    We must, I think, conclude that there was at least one, and possibly several episodes of massive population movement across South and Southeast Asia.

    I have recently completed a review of the analogous problem for Neandertals in Europe -- late and early Neandertals themselves appear to have been a dynamic population. I'm now working on a review of the situation in Southeast Asia. We may fundamentally have to look at the archaeological record in a new, and much more dynamic, way than has been the case.

    Neandertal gene flow

    To me at the moment, this is the most interesting paragraph of the new paper:

    Interestingly, we find that Denisovans share more alleles with the three populations from eastern Asia and South America (Dai, Han, and Karitiana) than with the two European populations (French and Sardinian) (Z = 5.3). However, this does not appear to be due to Denisovan gene flow into the ancestors of present-day Asians, since the excess archaic material is more closely related to Neandertals than to Denisovans (table S27). We estimate that the proportion of Neandertal ancestry in Europe is 24% lower than in eastern Asia and South America (95% C.I. 12–36%). One possible explanation is that there were at least two independent Neandertal gene flow events into modern humans (18). An alternative explanation is a single Neandertal gene flow event followed by dilution of the Neandertal proportion in the ancestors of Europeans due to later migration out of Africa. However, this would require about 24% of the present-day European gene pool to be derived from African migrations subsequent to the Neandertal admixture.

    This is a very interesting result, partially because it is the opposite of what we are finding. As I explained earlier this year, we are finding Europeans to share more Neandertal alleles than Asians do. The difference in our results has been much smaller than 24%; really only an increase of less than 0.5% on the whole genome, or maybe 10% relative to the overall amount in Europe (which is on the order of 3%).
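
    The percentages here are easy to mix up, because the 24% in the paper is a relative difference while the 0.5% I mention is an absolute one. A minimal sketch in Python, assuming a simple one-pulse dilution model in which the later migrants carry no Neandertal ancestry at all. The function names and the 3% starting value are illustrative only, not taken from the paper:

```python
def relative_difference(p_europe, p_asia):
    """Relative deficit of Neandertal ancestry in Europe compared with Asia."""
    return (p_asia - p_europe) / p_asia


def dilution_fraction(p_before, p_after):
    """Fraction of unadmixed ancestry needed to dilute p_before down to p_after.

    Assumes the incoming migrants have zero Neandertal ancestry, so that
    p_after = (1 - d) * p_before.
    """
    return 1.0 - p_after / p_before


# Hypothetical numbers: Asia at 3%, Europe 24% lower in relative terms.
p_asia = 0.03
p_europe = p_asia * (1 - 0.24)

# Absolute gap in ancestry proportion (under one percentage point).
print(round(p_asia - p_europe, 4))
# Share of the European gene pool that would have to come from later migrants.
print(round(dilution_fraction(p_asia, p_europe), 2))
```

    Under this toy model the dilution fraction equals the relative deficit, which is why the paper's alternative explanation needs about 24% of the European gene pool to come from later African migration.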

    My initial reaction to this difference is that it reflects the sharing of Neandertal genes in Africa. Meyer and colleagues filtered out alleles found in Africa, as a way of decreasing the effect of incomplete lineage sorting compared to introgression in their comparison. But if Africans have some gene flow from Neandertals, eliminating alleles found in Africans will create a bias in the comparison. If (as we think) some African populations have Neandertal gene flow, that probably came from West Asia or southern Europe. So as long as the present European and Asian (and Native American) samples have undergone a history of genetic drift, or if (as mentioned in the quote) they mixed with slightly different Neandertal populations, this bias will tend to make Asians look more Neandertal and Europeans less so.

    Anyway, this demands further investigation. The Denisova genome makes a more compelling outgroup for these kinds of comparisons, because it is much closer to us than chimpanzees are. But it isn't really an outgroup because it shares alleles by descent with Neandertals. So it takes some clever genetics to compare the distributions of derived alleles in these genomes in terms of introgression versus incomplete lineage sorting.

    Denisovan demography

    It has become possible to make some good estimates of demographic history using only a single diploid genome, using a technique developed by Li and Durbin [4]. Meyer and colleagues applied this technique to the Denisova genome, finding that its genetic history contrasts with that of living human populations:

    To estimate how Denisovan and modern human population sizes have changed over time we applied a Markovian coalescent model (22) to all genomes analyzed. This shows that present-day human genomes share similar population size changes, in particular a more than two-fold increase in size before 125,000–250,000 years ago (depending on the mutation rates assumed (23), Fig. 5B). Denisovans, in contrast, show a drastic decline in size at the time when the modern human population began to expand.

    There is not yet enough data from Neandertal genomes to apply the same method, but to the extent that we understand their diversity, they show a similar picture. These archaic humans in Eurasia had much, much smaller effective population sizes than the ancient population of Africa. That's not surprising, given what we understand about ancient hunter-gatherer population dynamics.

    What may be a bit more surprising is the geography. We know that Neandertals of Europe and Central Asia lived in an environment that was relatively marginal for their technology and subsistence pattern. The Denisovan population could well have lived in parts of South or Southeast Asia -- subtropical and tropical areas comparable to Africa in their ecological diversity and resource richness.

    We might have imagined that the Denisovan population would be more diverse than Neandertals -- that it might have been comparable in diversity to part of Africa, if not the entirety of Africa. The genome is inconsistent with that picture.

    How can we explain the apparent contrast?

    1. Maybe Denisovans didn't live in South or Southeast Asia at all. If not, that demands that we explain how Australians got their genes.

    2. Maybe the population was geographically extensive and diverse, but the genome from Denisova Cave doesn't represent it well. If so, we might discover that Sahulians actually have even more ancestry from this group. Alternatively, we might find that the early history of the population was widely shared, but the recent history diverged between Siberian and other branches of the Denisovan-inhabited region.

    3. Maybe African diversity emerged from a much more complex series of interactions than we now appreciate. The demographic model of Li and Durbin doesn't encompass admixture, just the probability of gene coalescence across time. We have recently begun to appreciate the reality of ancient African population structure. If those initial African populations were more divergent from each other than Neandertals and Denisovans, their later mixture would give rise to a picture of early population expansion, even if each of them had relatively low (Denisovan-like) diversity.

    This picture is already complicated. It will get more so. We have a long way to go before the archaeology of MSA and Middle Paleolithic peoples will be reconciled with these genetic models.

    The "modern human" catalog

    I think it's tremendously interesting that the authors have compiled a list of gene variants shared by living humans that are absent from this high-coverage archaic human genome. It's a first step to identifying networks of genes that have been subject to recent evolutionary change in human ancestors.

    That being said, the list of genes itself doesn't lend itself to concrete conclusions:

    One way to identify changes that may have functional consequences is to focus on sites that are highly conserved among primates and that have changed on the modern human lineage after separation from Denisovan ancestors. We note that among the 23 most conserved positions affected by amino acid changes (primate conservation score ≥ 0.95), eight affect genes that are associated with brain function or nervous system development (NOVA1, SLITRK1, KATNA1, LUZP1, ARHGAP32, ADSL, HTR2B, CNTNAP2). Four of these are involved in axonal and dendritic growth (SLITRK1, KATNA1) and synaptic transmission (ARHGAP32, HTR2B) and two have been implicated in autism (ADSL, CNTNAP2). CNTNAP2 is also associated with susceptibility to language disorders (27) and is particularly noteworthy as it is one of the few genes known to be regulated by FOXP2, a transcription factor involved in language and speech development as well as synaptic plasticity (28). It is thus tempting to speculate that crucial aspects of synaptic transmission may have changed in modern humans.

    Interesting. I can imagine a Ph.D. dissertation looking into the function of each of those genes. It is surely true that in the last 300,000 years, human brains have been evolving. But why these genes as opposed to others? And how many regulatory changes (as opposed to amino acid changes) may have been further involved?

    Maybe even more interesting: How many times will the human alleles be found in some other Denisovan (or Neandertal) genomes, and how often will the "archaic" allele be found in anyone living now?

    A limited series of comparisons is too small to exclude that the range of variation will overlap, as fossil analysts have known for a long time. So we will need to work on extending our knowledge of the range of variation within living people, by increasing the sample of genomes representing populations around the world, particularly in Africa.

    The technology

    Of course, the most exciting thing about the new paper is the proof of concept for future high-coverage archaic genomes. The lab was able to generate the high-coverage sequence using its existing samples, by sequencing single-strand DNA instead of requiring double-strand DNA. This is a massive advantage when working with ancient DNA, because damage to the sequence often prevents double-stranded DNA from being amplified.

    The paper makes explicit that the Denisova phalanx simply has better endogenous DNA preservation than any other specimen known. That being said, the new sequencing method has greatly increased the sequence yield from the sample:

    We applied this method to aliquots of the two DNA extracts (as well as side fractions) that were previously generated from the 40 mg of bone that comprised the entire inner part of the phalanx (2, 8). Comparisons of these newly generated libraries to the two libraries generated in the previous study (2) show at least a 6-fold and 22-fold increase in the recovery of library molecules (8), which is particularly pronounced for longer molecules (fig. S4).

    It would be too soon to say that a similar increase in yield will happen for other specimens, but obviously, this may bring higher coverage into reach for several specimens that are currently only sequenced at very low coverage, including the Vindija, Mezmaiskaya, and El Sidrón Neandertals. We will have to wait and see how the new technique affects ancient DNA recovery going forward.

    I keep telling people that I think it's exciting that research into human evolution is now pushing technology forward. It has often been that paleoanthropology uses technological advances in other fields. But with ancient DNA, we really see an organic growth of technology along with research questions about our evolution. In our work on the ancient genomes, we're making some progress pushing forward knowledge about human biology by understanding human evolution. Evolution really is the fundamental principle of biology, but using evolution to learn about biology sometimes requires traveling through time. Ancient DNA gives us a time machine bringing new insights into reach.


    Synopsis: 
    A technological advance in library preparation gives rise to much better knowledge of the ancient Denisovans
  • Neandertal ancestry "Iced"

    Wed, 2012-08-15 15:24 -- John Hawks

    I've been mobbed with e-mails from readers asking about my reaction to the new paper by Anders Eriksson and Andrea Manica in PNAS, titled "Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins" [1]. The paper asserts that Neandertal similarity in the genomes of living people outside Africa can be explained only in terms of incomplete lineage sorting from the shared human-Neandertal common ancestral population in Africa. If the paper's assertions were accurate, we could go back to thinking that all the genetic heritage of people today traces back to Africa, although we would still need to abandon the idea that the African population had undergone a small bottleneck.

    I have not been posting as frequently the last month or two because I have been out of the country doing science.

    The new paper's press release has given rise to quite a lot of media attention, much of which unfortunately misrepresents our current knowledge of human and Neandertal genomes. Razib Khan summarized the situation on Monday, in a post titled, "Why you shouldn't publish in PNAS". I agree with his criticism, although I have a perspective coming out soon in PNAS. In fact, I suppose this episode shows why everyone should publish in PNAS, because so many journalists will just parrot press releases instead of asking relevant experts. Ewen Callaway did a great job on this story by putting it into the broader context ("Neandertal sex debate highlights benefits of pre-publication"). You will notice how no other science writers with any Neandertal knowledge picked up this press release...

    Paleoanthropology is a field where data are rare and precious, and we do a lot of arguing about the validity of models. I love arguing about the validity of models (Cliff Notes version: All models are wrong).

    Genomics is not such a field. We have abundant data today to compare with Neandertal genomes. Yet puzzlingly, the idea of Neandertal ancestry has been challenged by several papers that haven't performed any new empirical comparisons at all. I'm struggling to figure this out. We have an unparalleled ability to explore the genomes of humans and Neandertals, and we should believe a computer model with no empirical data?

    I've been assessing the Neandertal similarity of 1000 Genomes Project samples here on my blog (e.g., "Which population in the 1000 Genomes Project samples has the most Neandertal similarity?"). This is ongoing research here in my group, but we've been making it open because it tells us immediately that some hypotheses about Neandertal similarity must be wrong. Modeling is a lot of work. We're trying to avoid putting a lot of investment into modeling that will be easily refuted by the next piece of genomic data. Data are flowing now so rapidly that we can afford to be naive empiricists.

    For example, our comparisons quickly refute the hypothesis that Neandertal similarity comes only from ancient population structure in Africa. That hypothesis predicts much more heterogeneity within Africans in Neandertal similarity than exists today. We've shown that the heterogeneity in Africans is basically the same as within Europeans or Asians, and that the variance among African populations so far is quite small. Those are very simple observations, which are consistent with what Yang and colleagues [2] concluded on the basis of the frequency spectrum of Neandertal alleles in large samples of living people. Even though many Neandertal-shared SNP alleles came from incomplete lineage sorting, the signature of excess Neandertal sharing outside Africa must come mostly from recent introgression. In Ewen Callaway's article about this research, David Reich dismissed the new paper by Eriksson and Manica as "obsolete". I agree. The paper describes a model without carrying out any new empirical comparisons, and so has fallen behind where the science has gone.

    Another example is the proportion of Neandertal ancestry. Initially, the proportion of ancestry from Neandertals in living people was argued to be between 1 and 4 percent [3]. That was a model-based estimate that was the best possible under the assumption that Africans have no Neandertal ancestry. We now have a lot more human comparisons, which would make possible a more precise estimate of the mean. I hesitate to provide a new estimate, because we have shown that some Africans have substantial evidence of Neandertal similarity, which throws the baseline for any estimate into question. How much Neandertal ancestry is present in living people must depend on a more complex model of mixture among later populations. The result will still be small (probably less than 6 percent) but understanding this proportion will help us to evaluate when and where Neandertal genes flowed into our populations.

    Here's a third example. I haven't written about it here yet, but I have been lecturing about it quite widely over the past few months. Earlier this year, the genome of Ötzi the Tyrolean Iceman was reported by Andreas Keller and colleagues [4]. Aaron Sams and I downloaded the data and have been carrying out several different kinds of comparisons. A picture:

    Figure: Ötzi 1000 Genomes Neandertal comparison.

    I'd like to see the model of African population structure that could explain this result...

    If you'll remember my earlier posts on the 1000 Genomes Project samples, this chart is a histogram of the number of shared Neandertal derived SNP alleles in different samples. The European and Asian samples are substantially greater than either African sample (here, Luhya and Yoruba colored differently). If we took as a baseline that Europeans have an average of 3.5 percent Neandertal, Ötzi would have around 5.5 percent (again, the actual percentage would be highly model-dependent). He has substantially greater sharing with Neandertals than any other recent person we have ever examined.
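
    One simple way to turn shared-allele counts into a percentage like this is to scale each sample's excess over the African mean so that the European mean lands on the chosen baseline. The sketch below is only an illustration of that idea, not necessarily the calculation behind the chart. The function name and all of the counts are made up:

```python
def scaled_ancestry(sample_count, african_mean, european_mean, european_pct=3.5):
    """Scale a sample's excess sharing over the African mean to a percentage.

    The European mean is pinned to european_pct (an assumed baseline), and
    every other sample is scaled linearly relative to it.
    """
    excess_sample = sample_count - african_mean
    excess_europe = european_mean - african_mean
    if excess_europe <= 0:
        raise ValueError("European mean must exceed the African mean")
    return european_pct * excess_sample / excess_europe


# Made-up counts of shared Neandertal derived alleles, for illustration only.
print(scaled_ancestry(sample_count=1110, african_mean=1000, european_mean=1070))
```

    The answer moves with the baseline and with whatever Neandertal similarity the African samples themselves carry, which is why the actual percentage is so model-dependent.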

    You can imagine, we have carried out just about every comparison we can think of that could explain this result as anything other than greater Neandertal ancestry. Aaron and I will be putting our manuscript on the arXiv as soon as we've both signed off on all the text and figures, hopefully this week. This is simple stuff, and I see no reason not to be open about it -- anybody with the Ötzi data can immediately do the same thing.

    We think that showing and sharing these comparisons will save people a lot of useless effort. Personally, I can't believe that these people spending effort on population models for Neandertals aren't talking to those of us who have already carried out these comparisons and have already presented them in public. I guess we'll find out if secrecy or openness leads to better science.

    Meanwhile, I can share the abstract of the conference paper I'll be presenting in September at the meeting of the European Society of Human Evolution in Bordeaux:

    Evaluating recent evolution, migration and Neandertal ancestry in the Tyrolean Iceman

    Paleogenetic evidence from Neandertals, the Neolithic and other eras has the potential to transform our knowledge of human population dynamics. Previous work has established the level of contribution of Neandertals to living human populations. Here, I consider data from the Tyrolean Iceman. The genome of this Neolithic-era individual shows a substantially higher degree of Neandertal ancestry than living Europeans. This comparison suggests that early Upper Paleolithic Europeans may have mixed with Neandertals to a greater degree than other modern human populations. I also use this genome to evaluate the pattern of selection in post-Neolithic Europeans. In large part, the evidence of selection from living people’s genetic data is confirmed by this specimen, but in some cases selection may be disproved by the Iceman’s genotypes. Neolithic-living human comparisons provide information about migration and diffusion of genes into Europe. I compare these data to the situation within Neandertals, and the transition of Neandertals to Upper Paleolithic populations – three demographic transitions in Europe that generated strong genetic disequilibria in successive populations.


  • Calculus microbially

    Sun, 2012-06-10 22:01 -- John Hawks

    Molecular archaeologist Christina Warinner gave a TED talk and the main ideas are now in a CNN article: "Why your dental plaque is valuable".

    By applying advanced DNA sequencing and protein mass spectrometry technologies to ancient dental calculus, we can begin to reconstruct a detailed picture of the dynamic interplay between diet, infection and immunity that occurred thousands of years ago. This allows us to investigate the long-term evolutionary history of human health and disease, right down to the genetic code of individual pathogens, and it can teach us about how pathogens evolve and why they continue to make us sick.

    This is really neat work, although the article doesn't go into any new results.

  • Mailbag: Fickle finger

    Fri, 2012-04-06 20:51 -- John Hawks

    Re: Denisova

    Dear John Hawks,
    I would like really to know what decisive arguments allowed scientists to tell Denisova finger went from a female, after nuclear genome sequencing.

    That is quite simple; if the specimen were a male there would be Y-chromosome sequences in the genome.
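
    In practice this check comes down to counting how many sequence reads map to the Y chromosome relative to the X. A rough sketch, assuming per-chromosome mapped-read counts are already in hand. The function names, the example counts and the 0.01 threshold are all hypothetical, not values from the Denisova papers:

```python
def y_read_fraction(read_counts):
    """Fraction of sex-chromosome reads that map to the Y chromosome."""
    x = read_counts.get("X", 0)
    y = read_counts.get("Y", 0)
    if x + y == 0:
        raise ValueError("no reads mapped to X or Y")
    return y / (x + y)


def infer_sex(read_counts, threshold=0.01):
    """Call 'male' if the Y fraction exceeds an assumed threshold, else 'female'.

    A small nonzero Y fraction is tolerated because some reads mismap and
    some may come from contaminating modern DNA.
    """
    return "male" if y_read_fraction(read_counts) > threshold else "female"


# Made-up counts for illustration only.
print(infer_sex({"X": 50_000, "Y": 40}))     # female
print(infer_sex({"X": 25_000, "Y": 2_500}))  # male
```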

  • How widespread is Denisovan ancestry today?

    Tue, 2011-11-01 00:32 -- John Hawks

    Last month, David Reich and colleagues [1] reported on estimates of Denisovan ancestry for island and mainland Asian populations. Their most memorable conclusion was that they could find no substantial sign of Denisovan ancestry anywhere on the Asian mainland, or indeed on any island that had ever been connected by land to Asia.

    The distribution was stark, as illustrated by the map from the paper:

    I wrote about the paper when it was released ("Denisovan DNA in the islands, and an Australian genome"), noting:

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    Abi-Rached and colleagues [2] had argued that HLA alleles found in the Denisovan genome are presently common in some parts of Asia, and likely reflect local adaptive introgression. Substantial introgression of a small number of genes would not be enough to create a strong genome-wide appearance of Denisovan ancestry. Still, it was a little odd that the first genes anybody looked closely at would provide strong evidence of introgression.

    Now, Pontus Skoglund and Mattias Jakobsson [3] say that Denisovan ancestry is widespread across China and Southeast Asia.

    That conclusion contradicts Reich and colleagues, so why do the studies come to such different results?

    Skoglund and Jakobsson suggest that they have succeeded in finding introgression where others failed because their model accounts for ascertainment bias in the available datasets. SNP data come from genotyping chips, which have been designed using known polymorphisms. Five years ago, we knew much more about polymorphisms in Europe than other parts of the world, and so the HGDP, and HapMap to a lesser extent, do a good job of sampling rare alleles in Europe but miss many rare alleles in Africa and other populations. This is the ascertainment bias.

    Some of the most obvious signs of introgression today are cases where rare alleles are shared with an archaic genome. If ascertainment bias causes you to miss the rare alleles, you'll miss the introgression.
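
    A toy illustration of the effect: a SNP makes it onto a genotyping chip only if it was seen as variable in the discovery panel, so alleles that are rare outside that panel never get typed. Everything in this sketch (the SNP records, counts and function names) is invented for the example:

```python
# Each SNP: derived-allele count in a discovery panel and in a target
# population, plus whether the archaic genome carries the derived allele.
snps = [
    {"id": "snp1", "discovery": 12, "target": 30, "archaic_derived": False},
    {"id": "snp2", "discovery": 0,  "target": 2,  "archaic_derived": True},
    {"id": "snp3", "discovery": 5,  "target": 0,  "archaic_derived": False},
    {"id": "snp4", "discovery": 0,  "target": 1,  "archaic_derived": True},
    {"id": "snp5", "discovery": 3,  "target": 4,  "archaic_derived": True},
]


def ascertained(snp_list):
    """Keep only SNPs with the derived allele observed in the discovery panel."""
    return [s for s in snp_list if s["discovery"] > 0]


def archaic_sharing(snp_list):
    """Count SNPs where the target population carries an archaic derived allele."""
    return sum(1 for s in snp_list if s["archaic_derived"] and s["target"] > 0)


print(archaic_sharing(snps))               # 3 with full sequence data
print(archaic_sharing(ascertained(snps)))  # 1 after chip ascertainment
```

    Two of the three shared alleles are rare ones that the discovery panel never saw, so the chip data undercount sharing with the archaic genome.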

    But that explanation isn't really sufficient to explain the differences between these papers. For one thing, Reich and colleagues [1] also worked hard to account for ascertainment biases in their SNP samples. For another, whole genome comparisons between East Asian samples and the Denisova genome have not yielded evidence of Denisovan ancestry, even though whole genomes have no ascertainment bias. The number of whole genomes so far compared is very small, and so the statistical ability to detect introgression is lower, but Skoglund and Jakobsson actually replicate that null result in their current paper.

    Probably most important, it's not clear that Skoglund and Jakobsson's result can actually be explained by rare alleles. Here is Figure 1e from their paper:

    Figure 1e from Skoglund and Jakobsson (2011). Original caption: Interpolated spatial distribution of the frequency of Denisova alleles at SNPs where Denisova is different from chimpanzee and Neandertal. Sample localities are indicated with rectangles.

    This map represents a clever comparison. It is a heat map of the mean local frequency of the subset of alleles that are present in Denisova but absent from chimpanzees and Neandertals. These are presumptively derived alleles relative to the chimpanzee. The SNPs here are all known to vary in human populations, because they are all included in the HGDP sample. So the map does not represent all the Denisova derived mutations in humans today, only a particular subset that is especially likely to be informative.

    Given that the sites have been picked in a special way, we need to examine carefully how strong the pattern really is. Notice the scale of the heat map. The difference between the orange area in south China and the green area in north China is around 0.001, or a tenth of a percent in mean frequency. The actual values are reported in the online supplement, in Table S3. The exception is the Yizu sample in south China, which has a value around 0.006 higher than its neighbors. The Yizu sample includes only 10 individuals (9 males, 1 female). The paper does not report the number of SNPs included in this comparison, but it must be a very small set relative to the total, because only a small fraction of human SNPs are known to be derived in Denisova and ancestral in Neandertals.

    With this very small difference in frequencies, I would not rule out the hypothesis that the zone of high Denisova derived frequencies in south China is caused entirely by frequency enrichment of a small number of loci. A handful of genes like the HLA loci observed by Abi-Rached and colleagues might be enough to create this very slight elevation in the average. Hence, the best case is that the data here simply provide greater sensitivity to small amounts of introgression. The worst case is that the pattern may be dominated by the Yizu sample, which is really too small to carry this kind of load.
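
    A back-of-the-envelope check shows how few loci this would take. The paper does not report the number of SNPs in the comparison, so the figures below are purely hypothetical, as is the function name:

```python
def mean_shift(n_snps, n_enriched, delta):
    """Change in mean allele frequency across n_snps sites when n_enriched
    of them rise in frequency by delta and the rest stay unchanged."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return n_enriched * delta / n_snps


# Hypothetical: 10,000 SNPs in the comparison, with 50 of them (say, a few
# linked regions such as HLA) elevated by 0.2 in frequency.
shift = mean_shift(n_snps=10_000, n_enriched=50, delta=0.2)
print(round(shift, 4))
```

    With those assumed numbers, half a percent of the sites is enough to produce a shift of about 0.001 in the mean, the same size as the north-south difference on the map.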

    The strongest evidence presented in the paper is a comparison of north and south East Asian regions directly. Although the comparison of south China against other regions of the world (Africa, Europe) does not yield significant evidence of Denisovan similarity in this paper, south China differs from north China in essentially the same way that the Oceanian people do from other regions. And the Oceanian populations (here, Papua New Guinea and Bougainville) differ from other regions because of their Denisovan ancestry. So Skoglund and Jakobsson infer that the north/south comparison reflects Denisovan ancestry as well.

    I think this comparison is sound, and the question is, how much introgression would this pattern require? The paper answers that question in this way:

    Quantitative estimation of the precise fraction of Denisova-related ancestry in Southeast Asian populations based on genotype data are unfortunately sensitive to ascertainment bias and genetic drift, and such estimates will require genome sequence data that are currently unavailable. However, both the PCA results (Fig. 1B) and the approximately six times lower absolute values of the D statistic in tests between Northeast Asians and Southeast Asians compared with tests between Northeast Asians and Oceanians (Table S4) indicate a relatively low fraction of Denisova-related ancestry. Thus, the fraction is likely to be smaller than both the ~5% fraction of Denisova-related ancestry present in Oceanians and the ~2.5% fraction of Neandertal ancestry present in non-Africans (23, 24), perhaps around 1%.

    One percent is an amount that whole genome comparisons at present do not rule out, and I think it's a reasonable guess. I would not have thought we could rule out a one percent contribution from other, non-Denisovan archaic people, for example.
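
    For readers who haven't met the D statistic mentioned in the quoted passage, here is a minimal sketch of the standard ABBA-BABA form computed from allele frequencies. It is a generic illustration, not the code or data used by Skoglund and Jakobsson, and the frequencies are invented:

```python
def d_statistic(p1, p2, p3, p4):
    """ABBA-BABA D statistic from per-site frequencies of the 'B' allele.

    p1, p2: the two populations being compared (e.g. north and south)
    p3:     the archaic genome
    p4:     the outgroup (e.g. chimpanzee)
    Positive values mean p2 shares more 'B' alleles with p3 than p1 does.
    """
    abba = baba = 0.0
    for f1, f2, f3, f4 in zip(p1, p2, p3, p4):
        abba += (1 - f1) * f2 * f3 * (1 - f4)
        baba += f1 * (1 - f2) * f3 * (1 - f4)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Invented frequencies at three sites where the archaic genome carries the
# derived allele (1.0) and the outgroup carries the ancestral one (0.0).
north = [0.0, 0.1, 0.2]
south = [0.1, 0.3, 0.2]
archaic = [1.0, 1.0, 1.0]
outgroup = [0.0, 0.0, 0.0]

print(d_statistic(north, south, archaic, outgroup) > 0)  # True
```

    D only says which of the two populations shares more with the archaic genome. Turning it into an ancestry fraction takes further modeling, which is the caveat the authors raise about ascertainment bias and drift.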

    We aren't very far from a more definitive answer to this question, as the data continue to accumulate every day. What I find interesting is the way that models can generate these 1% differences in ancestry proportions, depending on sampling and the pattern of migration assumed to have happened in the past. Two estimates that differ by less than a percent are not really different. This paper provides the suggestion of a more widespread Denisovan legacy, and I accept that as a possibility.

    I should mention: less than one percent of a half billion people is still a very large number, added to five percent of the indigenous population of New Guinea and Australia, and smaller fractions of other island populations. The total amount of Denisovan legacy present in living people probably exceeds the population of Earth at the time the Denisovans lived.
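
    To put a number on the first term of that sum, here is a minimal sketch. It uses only the two figures given above (about half a billion people, at most about one percent ancestry); the function name is hypothetical, and the Oceanian and island terms are left out because this post gives no population figures for them.

```python
def genome_equivalents(population, ancestry_fraction):
    """Whole-genome equivalents of archaic ancestry carried by a population."""
    return population * ancestry_fraction


# Upper bound for south China, using the post's own round numbers.
south_china_upper = genome_equivalents(500_000_000, 0.01)
print(f"{south_china_upper:,.0f}")
```

    That is on the order of five million genome equivalents before adding anything from Oceania.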


    Synopsis: 
    A new paper contradicts earlier work by suggesting a widespread Denisovan legacy in south China
  • Ancient genomes review

    Thu, 2011-08-18 12:21 -- John Hawks

    Mark Stoneking and Johannes Krause present a review article in the current Nature Reviews Genetics [1] that gives an overview of the science of ancient genomes.

    I think the article is very good about presenting aspects of ancient genome sequencing and assembly, and the attendant problems and biases. I find myself explaining this stuff a lot and it's useful to have the concise descriptions that Stoneking and Krause provide here. For example, here's a paragraph that describes mapping bias:

    However, there are important limitations to current approaches to ancient genome assembly owing to the short length of ancient DNA fragments and the repetitive nature of large parts of mammalian genomes (which creates ambiguities in sequence read mapping). For example, short fragments can cause mapping bias, as highly divergent short fragments cannot be accurately mapped to a reference genome. Fragments may also map to different locations in different reference genomes depending on the completeness and accuracy of the reference genomes. For example, to calculate divergence times between an ancient hominin genome sequence, modern humans and chimpanzees, it is important to first verify that the ancient DNA sequences map to orthologous positions in both the human and chimpanzee genomes. These issues mean that even at 20-fold coverage (which was the coverage obtained for the Saqqaq genome) not more than 85% of the genome could be reconstructed; full genome sequences from fossil samples can probably never be achieved with current methods.

    The article discusses chemical changes in ancient genomes, methods to detect contamination, and specialized methods such as targeted DNA hybridization capture.

    I'm less happy with the second half of the article, which discusses population genetics. A few computational techniques are very briefly described (for example, unsupervised versus model-based approaches) and Stoneking and Krause give quick synopses of some population genetic inferences reported during the last year.

    Where I perceive a difference between the first (sequencing) and second (population genetics) parts of the article is that the sequencing part emphasizes the many problems with analysis and describes approaches to overcome them. It seems as if there's a vibrant discussion of sequencing and biochemistry, giving rise to a fuller account. Meanwhile, the second part, discussing human population history, seems to accept results relatively uncritically. There is very little citation of anthropological or archaeological work, and little indication that the methods of population genetic inference may have weaknesses or assumptions that color their results.

    It's great to see review articles on this topic; given the broad interest, I expect we'll see more of them soon. A flood of ancient genetic data means a lot of new results that need to be summarized. But a summary is really not enough -- we need critical examination of the assumptions underlying population genetic inferences and a discussion of how they accord with what we know from archaeology and paleontology.


    Synopsis: 
    A new review article by Mark Stoneking and Johannes Krause presents some useful information.
  • Did Denisovans have genetic adaptations to high altitude?

    Tue, 2011-06-21 12:26 -- John Hawks

    We don't really know the extent of territory that might have been occupied by the population represented by the Denisova genome. The signs of mixture into the Melanesian/New Guinea population suggest that the Denisova individual shared many genes with people who lived somewhere along the South or Southeast Asian coast. Denisova itself, however, is in the Altai Mountains.

    Last week I wrote some thoughts about the possible introgression of HLA alleles from Denisovans into more recent populations. HLA genes pose many problems for testing this hypothesis -- including the difficulty of identifying the alleles in a low-coverage genome and the high chance of incomplete lineage sorting of ancient alleles in recent populations. In principle, evidence of introgression may be much easier to find in other parts of the genome.

    If an allele that originated in Denisovans had some advantage in later populations, it might today be found very widely spread across Asian populations, even if the amount of Denisovan ancestry in most of these populations is very small. This was the theme of my paper with Gregory Cochran several years ago [1] ("The inevitability of introgression"). The probability that a single copy of an advantageous allele will survive and increase in the population is roughly 2s, where s is the fitness advantage in a heterozygote carrying the allele. A relatively small number of copies of an allele might have entered a recent human population by introgression from some ancient population, but these few copies would have a high likelihood of surviving and increasing in frequency, possibly toward fixation. HLA alleles could easily be in this category, but the challenges of identifying them and the high chance of ILS make the hypothesis hard to test.
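
    The 2s argument is easy to sketch numerically. The code below takes the 2s approximation for one copy and, as an added simplifying assumption, treats each introgressed copy as an independent trial. The function name and the value s = 0.01 are hypothetical, chosen only to illustrate the shape of the result.

```python
def survival_probability(s, copies=1):
    """Approximate chance that at least one introgressed copy escapes loss.

    Uses the 2s approximation for a single advantageous copy (small s,
    large population) and assumes the copies behave independently.
    """
    if not 0 <= s <= 0.5:
        raise ValueError("s must lie between 0 and 0.5 for the 2s approximation")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    p_single = 2 * s
    return 1 - (1 - p_single) ** copies


# Hypothetical heterozygote advantage of 1%, for 1, 10 and 100 copies.
for k in (1, 10, 100):
    print(k, round(survival_probability(0.01, k), 3))
```

    With a one percent advantage a lone copy is usually lost, but the chance that at least one of a hundred copies takes hold is high, which is the sense in which a few introgressed copies can be enough.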

    Another strategy is to identify genes that have been selected in recent populations and see if the linked haplotype shows up in the Denisova genome. Recently, several studies have attempted to identify genes related to high altitude adaptation in Tibetans. At least some Denisovans lived in the mountainous areas of central Asia, and so I'm curious whether they might have some alleles adapted to this environment. The Altai are not nearly as high as the Tibetan plateau (in fact Denisova itself is not much higher than western Kansas), and we don't know how long Denisovan people might have been resident in Central Asia, but if we're looking for selected alleles there are some strong candidates in this category of genes.

    So let's look at some of them. All positions here are mapped to the hg18 human genome assembly.

    Yi and colleagues [2] find a strong frequency difference between China and Tibet for a SNP in EPAS1, at chr2:46441523. The derived allele, G, has a frequency of 87% in their Tibetan sample but only 9% in their Chinese sample (and zero in Denmark). The Denisova genome is represented by two reads at this site, both C, the ancestral allele. We don't necessarily have to accept that this is a functional site, but as the marker most strongly differentiating the high altitude population it would likely be closely linked to any functional variant. So the Denisova allele suggests that this ancient individual lacked whatever functional variant might currently be common in Tibetans for this gene.

    Simonson and colleagues [3] took a different approach, focusing on candidate genes that they argued a priori were likely to be involved in adaptation to hypoxia because of their physiological role. They evaluated these genes for evidence of positive selection in Tibetans, finding several candidate haplotypes for recent adaptive evolution to high altitude.

    For each of five genes, they identified a three-locus "core selection haplotype" that shows signs of selection within Tibet. The purpose of these three-SNP haplotypes was to examine the correlation of haplotypes and phenotypes in a sample of people where physiological data were taken. So they are intended as tags, not as comprehensive and unique identifiers of the candidates at the genetic level. But the three-locus haplotypes are the only ones reported in the supplement to the paper, so that's what I have to compare.

    EGLN1: The three-allele candidate selected haplotype consists of A at chr1:229793717, T at chr1:229667980 and T at chr1:229665156. Denisova apparently has the selected haplotype with A at chr1:229793717 (2/2 reads), T at chr1:229667980 (3/3 reads) and T at chr1:229665156 (1/1 reads). However, it is not obvious whether this is significant. All three alleles on the candidate selected haplotype are the ancestral (present in chimpanzees and gorillas) alleles, which are much more likely to show up in the archaic genomes than derived alleles. These ancestral alleles are also present in several of the whole genomes provided along with the Denisova sequence reads. So it's not clear to me how good a candidate for selection the haplotype really is.

    CYP17A1: Here the three-allele candidate selected haplotype includes G at chr10:104568521, G at chr10:104594906, and C at chr10:104517420. Denisova has C (5/5 reads, ancestral), T (4/4 reads, ancestral), and C (3/3 reads, ancestral). Again, Denisova has the all-ancestral haplotype here, but in this case it is not the selection candidate.

    PTEN: The selected candidate haplotype is G at chr10:89770364, C at chr10:89790851 and C at chr10:89778618. Denisova has G (5/5 reads, ancestral), T (2/2 reads, derived), and C (4/4 reads, ancestral). Not selected.

    I always find it interesting when the Denisova genome has a derived allele at an interesting site -- it is the shared derived alleles between these archaic genomes and living people that constitute evidence of genetic persistence of the archaic people. No single site carries that information (any one allele may be shared by incomplete lineage sorting), but I still like to note them. The Papuan and half the Native American, Sardinian and Mongolian reads share the derived T at chr10:89790851 with Denisova.

    HMOX2: The candidate selected haplotype has C at chr16:4456093, T at chr16:4465266, T at chr16:4442515. Denisova has this candidate selected haplotype: C (3/3 reads, ancestral), T (4/4 reads, ancestral), T (5/5 reads, ancestral). That haplotype may also be in the Cambodian whole genome accompanying the Denisova data, and can't be ruled out for the Mongolian. Again, the all-ancestral haplotype and wider distribution argue against the hypothesis that this haplotype was specifically selected in Tibet.

    PPARA: The core candidate selected haplotype has A at chr22:44827140, C at chr22:44832376 and T at chr22:44842095. Denisova has A (8/8 reads, ancestral), A (5/5 reads, ancestral), and C (2/2 reads, ancestral). Notice again, Denisova has the all-ancestral haplotype. We are finding this is the usual case for an ancient sequence: human-derived alleles are just rarer in this genome.
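
    The five comparisons above amount to a simple lookup, sketched here. The alleles are copied from the listings above, in the order the SNPs are given for each gene; the variable and function names are hypothetical, and read depth is ignored.

```python
# gene -> (candidate selected haplotype, Denisova alleles), in the SNP order
# used in the text above.
HAPLOTYPES = {
    "EGLN1": ("ATT", "ATT"),
    "CYP17A1": ("GGC", "CTC"),
    "PTEN": ("GCC", "GTC"),
    "HMOX2": ("CTT", "CTT"),
    "PPARA": ("ACT", "AAC"),
}


def mismatches(candidate, observed):
    """Count positions where the observed alleles differ from the candidate."""
    if len(candidate) != len(observed):
        raise ValueError("haplotypes must have the same number of SNPs")
    return sum(c != o for c, o in zip(candidate, observed))


def matching_genes(haplotypes):
    """Return genes where Denisova carries the full candidate haplotype."""
    return [gene for gene, (cand, obs) in haplotypes.items()
            if mismatches(cand, obs) == 0]


for gene, (cand, obs) in HAPLOTYPES.items():
    print(gene, cand, obs, mismatches(cand, obs))
print(matching_genes(HAPLOTYPES))
```

    A match at three tag SNPs says nothing on its own about ancestral versus derived state, which is why the matching genes still need the closer look discussed next.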

    OK, where are we? Out of six genes that are candidates for selection on altitude adaptation in Tibetans, the Denisova genome carries the candidate haplotype at two -- EGLN1 and HMOX2. In both cases, the core selected haplotype consists entirely of ancestral alleles, and so I think they are actually poor evidence of introgression on the surface. I would test them by looking at more SNPs linked to the presumed selected haplotype, hoping to find some derived SNPs shared by the Denisovan genome and the presumed selected haplotypes. Unfortunately, publications do not yet routinely report long haplotypes, so it will take some more digging to test these cases.


    Synopsis: 
    Noodling through the Denisova genome data for signs of candidate altitude adaptations.
  • A problem of fuzzy mammoths

    Sat, 2011-06-04 03:56 -- John Hawks

    Paleogenomics is changing the way we study evolution. In a number of cases, it now allows us to study extinct organisms with the same methods as we study living ones. A study last year in PLoS Biology [1] used genetic evidence from living elephants and from extinct mammoths and mastodons to reconstruct the times that these species diverged.

    Woolly and Columbian mammoths

    Mammoths are back in the news this week because of a paper by Jacob Enk and colleagues [2]. I think this paper represents a very nice collaboration of paleontologists (Dan Fisher, Ross MacPhee) and paleogeneticists (led by Hendrik Poinar's lab). It's refreshing to read a paper that describes not only the way that the DNA was sampled but also the age and morphological attributes of the sampled mammoths. For example:

    This 60+ year old bull is exceptionally well preserved, and exhibits the classic character suite of his species, including low molar lamellar frequency (Figure S1 in Additional file 3), broadly divergent tusk alveoli, a markedly downturned mandibular symphysis, and tremendous body size. We used tusk fragments for the shotgun sequencing, and both tusk and bone samples for PCR and Sanger sequencing.

    Every genetics paper should have descriptions like that. Very nicely done.

    As an anthropologist, I pay a lot of attention to studies of elephants, because they are another long-lived social mammal, in some ways closer to us in population structure and dynamics than most primates. As in the case of hominins, some taxonomists have argued that we should recognize lots of fossil elephant species, while others question that distinctiveness. And just as we are discovering for hominins, the elephants are showing evidence for population mixture among groups once considered to be different species.

    Enk and colleagues sampled the mtDNA from two Columbian mammoths and one woolly mammoth from North America. The Columbian mammoth is seen by pretty much everybody as a separate species (Mammuthus columbi) from woolly mammoths (Mammuthus primigenius), and paleontologists have thought that they diverged 1-2 million years ago. Woolly mammoths were Holarctic animals, with a range that extended from Europe to North America, while Columbian mammoths were limited to the Americas south of the U.S.-Canada border, roughly. Already other researchers have recovered dozens of woolly mammoth sequences, and their phylogenetic relations are well characterized (as shown in the paper). What Enk and colleagues show is that the two Columbian mammoths both have mtDNA sequences that belong to a single, relatively young clade that is present in woolly mammoths in Alaska and Yukon.

    The simplest explanation is that the Columbian and woolly mammoths of North America were exchanging genes.

    The authors also suggest the possibility of incomplete lineage sorting (ILS) -- the retention of a single ancestral clade in two isolated species. This seems unlikely given the topology of the clade within woolly mammoths, but the authors omitted the crucial test: the date of the most recent common ancestor of the mtDNA within the clade. If it's truly younger than a million years, we might easily rule out ILS.

    Forest and savanna elephants

    A lot more information about the variation within living elephantids has appeared within the past year. Looking at them compared to the fossil species, it's pretty clear that taxonomists haven't done well matching taxonomic levels in these groups. Here is a quote from the paper by Rohland and colleagues, who considered the genetic relationships of forest and savanna elephants in Africa.

    We also find that savanna and forest elephants, which some have argued are the same species, are as or more divergent in the nuclear genome as mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants.

    Forest and savanna elephants may deserve a species rank, but we might equally say that the mammoth-Asian elephant divergence doesn't merit the genus rank it has historically been given. As reconstructed in the paper, the forest-savanna elephant and Asian elephant-mammoth divergences both fall within ranges from 2.5 to 5.5 million years. Some widely-recognized mammalian genera (e.g., Homo) are younger, but most mammalian divergences in this range of times are recognized below the genus rank. Should mammoths be put into Elephas? That would probably be a better recognition of the adaptive radiation of Eurasian elephants.

    One way to consider the question is by examining the pattern of speciation. With a large number of sampled loci, a far more detailed consideration of speciation can be achieved. This brings us back to a more careful examination of ILS.

    We find a higher rate of inferred [Incomplete Lineage Sorting (ILS)] in forest and savanna elephants than in Asian elephants and mammoths: (FE+SE)/(AL+ML) = 3.1 (P = 4×10^−8 for exceeding unity; Table 2), indicating that there are more lineages where savanna and forest elephants are unrelated back to the African-Eurasian speciation than is the case for Asian elephants and mammoths (Table 2). This could reflect a history in which the savanna-forest population divergence time T_FS is older than the Asian-mammoth divergence time T_AM, a larger population size ancestral to the African than to the Eurasian elephants, or a long period of gene flow between two incipient taxa. (We use upper case “T” to indicate population divergence time and lower case “t” to indicate average genetic divergence time (t≥T)).

    "A long period of gene flow" would reflect a very gradual speciation event, which might argue that the two resultant species should be classified in the same genus. Or...it might suggest that the ecological differentiation actually commenced much earlier in time than the modal estimate, with later hybridization. Mammoths and Asian elephants, by contrast, seem to have a cleaner separation even though the genetic relationships are almost equally close.

    We're not quite able to test these alternatives, yet, because only a single individual has been sampled from most of these species. Testing for gene flow really will require larger samples of individuals. In particular, the longer geographic distance between Asian and mammoth samples compared to forest-savanna samples may mean that population structure is hiding within this comparison. I just find it remarkable that genetics has arrived at a point where the pattern of speciation of extinct species is within reach.

    The paper uses the extinct mammoth and mastodon comparisons as a frame for discussing the diversity and distinctiveness of African forest elephants. This is in a way unfortunate, because the mammoth-centric questions are probably more interesting to most readers. There's still a lot of productive biology to do there. But the status of forest elephants is a useful hook to hang a paper upon. Whether forest elephants should be given the status of a species has been a hot topic in proboscidean evolutionary biology during the past 10 years. Debruyne [3] gave a good historical review of the issues:

    Indeed, when discovered by Matschie in 1900, [forest elephants] were described as either a potential species, or a regional race of Cameroon (Matschie, 1900). Matschie advocated the usefulness of hydrographical basins in order to subdivide African elephants into distinct units. He thus contributed to the profusion of new taxa to be defined by the turn of the 20th century, so that the taxonomy of the African elephant quickly became extravagant, the most meagre morphological evidence being used to acknowledge a new form (Lyddeker, 1907). Up to 22 forms of Loxodonta were described that were finally assigned either to the savannah or the forest elephant—see Laursen and Bekoff (1978) for a review. Morphologists have addressed this question for decades according to their personal taxonomic perspectives. Some have considered that, although displaying a smaller size, smaller round ears—responsible for their designation as “cyclotis”—more toenail structures on both feet, thin down-pointing tusks and a flatter back and forehead, forest elephants belong to the same species—i.e., Loxodonta africana—as savannah elephants with whom they assumed were reproductively compatible (Backhaus, 1958; Carroll, 1988; Cousins, 1996). Many cases of intermediate morphology have supported this view, which had become prevalent (Laursen and Bekoff, 1978). Conversely, the “splitter” attitude led other authors to put forest elephants apart on the basis of the same anatomical distinctiveness (Frade, 1931; Frade, 1933; Allen, 1936; Petter, 1958). More doubtful morphological characters—extent of hair-covering, color of the skin, carriage of head—have been put forward to support this division.

    The problem became complicated upon recovery of genetic information. Most early phylogeography has been done using mtDNA. The deepest mtDNA clade in the African elephants defines two haplogroups, both of which are shared by the forest and savanna populations. Judged on large samples of mtDNA alone, the two populations appear to have been exchanging genes recently.

    Early analyses of nuclear microsatellites indicated the opposite pattern, with relatively little allele sharing between the two elephant varieties. I became interested in the question after a paper by Régis Debruyne (a coauthor on the current paper by Enk and colleagues as well). Debruyne emphasized the great gaps in our sampling of geographic variation in African savanna elephants. Providing some additional data, he showed a very deep mtDNA clade in many forest elephants that was also in many savanna elephants. He argued that the widespread evidence of gene flow refutes the hypothesis of different biological species of elephants.

    Rohland and colleagues also addressed the discordance between mtDNA and nuclear genetic variation.

    Our study also infers a strikingly deep population divergence time between forest and savanna elephant, supporting morphological and genetic studies that have classified forest and savanna elephants as distinct species [13],[16]–. The finding of deep nuclear divergence is important in light of findings from mtDNA, which indicate that the F-haplogroup is shared between some forest and savanna elephants, implying a common maternal ancestor within the last half million years [21]. The incongruent patterns between the nuclear genome and mtDNA (“cytonuclear dissociation”) have been hypothesized to be related to the matrilocal behavior of elephantids, whereby males disperse from core social groups (“herds”) but females do not [13],[38]. If forest elephant female herds experienced repeated waves of migration from dominant savanna bulls, displacing more and more of the nuclear gene pool in each wave, this could explain why today there are some savanna herds that have mtDNA that is characteristic of forest elephants but little or no trace of forest DNA in the nuclear genome [13],[14],[39],[40].

    The scenario may fit with the facts. It was first put forward by Roca and colleagues [4], who framed it as a "genomic record of ancient habitat changes", which had brought the forest and savanna populations into contact across shifting hybrid zones. They reiterated the hypothesis in a later paper [5] supported with larger samples.

    Further progress will require larger samples and better models. I was interested in Debruyne's account of the geographic holes in genetic sampling across the African range of forest elephants. A highly-resolved test of recent gene flow demands finding and sampling potential contact zones between two populations. Some hypotheses can be tested surprisingly strongly using only a single individual from each population. But the power of such tests depends on the pattern of inbreeding in the past. We can imagine that the ancestry of a single individual stretches through the genealogical network of a species like a cone, widening into the past. Recent events are poorly tested by single individuals.

    If geographic structure is strong enough, distant populations will approximate different species in their recent genealogical connections. So the single individuals in the more recent study by Rohland and colleagues [1] carry a lot of weight.

    There are many parallels here between hominin population dynamics and the elephants. Also, as I pointed out in 2006, the elephant situation helps to clarify how we should consider genetic samples from living great apes.

    The past year has seen a real reversal in the race between data and analysis. For a long time, sequencing has been a bottleneck in serious analysis of population history. The genealogical connections among individuals double in every generation, so that the inheritance of a single gene reflects one possibility among countless trillions. If we can only afford to sequence a single gene, we are limited to a single sample of the genealogical links among individuals. Whole genomes give enormous samples of the genealogical history among samples. But they create their own challenges of analysis.
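
    The doubling is worth seeing in numbers. This minimal sketch counts the ancestor positions in one individual's pedigree a given number of generations back; the function name is hypothetical. These are positions in the genealogy, not distinct people, since the same ancestor fills many positions once the count exceeds the population size.

```python
def pedigree_positions(generations):
    """Ancestor positions in a pedigree `generations` back, doubling each time."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 2 ** generations


for g in (10, 20, 40):
    print(g, f"{pedigree_positions(g):,}")
```

    Forty generations back the count already passes a trillion, while a single non-recombining locus such as mtDNA traces just one path through all of those positions.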


    Synopsis: 
    Mammoth paleogenomics and African elephant population structure pose similar problems of sampling.
  • Mummy trouble redux

    Thu, 2011-04-28 22:56 -- John Hawks

    Speaking of Jo Marchant, she has a long article in the current Nature about the mummy DNA controversy ("Ancient DNA: Curse of the Pharaoh's DNA").

    I wrote about the problem earlier this year: "Mummy troubles". My opinion is that this work has been relentlessly hyped and hasn't presented adequate information to assess whether the results are genuine:

    Can we accurately type STR alleles from mummies? I wouldn't rule it out given the quantity of tissue available, but there should be many more controls for a high-profile study like this one. The work took place over several years, so it's a bit unrealistic to expect the latest sequencing methods. But JAMA and the Discovery Channel presented the results as important science. They should have ensured that solid answers for the obvious questions were at hand.

    Marchant digs up some quotes from the authors:

    The researchers deny that the television involvement put them under excessive pressure to produce dramatic results. But working for the cameras did make a challenging project even tougher, says Pusch. "Each time they came in to film, we had to close the lab for a week to clean." Eventually the TV crew was banished and the lab scenes reconstructed.

    The article gives an interesting sociology of the competing groups of ancient DNA researchers. I dispute that the field is evenly divided, however. There is a very long list of laboratories doing ancient DNA work according to standardized protocols on skeletal remains from the past several thousand years. Only a few groups claim to be working with nuclear DNA or microbial DNA, the areas of contention in the mummies. Among that small set of labs, most follow similar, conservative techniques.

    Then there are the handful that come up with "surprising" results time and again. If the surprising results are accompanied by substantial evidence, I have no problem. But when a paper has no clear explanation why it arrives at results that others think impossible, that raises my skepticism.
