john hawks weblog

paleoanthropology, genetics and evolution

population structure

  • New Denisova and Neandertal DNA results reported

    Fri, 2013-05-17 08:37 -- John Hawks

    Elizabeth Pennisi reports from the Biology of Genomes conference at Cold Spring Harbor, New York: "More Genomes From Denisova Cave Show Mixing of Early Human Groups". The article describes a talk by Svante Pääbo about new results from Neandertal DNA, as well as new analyses of the Denisovan genome. It has lots of details for those interested in these topics, but the article is paywalled, so I can only share a little of it here:

    From the detailed genomes of both Neandertals and Denisovans, Pääbo and Montgomery Slatkin of the University of California, Berkeley, estimated that 17% of the Denisovan DNA was from the local Neandertals. And the comparison revealed another surprise: Four percent of the Denisovan genome comes from yet another, more ancient, human—"something unknown," Pääbo reported. "Getting better coverage and more genomes, you can start to see the networks of interactions in a world long ago," says David Kingsley, an evolutionary biologist at Stanford University in Palo Alto, California.

    With all the interbreeding, "it's more a network than a tree," points out Carles Lalueza-Fox, a paleogeneticist from the Institute of Evolutionary Biology in Barcelona, Spain. Pääbo hesitates to call Denisovans a distinct species, and the picture is getting more complicated with each new genome.

    We have been finding some of this in our comparisons of the genomes also. These were not isolated groups of ancient people, and some of them were more similar to living people than others. It is just wonderful to have more and more DNA coming out -- although that makes it hard to think we won't learn something new from high-coverage data that will require us to re-run various comparisons. That's the cost of discovery!

    Meanwhile, the article sheds light on two interesting apparent contradictions in the Denisova data. The analysis of the high-coverage data last fall [1] noted that the pinky bone genome is consistent with a very small long-term effective size, because of its limited genetic variation ("Denisova at high coverage"). These results included a "drastic decline in size" around the time the Denisovans were estimated to have separated from the ancestors of living sub-Saharan Africans.

    That result was curious in comparison with the mtDNA evidence. The Denisovan mtDNA is substantially more divergent from living human and Neandertal mtDNA, with an estimated time for the last common ancestor of mtDNA among these groups a bit more than a million years ago. In the initial analysis of the Denisova genome, Reich and colleagues [2] pointed out that even a deep divergence might be consistent with a neutral population history in a single population. But a population of radically reduced size, with a substantially more recent common ancestry shared with Neandertals and other ancestors of living people? Seems odd.
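    A rough way to see why that combination looks odd is to ask how likely two mtDNA lineages are to stay separate for a million years inside a single small population. The sketch below assumes a standard neutral coalescent in one panmictic population; the female effective sizes and the 25-year generation time are hypothetical values chosen for illustration, not figures from any of the papers.

```python
import math

def prob_uncoalesced(t_years, n_female, gen_years=25.0):
    """Probability that two mtDNA lineages sampled from one panmictic
    population have NOT reached a common ancestor t_years back in time.

    Under the neutral coalescent, the waiting time for two maternal
    lineages is exponential with a mean of n_female generations.
    """
    t_gen = t_years / gen_years
    return math.exp(-t_gen / n_female)

# Hypothetical female effective sizes (assumptions, not published estimates).
for n_female in (2_500, 5_000, 20_000):
    p = prob_uncoalesced(1_000_000, n_female)
    print(f"N_f = {n_female:>6}: P(still separate after 1 Myr) = {p:.2e}")
```

    Under these assumptions the probability falls off exponentially as the population gets smaller, which is why a deep mtDNA divergence sits more comfortably with a large long-term population than with a radically reduced one.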

    Now, we may be learning that the Denisovan genome itself represents different ancestral groups -- not only a more ancient "something unknown" population, but substantially the local Neandertals. That kind of mixture is not the population history described by papers on the Denisova genome so far. And a third Denisovan mtDNA from one of the third molars at the site is substantially different from the other two, pointing to greater mtDNA diversity within the Denisovan population than now known from either Neandertals or living people.

    What does it mean? I don't think there's a contradiction here in the data. What this shows is that the methods applied to the data have been too simplistic. The methods will come to a result, but that result may not fit the data as well as a population model with more complexity. Looking only at one kind of comparison -- as the Li and Durbin model applied to the Denisova genome by Meyer and colleagues last year [1] -- will probably not give a result that describes the true population history. We need to keep our minds open to more complex population histories that may be more consistent with other sources of data, including archaeological and fossil information.


    References

    Synopsis: 
    A talk on new ancient DNA results at the Biology of Genomes conference
  • Neandertal anti-defamation files, 17

    Tue, 2013-01-01 17:30 -- John Hawks

    Let no one say that I'm an uncritical voice about the many advantages of releasing preprints. They do have their downsides. Lack of editing is one.

    Here's a passage from a new preprint from Peter Waddell and Xi Tan, "New g%AIC, g%AICc, g%BIC, and Power Divergence Fit Statistics Expose Mating between Modern Humans, Neanderthals and other Archaics":

    The apparent lack of Denisovan alleles on the X chromosome suggested that some of these archaic interbreeding events were male biased, that is archaic males mating with modern females (Waddell, 2011). This was formerly dubbed the “archaic Ron Jeremy” hypothesis, after the well-known American thespian. Formerly known, because a journal editor has recently urged us to alter our manuscript, to avoid confusion with a “Ron Jeremy Event”, which they referenced to the Urban Dictionary. The new synonymy is the “lecherous archaic man” hypothesis.

    I'll return to the argument in the paper later; I just wanted to consider the question of Neandertal similarity to well-known thespians. This is a followup to another preprint from 2011, which addressed the question of male-biased gene flow into the ancestry of Papua New Guinea from Denisovan peoples ("Homo denisova, Correspondence Spectral Analysis, Finite Sites Reticulate Hierarchical Coalescent Models and the Ron Jeremy Hypothesis"). From that preprint:

    While the origin of the unusual features of the NSYFHP pattern is just a hypothesis at this stage, it is testable and deserves a name, so we call it the “Ron Jeremy hypothesis” (after the accomplished American thespian Ron Jeremy, who is adroit at debauching modern young women, whose father’s might well call him a Neanderthal or a Denisovan, and who looks remarkably like reconstructions of these archaic humans in museums, including being very big boned).

    Big boned.

    Similarly, we may refer to the low frequency of the NSYFHP on the X chromosome as “Ron’s Grandfather hypothesis” which is the mixing of the Denisovan lineage with an even more ancient hominid lineage due to a male biased infusion.

    Obviously we badly, badly need a better system of terminology to discuss the relationships of archaic human groups, including MSA and earlier Africans, which we now understand to have been subject to recurrent gene flow. Male-biased gene flow has often happened in human groups, sometimes due to warfare or the dominance of elites, sometimes as a simple function of greater male dispersal. Male-biased gene flow also appears to characterize orangutan population history, but not chimpanzees, so it depends on species-specific aspects of population structure and dispersal strategies.

    We unfortunately have a 150-year history of looking at Neandertals, and secondarily at other archaic human groups, as strange evolutionary dead-ends. When faced with the evidence that these ancient people are among our ancestors, some scientists have turned first to the idea that mating among ancient people was exotic and strange. Hence the "Ron Jeremy" angle.

  • Mailbag: North and South China

    Mon, 2012-11-19 09:08 -- John Hawks

    I read with interest your post on:
    http://johnhawks.net/weblog/reviews/neandertals/pigmentation/neandertal-...

    in particular:
    "People of Han Chinese ethnicity sampled in Beijing appear to have on
    average a half percent more Neandertal ancestry than people of the
    same ethnicity sampled in southern China."

    Apologies if you know this already but Han Chinese civilization
    started in the Yellow River area and only later expanded south. The
    original people in the south of China are Viet people and have more in
    common with modern Vietnamese. They all became "Han" people after
    their kingdoms were conquered by the north and are really Han in name
    only. Northern and Southern Chinese people look different and their
    spoken dialects (languages) are mutually incomprehensible to each
    other.

    Chinese people from the province of Shantung have the reputation of
    being the biggest in size, always attributed to their diet of wheat,
    but they are probably the last purest reservoir of Neandertal genes in
    the East. Shantung people generally have big noses, fair skin and big
    bones.

    Yes indeed, these are very deep differences, at least as great as between northern and southern Europe genetically, and maybe more. That's why we find the contrast so useful in comparison with the archaic human genomes. The current samples are not ideal because the "South Chinese" were sampled in Beijing based on ancestry, and so are a diverse set. We are hoping soon to have data from many more Southeast and Northeast Asian populations, which will give us some resolution on when things changed.

  • ASHG notes on Gene Expression

    Sun, 2012-11-11 17:13 -- John Hawks

    Razib Khan has started writing up his notes on this week's conference of the American Society of Human Genetics: "Reflections on the evolution at ASHG 2012". He includes some reactions on the presentations in human population history, which will be well worth following. There's an exciting agenda of discovery underlying many of the current projects.

    Khan mentions the work on Neandertal genetics at the meeting:

    Sriram Sankararaman had a poster on Neandertal admixture in modern human lineages. In the broad outlines the Reich lab and the Wall lab seem to agree (along with others, like Melinda Yang in the Slatkin lab). We’re seeing the convergence of a new orthodoxy/paradigm.

    I agree that a new paradigm is being written, but I don't expect it to rise to an orthodoxy. At the moment, there is an obvious path forward with extensions of standard tools and new data, and that is what constitutes the active research paradigm. I think of this as a path of least parameters. But so far nobody writing outside our group has published any serious effort to match genetic results with archaeological evidence.

    Thus far, some of the reactions by established players in archaeology can be described as falling in Pauli's "not even wrong" category. Paleogenomes just shocked the systems of some people who should really have hedged their bets on modern human origins. But modern human origins are no longer the interesting issue. Genetics has moved the ratchet forward, and there is no going back to the simple paradigm.

    Now we have to grapple with a complex population history. That history was multilayered, with many more than one or two waves of significant admixture leading to the samples at hand. The great promise is that genetics will at last allow us to test a lot of anthropological assumptions about human hunter-gatherer population dynamics. But the theoretical challenge is that admixture estimates from genetics are conditioned on extremely simple population models that are really far from the ways we know humans have interacted in the past.

    On that note, I will point to my current paper, which has just gone online in the Journal of Anthropological Sciences: "Dynamics of genetic and morphological variability within Neandertals". As I put the paper together, I began to appreciate the difficulty of describing each of these different sources of data -- genetic, morphological and archaeological -- for specialists in the other areas. I will post on some of my favorite parts of the paper later in the week.

  • The North African Neandertal descendants

    Thu, 2012-10-18 16:25 -- John Hawks

    A new paper by Federico Sánchez-Quinto and colleagues reports on comparisons of North African population samples with the Neandertal DNA project data [1]. The paper shows that North African populations also carry a substantial trace of Neandertal ancestry, like living populations outside of Africa, much more than populations of sub-Saharan Africa.

    One of the main findings derived from the analysis of the Neandertal genome was the evidence for admixture between Neandertals and non-African modern humans. An alternative scenario is that the ancestral population of non-Africans was closer to Neandertals than to Africans because of ancient population substructure. Thus, the study of North African populations is crucial for testing both hypotheses. We analyzed a total of 780,000 SNPs in 125 individuals representing seven different North African locations and searched for their ancestral/derived state in comparison to different human populations and Neandertals. We found that North African populations have a significant excess of derived alleles shared with Neandertals, when compared to sub-Saharan Africans. This excess is similar to that found in non-African humans, a fact that can be interpreted as a sign of Neandertal admixture. Furthermore, the Neandertal's genetic signal is higher in populations with a local, pre-Neolithic North African ancestry. Therefore, the detected ancient admixture is not due to recent Near Eastern or European migrations. Sub-Saharan populations are the only ones not affected by the admixture event with Neandertals.
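    An "excess of derived alleles shared with Neandertals" of the kind described in the abstract is commonly summarized with an ABBA-BABA count (Patterson's D). The abstract does not say exactly which statistic the authors computed, so treating it as D is an assumption here, and the site data below are invented purely to show the bookkeeping.

```python
def d_statistic(sites):
    """Patterson's D from (pop1, pop2, archaic) allele states.

    Each site is a tuple of 0/1 values: 1 = derived, 0 = ancestral
    (the outgroup, e.g. chimpanzee, defines the ancestral state).
      ABBA: pop1 ancestral, pop2 derived, archaic derived
      BABA: pop1 derived, pop2 ancestral, archaic derived
    D > 0 means pop2 shares more derived alleles with the archaic genome.
    """
    abba = sum(1 for site in sites if tuple(site) == (0, 1, 1))
    baba = sum(1 for site in sites if tuple(site) == (1, 0, 1))
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)

# Toy data, invented for illustration:
# (sub-Saharan African, North African, Neandertal)
toy_sites = (
    [(0, 1, 1)] * 55    # ABBA: North African shares the derived allele
    + [(1, 0, 1)] * 45  # BABA: sub-Saharan African shares it instead
    + [(1, 1, 1)] * 30  # uninformative: both share it
    + [(0, 0, 1)] * 20  # uninformative: neither shares it
)
print(d_statistic(toy_sites))
```

    Under incomplete lineage sorting alone, ABBA and BABA sites are expected in equal numbers, so D should hover around zero. In published analyses, significance is usually judged from a Z-score obtained by resampling blocks of the genome, which this sketch leaves out.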

    The interesting aspect of the paper is that the authors attempted to separate the ancestry of North African samples into a pre-Neolithic indigenous African component, and a residual component that represents more recent gene flow into North Africa, from all sources. The historic movement into North Africa has been fairly cosmopolitan, involving sub-Saharan Africans, Arabs, Medieval Europeans, Romans, Carthaginians and many other peoples. Sánchez-Quinto and colleagues used the ADMIXTURE program to try to sort out a pre-Neolithic indigenous component and analyze that specifically for Neandertal similarity.

    Unsurprisingly, the fraction of estimated sub-Saharan African ancestry in each population sample was inversely correlated with the estimated Neandertal ancestry. That is, the more a population looks like sub-Saharan Africans, the less Neandertal it has.

    Here's what's surprising: When they sorted out parts of the genome in Tunisians that ADMIXTURE determines to be most likely from pre-Neolithic North Africans, they found these parts of the genome had more Neandertal ancestry than typical of the CEU sample of northern European ancestry. Is it possible that ancient North Africans had more Neandertal similarity than today's Europeans?

    Sánchez-Quinto and colleagues suggest that the Neandertal ancestry in this population came in Upper Paleolithic times from the Near East. That is possible, or some of the Neandertal similarity may reflect ancient African population structure. Really I think we will have to do a finer analysis of chromosome blocks to examine the subset of shared Neandertal derived alleles that reflect introgression versus incomplete sorting from the ancestral African population. It will be very interesting to examine more closely the mixture of population history within Egypt, through which most Near Eastern pre-Neolithic population movement must have come.

    The authors note that the distribution of Neandertal similarity outside Africa increases with distance from Africa.

    A previous study [26] observed that the similarity to Neandertals increases with distance from Africa and suggested this could be explained by SNP ascertainment bias plus a strong genetic drift in East Asian populations. Nonetheless more complex, population-biased, ascertainment schemes might have additional effects (i.e bottlenecks), but these are not expected to significantly increase the rate of false positives in admixture tests [31]. The Tunisian population has been reported to be a genetic isolate [17] so it is plausible that part of the signal detected is actually due to genetic drift. However, this should not affect the other North African groups in our study. Finally, given that SNP arrays are based on common alleles and probably the relevant admixture information is encoded within the rare and very rare alleles, the potential bias, if anything, will underestimate ancient hominid admixture signals, as shown in previous studies [2],[3].

    This pattern was also observed by Meyer and colleagues earlier this year [2], and I discussed it in my post on that paper ("Denisova at high coverage"). Both papers note that ascertainment bias may contribute to this pattern. I added that Meyer and colleagues had assumed that genes found in sub-Saharan African populations could not have come from Neandertals, which greatly biased their estimates against Europe and West Asia, considering historical and prehistoric gene flow across the Sahara and along the Indian Ocean coast. So I'm not yet accepting the relative numbers of Neandertal ancestry from different populations, as we don't know that they have all come from consistent assumptions. In particular, an elevated amount of Neandertal ancestry in China -- this paper puts it at almost double the amount of Neandertal ancestry in northern Europeans -- is unlikely. There is no pattern of bottlenecks that can give rise to that excess without additional population mixture, and it is hard to see where such population mixture would have happened without also affecting the ancestors of Europeans. Instead, we have some work to do in reducing the biases on these comparisons.


    References

    Synopsis: 
    A study of North African genetic variation shows that Neandertal genes were widespread in the area before the Neolithic.
  • Denisova at high coverage

    Thu, 2012-08-30 15:25 -- John Hawks

    Science today has released the new paper on the Denisova high-coverage genome by Matthias Meyer and colleagues from Svante Pääbo's group [1]. There is a lot of material in the supplements of the new paper, and it will take some time to work through implications.

    The basics are quite simple: The paper confirms the initial interpretation of the genome by David Reich and colleagues [2] in most respects. The mixture with a whole-genome sample from Papua New Guinea is estimated at 6% Denisovan ancestry. Confirming the later paper by Reich and colleagues [3], the new analysis finds no significant evidence of Denisovan ancestry in a mainland south Chinese (Han Dai) individual, and can exclude it down to a very small fraction:

    However, in contrast to a recent study proposing more allele sharing between Denisova and populations from southern China, such as the Dai, than with populations from northern China, such as the Han (17), we find less Denisovan allele sharing with the Dai than with the Han (although non-significantly so, Z = –0.9) (Fig. 4B) (table S25). Further analysis shows that if Denisovans contributed any DNA to the Dai, it represents less than 0.1% of their genomes today (table S26).

    That is a mystery to be explained. How did Asians end up lacking any evidence of Denisovan ancestry, when the peoples of Sahul (Australia and New Guinea) have six percent? It's nutty! The early modern humans who were the ancestors of present Sahulian peoples surely came from Asia, and they surely mixed with Denisovans there somewhere, right? But today there's no sign that present Asian peoples descended from those early Asian peoples.

    We must, I think, conclude that there was at least one, and possibly several episodes of massive population movement across South and Southeast Asia.

    I have recently completed a review of the analogous problem for Neandertals in Europe -- late and early Neandertals themselves appear to have been a dynamic population. I'm now working on a review of the situation in Southeast Asia. We may fundamentally have to look at the archaeological record in a new, and much more dynamic, way than has been the case.

    Neandertal gene flow

    To me at the moment, this is the most interesting paragraph of the new paper:

    Interestingly, we find that Denisovans share more alleles with the three populations from eastern Asia and South America (Dai, Han, and Karitiana) than with the two European populations (French and Sardinian) (Z = 5.3). However, this does not appear to be due to Denisovan gene flow into the ancestors of present-day Asians, since the excess archaic material is more closely related to Neandertals than to Denisovans (table S27). We estimate that the proportion of Neandertal ancestry in Europe is 24% lower than in eastern Asia and South America (95% C.I. 12–36%). One possible explanation is that there were at least two independent Neandertal gene flow events into modern humans (18). An alternative explanation is a single Neandertal gene flow event followed by dilution of the Neandertal proportion in the ancestors of Europeans due to later migration out of Africa. However, this would require about 24% of the present-day European gene pool to be derived from African migrations subsequent to the Neandertal admixture.

    This is a very interesting result, partially because it is the opposite of what we are finding. As I explained earlier this year, we are finding Europeans to share more Neandertal alleles than Asians do. The difference in our results has been much smaller than 24%; really only an increase of less than 0.5% on the whole genome, or maybe 10% relative to the overall amount in Europe (which is on the order of 3%).
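    The arithmetic behind both sets of numbers is simple enough to write down. The sketch below assumes the simplest possible dilution model, in which later migrants carry no Neandertal ancestry at all; the 3% starting value is the rough order-of-magnitude figure mentioned above, used here only as an example.

```python
def diluted_fraction(p_before, migrant_share, p_migrants=0.0):
    """Archaic ancestry after `migrant_share` of the gene pool is replaced
    by migrants whose own archaic ancestry is `p_migrants`."""
    return (1.0 - migrant_share) * p_before + migrant_share * p_migrants

def relative_difference(absolute_diff, baseline):
    """Express an absolute difference in ancestry as a share of the baseline."""
    return absolute_diff / baseline

# Dilution: with migrants carrying zero Neandertal ancestry, the relative
# drop equals the migrant share, so a 24% drop needs a 24% replacement.
p_start = 0.03  # example value only
p_after = diluted_fraction(p_start, 0.24)
print(f"after dilution: {p_after:.4f}  (relative drop {1 - p_after / p_start:.0%})")

# Our comparison: a 10% relative excess on a baseline near 3% is about
# 0.3 percentage points of the genome, i.e. less than 0.5%.
print(f"relative excess: {relative_difference(0.003, 0.03):.0%}")
```

    If the migrants themselves carried some Neandertal ancestry (p_migrants above zero), an even larger replacement would be needed to produce the same relative drop.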

    My initial reaction to this difference is that it reflects the sharing of Neandertal genes in Africa. Meyer and colleagues filtered out alleles found in Africa, as a way of decreasing the effect of incomplete lineage sorting compared to introgression in their comparison. But if Africans have some gene flow from Neandertals, eliminating alleles found in Africans will create a bias in the comparison. If (as we think) some African populations have Neandertal gene flow, that probably came from West Asia or southern Europe. So as long as the present European and Asian (and Native American) samples have undergone a history of genetic drift, or if (as mentioned in the quote) they mixed with slightly different Neandertal populations, this bias will tend to make Asians look more Neandertal and Europeans less so.
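    A toy calculation shows how that filter could flip the ranking. Every number below is invented for illustration: the counts of Neandertal-shared alleles and the fractions also present in Africans are assumptions, chosen so that the alleles carried into Africa resemble the European set more than the Asian set.

```python
def after_african_filter(n_shared, frac_also_in_africa):
    """Neandertal-shared alleles left once alleles also seen in
    African samples are discarded."""
    return n_shared * (1.0 - frac_also_in_africa)

# Hypothetical raw counts of Neandertal-shared derived alleles.
raw = {"European": 1100, "Asian": 1000}

# Hypothetical share of those alleles also found in Africans, higher for
# Europeans if gene flow into Africa came from West Asia or southern Europe.
also_in_africa = {"European": 0.30, "Asian": 0.10}

filtered = {
    pop: after_african_filter(raw[pop], also_in_africa[pop]) for pop in raw
}

for pop in raw:
    print(f"{pop:>8}: raw = {raw[pop]:5d}, after filter = {filtered[pop]:7.1f}")
```

    With these made-up inputs the European sample starts out ahead and ends up behind after filtering, which is the direction of bias described above. Whether the real data behave this way depends on how much Neandertal-derived variation Africans actually carry.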

    Anyway, this demands further investigation. The Denisova genome makes a more compelling outgroup for these kinds of comparisons, because it is much closer to us than chimpanzees are. But it isn't really an outgroup because it shares alleles by descent with Neandertals. So it takes some clever genetics to compare the distributions of derived alleles in these genomes in terms of introgression versus incomplete lineage sorting.

    Denisovan demography

    It has become possible to make some good estimates of demographic history using only a single diploid genome, using a technique developed by Li and Durbin [4]. Meyer and colleagues applied this technique to the Denisova genome, finding that its genetic history contrasts with that of living human populations:

    To estimate how Denisovan and modern human population sizes have changed over time we applied a Markovian coalescent model (22) to all genomes analyzed. This shows that present-day human genomes share similar population size changes, in particular a more than two-fold increase in size before 125,000–250,000 years ago (depending on the mutation rates assumed (23), Fig. 5B). Denisovans, in contrast, show a drastic decline in size at the time when the modern human population began to expand.
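    The date range in the quoted passage doubles because the method measures time in units of accumulated mutations, and converting that to years requires an assumed mutation rate. The sketch below is not the Li and Durbin model itself, only the unit conversion; the divergence value and the two rates are assumptions picked so that the output lands on the quoted range.

```python
def divergence_to_years(per_site_divergence, mu_per_site_per_year):
    """Years since two lineages shared an ancestor, given their per-site
    sequence divergence. Two lineages separated for t years accumulate
    about 2 * mu * t differences per site, so t = d / (2 * mu)."""
    return per_site_divergence / (2.0 * mu_per_site_per_year)

# Assumed values for illustration only.
divergence = 2.5e-4          # differences per site between two lineages
fast_rate = 1.0e-9           # mutations per site per year
slow_rate = 0.5e-9           # half the fast rate

for label, rate in (("fast", fast_rate), ("slow", slow_rate)):
    years = divergence_to_years(divergence, rate)
    print(f"{label} rate: about {years:,.0f} years")
```

    Halving the assumed rate doubles every date on the inferred curve, while the shape of the curve in mutation-scaled time stays the same.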

    There is not yet enough data from Neandertal genomes to apply the same method, but to the extent that we understand their diversity, they show a similar picture. These archaic humans in Eurasia had much, much smaller effective population sizes than the ancient population of Africa. That's not surprising, given what we understand about ancient hunter-gatherer population dynamics.

    What may be a bit more surprising is the geography. We know that Neandertals of Europe and Central Asia lived in an environment that was relatively marginal for their technology and subsistence pattern. The Denisovan population could well have lived in parts of South or Southeast Asia -- subtropical and tropical areas comparable to Africa in their ecological diversity and resource richness.

    We might have imagined that the Denisovan population would be more diverse than Neandertals -- that it might have been comparable in diversity to part of Africa, if not the entirety of Africa. The genome is inconsistent with that picture.

    How can we explain the apparent contrast?

    1. Maybe Denisovans didn't live in South or Southeast Asia at all. If not, that demands that we explain how Australians got their genes.

    2. Maybe the population was geographically extensive and diverse, but the genome from Denisova Cave doesn't represent it well. If so, we might discover that Sahulians actually have even more ancestry from this group. Alternatively, we might find that the early history of the population was widely shared, but the recent history diverged between Siberian and other branches of the Denisovan-inhabited region.

    3. Maybe African diversity emerged from a much more complex series of interactions than we now appreciate. The demographic model of Li and Durbin doesn't encompass admixture, just the probability of gene coalescence across time. We have recently begun to appreciate the reality of ancient African population structure. If those initial African populations were more divergent from each other than Neandertals and Denisovans, their later mixture would give rise to a picture of early population expansion, even if each of them had relatively low (Denisovan-like) diversity.
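    The third possibility can be made concrete with expected coalescence times. The sketch below assumes two source populations of equal, constant size that split from a common ancestral population and then mix, with the two gene copies sampled immediately after the mixture. All sizes and times are hypothetical, in diploid individuals and generations.

```python
import math

def mean_coal_same_source(n_source, n_anc, split_gens):
    """Expected coalescence time for two lineages from the SAME source:
    they coalesce at rate 1/(2N) for split_gens generations and, if they
    have not coalesced by then, at rate 1/(2 N_anc) in the ancestor."""
    p_escape = math.exp(-split_gens / (2.0 * n_source))
    return 2.0 * n_source * (1.0 - p_escape) + p_escape * 2.0 * n_anc

def mean_coal_mixed(n_source, n_anc, split_gens, mix=0.5):
    """Expected coalescence time for two lineages sampled right after the
    sources mix in proportions mix and 1 - mix."""
    p_same = mix ** 2 + (1.0 - mix) ** 2
    t_same = mean_coal_same_source(n_source, n_anc, split_gens)
    # Lineages from different sources cannot coalesce before the split.
    t_diff = split_gens + 2.0 * n_anc
    return p_same * t_same + (1.0 - p_same) * t_diff

# Hypothetical values: small "Denisovan-like" sources, long separation.
n_source, n_anc, split_gens = 2_500, 10_000, 20_000

t_single = mean_coal_same_source(n_source, n_anc, split_gens)
t_mixed = mean_coal_mixed(n_source, n_anc, split_gens)
print(f"single source: {t_single:,.0f} generations")
print(f"equal mixture: {t_mixed:,.0f} generations")
```

    Because expected heterozygosity scales with mean coalescence time, the mixed sample looks as though it came from a much larger population than either source, even though neither source was ever large. A single-population method reading that genome would report growth where there was only mixture.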

    This picture is already complicated. It will get more so. We have a long way to go before the archaeology of MSA and Middle Paleolithic peoples will be reconciled with these genetic models.

    The "modern human" catalog

    I think it's tremendously interesting that the authors have compiled a list of gene variants shared by living humans that are absent from this high-coverage archaic human genome. It's a first step to identifying networks of genes that have been subject to recent evolutionary change in human ancestors.

    That being said, the list of genes itself doesn't lend itself to concrete conclusions:

    One way to identify changes that may have functional consequences is to focus on sites that are highly conserved among primates and that have changed on the modern human lineage after separation from Denisovan ancestors. We note that among the 23 most conserved positions affected by amino acid changes (primate conservation score ≥ 0.95), eight affect genes that are associated with brain function or nervous system development (NOVA1, SLITRK1, KATNA1, LUZP1, ARHGAP32, ADSL, HTR2B, CNTNAP2). Four of these are involved in axonal and dendritic growth (SLITRK1, KATNA1) and synaptic transmission (ARHGAP32, HTR2B) and two have been implicated in autism (ADSL, CNTNAP2). CNTNAP2 is also associated with susceptibility to language disorders (27) and is particularly noteworthy as it is one of the few genes known to be regulated by FOXP2, a transcription factor involved in language and speech development as well as synaptic plasticity (28). It is thus tempting to speculate that crucial aspects of synaptic transmission may have changed in modern humans.

    Interesting. I can imagine a Ph.D. dissertation looking into the function of each of those genes. It is surely true that in the last 300,000 years, human brains have been evolving. But why these genes as opposed to others? And how many regulatory changes (as opposed to amino acid changes) may have been further involved?

    Maybe even more interesting: How many times will the human alleles be found in some other Denisovan (or Neandertal) genomes, and how often will the "archaic" allele be found in anyone living now?

    A limited series of comparisons is too small to exclude the possibility that the ranges of variation overlap, as fossil analysts have known for a long time. So we will need to work on extending our knowledge of the range of variation within living people, by increasing the sample of genomes representing populations around the world, particularly in Africa.

    The technology

    Of course, the most exciting thing about the new paper is the proof of concept for future high-coverage archaic genomes. The lab was able to generate the high-coverage sequence using its existing samples, by sequencing single-strand DNA instead of requiring double-strand DNA. This is a massive advantage when working with ancient DNA, because damage to the sequence often prevents double-stranded DNA from being amplified.

    The paper makes explicit that the Denisova phalanx simply has better endogenous DNA preservation than any other specimen known. That being said, the new sequencing method has greatly increased the sequence yield from the sample:

    We applied this method to aliquots of the two DNA extracts (as well as side fractions) that were previously generated from the 40 mg of bone that comprised the entire inner part of the phalanx (2, 8). Comparisons of these newly generated libraries to the two libraries generated in the previous study (2) show at least a 6-fold and 22-fold increase in the recovery of library molecules (8), which is particularly pronounced for longer molecules (fig. S4).

    It would be too soon to say that a similar increase in yield will happen for other specimens, but obviously, this may bring higher coverage into reach for several specimens that are currently only sequenced at very low coverage, including the Vindija, Mezmaiskaya, and El Sidrón Neandertals. We will have to wait and see how the new technique affects ancient DNA recovery going forward.

    I keep telling people that I think it's exciting that research into human evolution is now pushing technology forward. Paleoanthropology has often borrowed technological advances from other fields. But with ancient DNA, we really see an organic growth of technology along with research questions about our evolution. In our work on the ancient genomes, we're making some progress pushing forward knowledge about human biology by understanding human evolution. Evolution really is the fundamental principle of biology, but using evolution to learn about biology sometimes requires traveling through time. Ancient DNA gives us a time machine bringing new insights into reach.


    References

    Synopsis: 
    A technological advance in library preparation gives rise to much better knowledge of the ancient Denisovans
  • Neandertal ancestry "Iced"

    Wed, 2012-08-15 15:24 -- John Hawks

    I've been mobbed with e-mails from readers asking about my reaction to the new paper by Anders Eriksson and Andrea Manica in PNAS, titled "Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins" [1]. The paper asserts that Neandertal similarity in the genomes of living people outside Africa can be explained only in terms of incomplete lineage sorting from the shared human-Neandertal common ancestral population in Africa. If the paper's assertions were accurate, we could go back to thinking that all the genetic heritage of people today traces back to Africa, although we would still need to abandon the idea that the African population had undergone a small bottleneck.

    I have not been posting as frequently the last month or two because I have been out of the country doing science.

    The new paper's press release has given rise to quite a lot of media attention, much of which unfortunately misrepresents our current knowledge of human and Neandertal genomes. Razib Khan summarized the situation on Monday, in a post titled, "Why you shouldn't publish in PNAS". I agree with his criticism, although I have a perspective coming out soon in PNAS. In fact, I suppose this episode shows why everyone should publish in PNAS, because so many journalists will just parrot press releases instead of asking relevant experts. Ewen Callaway did a great job on this story by putting it into the broader context ("Neandertal sex debate highlights benefits of pre-publication"). You will notice how no other science writers with any Neandertal knowledge picked up this press release...

    Paleoanthropology is a field where data are rare and precious, and we do a lot of arguing about the validity of models. I love arguing about the validity of models (Cliff Notes version: All models are wrong).

    Genomics is not such a field. We have abundant data today to compare with Neandertal genomes. Yet puzzlingly, the idea of Neandertal ancestry has been challenged by several papers that haven't performed any new empirical comparisons at all. I'm struggling to figure this out. We have an unparalleled ability to explore the genomes of humans and Neandertals, and we should believe a computer model with no empirical data?

    I've been assessing the Neandertal similarity of 1000 Genomes Project samples here on my blog (e.g., "Which population in the 1000 Genomes Project samples has the most Neandertal similarity?"). This is ongoing research here in my group, but we've been making it open because it tells us immediately that some hypotheses about Neandertal similarity must be wrong. Modeling is a lot of work. We're trying to avoid putting a lot of investment into modeling that will be easily refuted by the next piece of genomic data. Data are flowing now so rapidly that we can afford to be naive empiricists.

    For example, our comparisons quickly refute the hypothesis that Neandertal similarity comes only from ancient population structure in Africa. That hypothesis predicts much more heterogeneity within Africans in Neandertal similarity than exists today. We've shown that the heterogeneity in Africans is basically the same as within Europeans or Asians, and that the variance among African populations so far is quite small. Those are very simple observations, which are consistent with what Yang and colleagues [2] concluded on the basis of the frequency spectrum of Neandertal alleles in large samples of living people. Even though many Neandertal-shared SNP alleles came from incomplete lineage sorting, the signature of excess Neandertal sharing outside Africa must come mostly from recent introgression. In Ewen Callaway's article about this research, David Reich dismissed the new paper by Eriksson and Manica as "obsolete". I agree. The paper describes a model without carrying out any new empirical comparisons, and so has fallen behind where the science has gone.

    Another example is the proportion of Neandertal ancestry. Initially, the proportion of ancestry from Neandertals in living people was argued to be between 1 and 4 percent [3]. That was a model-based estimate that was the best possible under the assumption that Africans have no Neandertal ancestry. We now have a lot more human comparisons, which would make possible a more precise estimate of the mean. I hesitate to provide a new estimate, because we have shown that some Africans have substantial evidence of Neandertal similarity, which throws the baseline for any estimate into question. How much Neandertal ancestry is present in living people must depend on a more complex model of mixture among later populations. The result will still be small (probably less than 6 percent) but understanding this proportion will help us to evaluate when and where Neandertal genes flowed into our populations.

    Here's a third example. I haven't written about it here yet, but I have been lecturing about it quite widely over the past few months. Earlier this year, the genome of Ötzi the Tyrolean Iceman was reported by Andreas Keller and colleagues [4]. Aaron Sams and I downloaded the data and have been carrying out several different kinds of comparisons. A picture:

    Otzi 1000 Genomes Neandertal comparison

    I'd like to see the model of African population structure that could explain this result...

    If you'll remember my earlier posts on the 1000 Genomes Project samples, this chart is a histogram of the number of shared Neandertal derived SNP alleles in different samples. The European and Asian samples are substantially greater than either African sample (here, Luhya and Yoruba colored differently). If we took as a baseline that Europeans have an average of 3.5 percent Neandertal, Ötzi would have around 5.5 percent (again, the actual percentage would be highly model-dependent). He has substantially greater sharing with Neandertals than any other recent person we have ever examined.

    As you can imagine, we have carried out just about every comparison we can think of that could explain this result as anything other than greater Neandertal ancestry. Aaron and I will be putting our manuscript on the arXiv as soon as we've both signed off on all the text and figures, hopefully this week. This is simple stuff, and I see no reason not to be open about it -- anybody with the Ötzi data can immediately do the same thing.

    We think that showing and sharing these comparisons will save people a lot of useless effort. Personally, I can't believe that these people spending effort on population models for Neandertals aren't talking to those of us who have already carried out these comparisons and have already presented them in public. I guess we'll find out if secrecy or openness leads to better science.

    Meanwhile, I can share the abstract of the conference paper I'll be presenting in September at the meeting of the European Society of Human Evolution in Bordeaux:

    Evaluating recent evolution, migration and Neandertal ancestry in the Tyrolean Iceman

    Paleogenetic evidence from Neandertals, the Neolithic and other eras has the potential to transform our knowledge of human population dynamics. Previous work has established the level of contribution of Neandertals to living human populations. Here, I consider data from the Tyrolean Iceman. The genome of this Neolithic-era individual shows a substantially higher degree of Neandertal ancestry than living Europeans. This comparison suggests that early Upper Paleolithic Europeans may have mixed with Neandertals to a greater degree than other modern human populations. I also use this genome to evaluate the pattern of selection in post-Neolithic Europeans. In large part, the evidence of selection from living people’s genetic data is confirmed by this specimen, but in some cases selection may be disproved by the Iceman’s genotypes. Neolithic-living human comparisons provide information about migration and diffusion of genes into Europe. I compare these data to the situation within Neandertals, and the transition of Neandertals to Upper Paleolithic populations – three demographic transitions in Europe that generated strong genetic disequilibria in successive populations.


    References

  • Neandertal similarity in the HapMap samples

    Mon, 2012-06-25 11:36 -- John Hawks

    In my last installment on Neandertal introgression in present-day human samples, I covered whole genome data from the 1000 Genomes Project ("Which population in the 1000 Genomes Project samples has the most Neandertal similarity?"). For the next few weeks I'll be releasing more of these comparisons, made with the help of my Ph.D. student, Aaron Sams.

    As a reminder of our methods for comparing genomes: we examine every base reported as a single nucleotide polymorphism by the 1000 Genomes Project. If the sequencing data had no errors, then this would be an account of every point mutation in the human genome. However, the data are imperfect in various ways, as I'll note below. Likewise, the Neandertal sequence data are imperfect in various ways.
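
    To make the counting concrete, here is a minimal sketch of how one might tally shared derived alleles for a single individual. It is an illustration under stated assumptions, not the pipeline used for the charts in this post: the `Site` record, the idea that the ancestral base is already known (for example from an outgroup), and the rule that one copy of the allele is enough to count as "shared" are all hypothetical choices made for the example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Site(NamedTuple):
    """One SNP position (hypothetical record layout for illustration)."""
    ancestral: str              # inferred ancestral base, assumed known in advance
    neandertal: Optional[str]   # base called in the Neandertal, None if no coverage


def count_shared_derived(sites: Sequence[Site],
                         genotype: Sequence[Tuple[str, str]]) -> int:
    """Count sites where the individual carries the derived allele seen in the Neandertal."""
    shared = 0
    for site, (allele1, allele2) in zip(sites, genotype):
        if site.neandertal is None:
            continue  # no Neandertal call at this position
        if site.neandertal == site.ancestral:
            continue  # Neandertal carries the ancestral base, nothing derived to share
        if site.neandertal in (allele1, allele2):
            shared += 1  # assumption: a single copy counts as sharing
    return shared


# Toy data, invented for the example.
example_sites = [Site("A", "G"), Site("C", "C"), Site("T", None), Site("G", "A")]
example_genotype = [("A", "G"), ("C", "T"), ("T", "T"), ("G", "G")]
example_count = count_shared_derived(example_sites, example_genotype)
```

    In this toy input only the first site contributes: the second has an ancestral Neandertal base, the third has no Neandertal call, and at the fourth the individual does not carry the derived allele.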

    Here's one of the 1000 Genomes Project comparisons, showing the histogram for pooled European, African, and Chinese samples. In this chart, the number of shared Neandertal derived SNP alleles is the x-axis, divided into bins of around 500. The y-axis is the number of individual genomes in the sample found in each bin. So on this chart, the largest number of European genomes (nearly 120) share very approximately 645,000 derived SNP alleles with the Vindija 33.16 genome.

    Comparison of shared Neandertal derived variants in African, Chinese and European samples

    I find it necessary to be very explicit about these charts, because after showing them to many people I know how easily they can be misinterpreted. It's natural to assume that they are bar charts, where higher y values mean more Neandertal. But with more than 2000 genomes to compare, a bar chart is really just noise. These histograms are much like bell curves, in which the shape of the distribution on the y-axis indicates the dispersion within the population of Neandertal shared alleles.
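
    The way these histograms are built can be sketched in a few lines. This is only an illustration of the binning described above (x-axis bins of around 500 shared alleles, y-axis the number of genomes per bin); the per-genome counts below are made-up numbers, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def bin_shared_counts(counts: Iterable[int], bin_width: int = 500) -> Dict[int, int]:
    """Map each bin's lower edge to the number of genomes falling in that bin."""
    bins = Counter((c // bin_width) * bin_width for c in counts)
    return dict(sorted(bins.items()))


# Invented per-genome totals of shared Neandertal derived SNP alleles.
toy_counts = [644900, 645100, 645300, 645499, 646200]
toy_hist = bin_shared_counts(toy_counts)

# A crude text rendering: one '#' per genome in each bin.
for lower_edge, n_genomes in toy_hist.items():
    print(f"{lower_edge:>7} {'#' * n_genomes}")
```

    The height of each row says how many genomes land in that bin, so a taller bar means "more individuals with about this much sharing", not "more Neandertal".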

    Percentages

    Everyone is excited to find out what percentage of Neandertal ancestry people have. I'm hesitant to report percentages, because I think they are misleading on these data. There is some filtering hiding beneath the data. In particular, SNP alleles that are found only in one individual ("singletons") are likely to be undersampled by the project's sequence analysis. Because gene variants that have introgressed from Neandertal populations tend to be rare in present-day samples, when we miss some rare alleles, this tends to reduce our estimate of Neandertal similarity. This bias in resequencing data should affect populations roughly in proportion to their Neandertal ancestry. Our comparisons of different populations are therefore likely to give the right order of Neandertal ancestry (e.g., Europeans more than Asians) but may underestimate the total fraction of ancestry by some amount. We are counting human SNP variants and not every base pair in the Neandertal genome data, so the effect of sequencing error in the Neandertals will be minimal, but nevertheless present in a small fraction of comparisons. These errors should be randomly distributed with respect to human population differences, but they also add noise that should decrease the accuracy of percentage estimates.

    For another thing, we don't know where the zero point may be. Europeans have around 3 percent more than Yoruba; Yoruba (as I showed in the last post) have around a half percent more Neandertal similarity than Luhya in the 1000 Genomes Project sample. The Luhya are almost certainly not minimal for living people; in fact, I would put some money against it. Since some Neandertal alleles have proceeded right up to high frequencies outside Africa, there has been ample opportunity during the last 30,000 years or more for other alleles to have spread into Africa.

    Our conservative approach is to rely on comparisons of large samples of people, ideally hundreds, and to trust a comparison only when it achieves statistical significance in these samples. That still allows us to detect very slight excesses of Neandertal ancestry in some populations, because the data from hundreds of individuals is very strong evidence. But the overlap among populations is sometimes very extensive even if their means differ significantly.
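
    One simple way to check whether two samples differ in their mean sharing, even when their distributions overlap, is a permutation test on the difference in means. The post does not say which test was actually used, so treat this as a generic sketch; the function name, the number of permutations and the toy data are all assumptions.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(sample_a: Sequence[float], sample_b: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in sample means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign population labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Invented counts: the two ranges overlap heavily, but the means differ by 10.
pop_a = [100 + i for i in range(30)]
pop_b = [110 + i for i in range(30)]
p_value = permutation_p_value(pop_a, pop_b)
```

    With hundreds of genomes per population, even a small shift in the mean can produce a very small p-value while most individuals in the two samples remain indistinguishable from each other.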

    Incomplete lineage sorting (ILS) is one pattern by which living people share alleles with Neandertals. ILS should be equally distributed among populations today, under the assumption that Neandertals and ancestral Africans stem from a single unstructured population. Obviously, Europeans and Asians share more derived SNP alleles with Neandertals than do Africans today, so we can strongly reject the hypothesis of isolation between African and Neandertal populations.

    Given that, three patterns of evolution could have caused some populations to share more derived alleles with Neandertals than others.

    1. Population structure in the ancestors of Africans and Neandertals may have caused some populations to share more ILS with Neandertals than others.

    2. Continued gene flow between Neandertals and Africans could have spread Neandertal alleles into Africa and vice-versa.

    3. Recent introgression from Neandertal populations into the ancestors of today's populations may have transferred new Neandertal alleles into recent humans.

    These three processes actually overlap with each other. Very likely all three of them happened -- although to date, the descriptions of Neandertal genome data have accentuated the last and argued that the first two are relatively less important [1] [2]. A "new" allele in a Neandertal may actually have originated from a mutation more than a half million years ago, have been lost within ancient Africans, and transferred into today's Europeans when they encountered and mixed with Neandertals. We cannot tell these processes apart from the standpoint of any single SNP allele. Only by comparing many SNP alleles across many populations can we sort out their relative importance.

    To this end, we have been comparing populations with each other and ancient Neandertals in many different ways. The 1000 Genomes Project has continued to sample and resequence many of the same samples that were initially amassed for the International HapMap Project. The HapMap was a project based on genotyping individuals with microarray technology. Genotypes are just as informative in many cases as whole-genome sequences. If you already know which genetic variations you want to examine, a microarray can save a substantial amount of wasted effort.

    With Neandertal comparisons, we don't start out knowing in advance which genotypes will be useful. For this reason, genotyping data yields a potential bias when comparing to Neandertal or other human genomes. The microarray was designed to include genotypes that were already known to vary in some human population. With the HapMap, this bias tends to overrepresent the genetic variations in the initial HapMap samples -- generally, Utah residents of northern European descent, ethnic Yoruba people from Nigeria, ethnic Han Chinese from Beijing, and Japanese people from Tokyo. If these samples share some common derived SNP alleles with Neandertals, they will very likely be represented in the genotyping array. But very rare alleles won't be represented. And alleles that are found only in other populations -- such as East Africans or South Asians -- may not be represented, either. The bias is called "ascertainment bias" because it comes from the "ascertainment" of SNPs, or their initial discovery in some populations but not others.
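
    A toy model shows how ascertainment can hide variation. The sketch below keeps a SNP on a hypothetical array only if it is reasonably common in at least one discovery population. Real array design is far more involved; the 5 percent threshold, the function name and all the allele frequencies here are invented for illustration.

```python
from typing import Dict, Iterable, Set


def ascertain(snp_freqs: Dict[str, Dict[str, float]],
              discovery_pops: Iterable[str],
              min_freq: float = 0.05) -> Set[str]:
    """Return SNPs whose derived allele reaches min_freq in some discovery population."""
    discovery = list(discovery_pops)
    return {
        snp for snp, freqs in snp_freqs.items()
        if any(freqs.get(pop, 0.0) >= min_freq for pop in discovery)
    }


# Invented derived-allele frequencies for four hypothetical SNPs.
toy_freqs = {
    "snp1": {"CEU": 0.30, "YRI": 0.10, "MKK": 0.12},  # common in discovery samples
    "snp2": {"CEU": 0.01, "YRI": 0.00, "MKK": 0.00},  # rare everywhere
    "snp3": {"CEU": 0.00, "YRI": 0.00, "MKK": 0.20},  # found only outside discovery samples
    "snp4": {"CEU": 0.00, "YRI": 0.25, "MKK": 0.22},
}
on_array = ascertain(toy_freqs, ["CEU", "YRI"])
missed = set(toy_freqs) - on_array
```

    In this toy case the array misses both the rare allele and the allele that is common only in a population outside the discovery panel, which is the pattern described above.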

    It is possible now to find sets of SNP markers that have been statistically chosen to minimize ascertainment biases. The filters used in such comparisons are complex, and in some cases actually rely on the Neandertal genotype, so I haven't used them here. For our first paper we have focused on the whole-genome sequence comparisons, but here I'll give the same comparisons on some HapMap samples to show approximately where they fit. I will focus here on raw comparisons instead of standardizing them in terms of the predictive ability of informative SNPs on whole genome data. Finding the most informative SNPs is part of the process of sorting introgression from earlier population structure, and is rather more complex; I prefer to start with something very simple and visually easy to interpret.

    South Asia

    One interesting place is India. The HapMap includes a sample of Indian-Americans with origins in Gujarat, in western India. Here's a plot comparing the Gujarat ancestry (GIH) sample with the CEU and LWK samples:

    Comparison of shared Neandertal derived variants in CEU, LWK and GIH samples

    The GIH sample has substantially fewer shared Neandertal derived SNP alleles than the CEU sample. What may be more curious is that the GIH sample also has fewer than East Asians on average. The JPT+CHB samples, for example, exceed the GIH mean by around 100 derived SNPs.

    Comparison of shared Neandertal derived variants in JPT+CHB, LWK and GIH samples

    On a mean of more than 43,000, 100 is around a fourth of a percent, so it's not much -- and it may fall within the amount expected from ascertainment bias. It will be much more enlightening to have GIH whole genome data. In the meantime, we can probably confirm the picture from sequence data that indicates Europeans today have the highest degree of Neandertal ancestry.

    East Africa

    The situation within Africa is potentially very complex also. From sequence data, we were able to show that Yoruba (YRI) and Luhya (LWK) population samples have different numbers of shared derived Neandertal SNP alleles. The YRI sample in West Africa has significantly more Neandertal similarity than the LWK sample in East Africa. We speculate that this relation may reflect trans-Saharan gene flow, which has continued throughout history and prehistory.

    Is this a question of east versus west in Africa? That might seem unlikely considering the extent of population movements into northeastern Africa and continued trade along the East African coast throughout historic time.

    The HapMap includes a sample of ethnic Maasai people from Kenya, which allows us to provide another perspective on African variation. Here is the chart, compared to LWK and CEU:

    Comparison of shared Neandertal derived variants in CEU, LWK and MKK samples

    The Maasai have substantially more Neandertal similarity than Luhya, despite their present geographic proximity. In fact, the mean amount of Neandertal similarity in the Maasai is approximately the same as that in the ASW sample, which is composed of African-American ancestry people in the Southwest U.S.:

    Comparison of shared Neandertal derived variants in CEU, LWK and ASW samples

    You see immediately more dispersion in the African-American ancestry sample, because the mixture between African and European ancestors is more variable and much more recent than the events that gave rise to the Neandertal ancestry of Maasai people.

    We speculate that there may have been a substantial amount of interaction in northeast Africa. Obviously this has been true in historic times, but the Maasai suggest that it may go back long before the origins of the present ethnic groups and their movements into this area. The present heterogeneity of Neandertal similarity in these populations suggests a really complex population history. Some of the present Neandertal similarity may derive from ILS within the ancient African population.

    Probing assumptions

    Of course my lab is not the only one presently engaged in comparing the archaic human genomes with recent populations. One of the reasons why we're pursuing a more open science strategy in our reporting is that different groups using different methodologies ought to converge on the same population history. Where we see different results, it's often an indication that the alternative approaches involve substantially different assumptions about the way ancient humans interacted. As we've probed more deeply into the data, we have confronted the reality that long-term population mixture between Neandertal and African ancestral populations is extremely difficult to rule out. Assuming that long-term interactions were impossible because Neandertals and Africans were completely isolated will probably lead to erroneous results. That makes it harder for us to clearly identify gene variants that came from Neandertals within the last hundred thousand years, as opposed to those shared with Neandertals via more ancient gene flow.

    What makes long-term interactions seem more likely is that some of the Neandertal genomes seem to be more closely related to living people than others. More on that in my next installment.


    References

    Synopsis: 
    I examine the pattern of Neandertal ancestry in India and East Africa.
  • The H preparation

    Tue, 2012-05-08 08:48 -- John Hawks

    Razib Khan comments on the current round of Henry Louis Gates ancestry programming: "Finding fake roots", and "Reification is alright by me!". Razib notes that the criteria that tell many subjects that their ancestry is a mixture of different populations are conditioned on assumptions that don't work at all for South Asians. From the latter:

    In my post below some commenters argued that obviously implausible inferences from a thin set of reference populations are acceptable considering Henry Louis Gates Jr’s target audience. But that really wasn’t my main point. Rather, it was that he was eliding the distinction between uniparental markers, and the clusters generated by modeled based ancestry assignment algorithms, and ascribing the phylogenies of the former to the latter. It is important to note that categories like “Europeans” are only approximations. But they’re damn good approximations today! Nevertheless, note the qualification of time: they may have basically no meaning at some point in the recent past. They’re powerful when it comes to precisely partitioning modern variation, but they don’t tell us the history of that variation.

    The uniparental marker "interpretations" given to people doing genealogical work have become increasingly comical in their distance from what we now know about ancient variation. For example, I carry mtDNA haplogroup H, and here's what the Genographic Project tells me about that history in their "Atlas of the Human Journey":

    Around 15,000 to 20,000 years ago, colder temperatures and a drier global climate locked much of the world's fresh water at the polar ice caps, making living conditions near impossible for much of the northern hemisphere. Early Europeans retreated to the warmer climates of the Iberian Peninsula, Italy, and the Balkans, where they waited out the cold spell. Their population sizes were drastically reduced, and much of the genetic diversity that had previously existed in Europe was lost. Beginning about 15,000 years ago -- after the ice sheets had begun their retreat -- humans moved north again and recolonized western Europe. By far the most frequent mitochondrial lineage carried by these expanding groups was haplogroup H. Because of the population growth that quickly followed this expansion, this haplogroup now dominates the European female landscape.

    Here, a very common mtDNA haplogroup today is given its own origin myth, complete with a glacial refugium and massive expansion and dispersal. The text goes on to explain how this European haplogroup spread right out of southern Europe into central Asia, where today -- surprisingly -- it is even more variable and shows less sign of expansion. Notice how precise the story sounds, a fleshed-out history for people looking to connect their roots to European prehistoric events.

    Why do I say comical? We have ancient mtDNA from all over Europe now, from Neolithic and pre-Neolithic people, showing that haplogroup H was barely there before farming.

    I don't mean to single out Genographic for this issue; in fact, the whole edifice of genealogical interpretation is built on assumptions about history that are currently known to be false. We can do much better than this, I think. But many of the same characters who failed five or six years ago keep plugging at it, persisting in describing a distorted version of human history.

    UPDATE (2012-05-08): The thing that really bugs me is that the amount of money spent producing a season of one of these programs would be more than enough to get some of us to straighten some of these problems out. Population genetics is a lot cheaper than media. Or, to put it in a more inspiring way: any media organization that is willing to spring for a couple of postdocs along with their program can show some real science instead of making stuff up. Just saying...

  • Mailbag: Neandertal derived SNP alleles

    Tue, 2011-12-13 09:48 -- John Hawks

    Re: Neandertal introgression, 1000 Genomes style:

    Long-time reader of your blog, non-paleo/anthro/genetics person, here. But please read on:

    Just a couple of brief questions.

    (i) It seems that it would make sense to look at pairwise comparisons (of shared derived Neanderthal SNP alleles) both within a population (e.g., Asians, or CEU) and between them, and build a histogram of how often they overlap.

    (ii) Then one could remove from the data set all such African shared SNPs - assuming that most of them are incomplete lineage sorting but that Africa had the initial superset of alleles before ooA (I know some are likely West Asian or European admixture, reducing the data set slightly more than necessary), and repeat (i) and similar diagnostics. Is the typical unmodified genome chunk length around such sites much longer than in (i) - can one date this? Can one now better quantify the actual admixture percentage outside of Africa?

    Wouldn't such a procedure give more insight about how Neanderthal introgression is distributed, when it occurred, and perhaps where it occurred?

    I am sure you are already working on similar ideas - just wanted to know if you agree that these may be low-hanging fruit to pursue.

    Thanks!

    Hi -- thanks for writing!

    I started with exactly the approach you describe, when we were working exclusively with SNP data in the spring. For example:

    http://johnhawks.net/weblog/reviews/neandertals/neandertal_dna/europe-ch...

    We were using linked haplotypes rather than single SNPs but the filtering process was the same.
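
    The filtering step discussed in this exchange amounts to a set difference. Here is a minimal sketch, assuming each population is summarized as a set of site identifiers where a Neandertal derived allele is shared; the identifiers, the population sets and the function name are hypothetical, and the original work used linked haplotypes rather than single SNPs.

```python
from typing import Dict, Iterable, Set


def remove_african_shared(shared_by_pop: Dict[str, Set[str]],
                          african_pops: Iterable[str]) -> Dict[str, Set[str]]:
    """Drop every shared allele also seen in any African sample, per the reader's step (ii)."""
    african = set(african_pops)
    seen_in_africa: Set[str] = set()
    for pop in african:
        seen_in_africa |= shared_by_pop[pop]
    return {
        pop: alleles - seen_in_africa
        for pop, alleles in shared_by_pop.items()
        if pop not in african
    }


# Invented site identifiers for illustration.
toy_shared = {
    "YRI": {"site_a", "site_b"},
    "LWK": {"site_a"},
    "CEU": {"site_a", "site_c", "site_d"},
    "CHB": {"site_b", "site_c"},
}
filtered = remove_african_shared(toy_shared, ["YRI", "LWK"])
```

    As the reader notes, this filter is conservative: it also discards alleles that reached Africa through later gene flow, so the remaining set is somewhat smaller than the true set of introgressed alleles.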

    Now I am hopeful that we will have decent age estimates for the introgressing SNPs from a different technique. I would rather find these ages independently of filtering by geographic location, because having this information will greatly simplify testing models of ancient population dynamics. If we succeed at this, we will also have a test of selection based on the same allele ages.

    I am continuing to update and you'll see these results not long after we get them!
