john hawks weblog

paleoanthropology, genetics and evolution

introgression

  • "I Believe in Gene Flow"

    Tue, 2013-05-14 23:59 -- John Hawks

    Mindy Pitre forwarded me a video done by her undergraduate students at St. Lawrence University, and I just had to share it. It is about as adorable as caveman lovin' can be!

    "I Believe in Gene Flow"

    She writes: "It was for my Intro to Human Origins at St. Lawrence University. I made them do group raps/songs. They were super creative!"

  • 180 million Neandertals

    Tue, 2013-04-16 14:58 -- John Hawks

    Just got back proofs of a book chapter I have coming out soon with Zach Throckmorton. My favorite paragraph:

    Nearly seven billion people inhabit our planet. At least six billion carry the genes of Neandertal ancestors. Inheritance from Neandertals makes up approximately 3% of the genomes of randomly chosen people outside sub-Saharan Africa today (Green et al., 2010; Reich et al., 2010). A back-of-the-envelope calculation shows if we took all of the Neandertal genes from today’s human population, we would have enough raw material to make up 180 million Neandertals.

    I love that because it makes the Neandertals into the evolutionary success story they really were. They succeeded by becoming part of us.

    UPDATE (2013-04-18): You can tell from this excerpt how long edited book chapters can take to come out, as we've been more than seven billion for quite some time now! That's one of the changes we'll be making to the galleys.
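
    The back-of-the-envelope figure in that paragraph is easy to reproduce. Here is a minimal sketch using only the round numbers from the excerpt (six billion carriers, about 3% each); the variable names are mine.

```python
# Back-of-the-envelope: how many whole genomes' worth of Neandertal DNA
# is spread across living people? Inputs are the round figures quoted above.
carriers = 6_000_000_000      # people carrying Neandertal ancestry (chapter's figure)
neandertal_fraction = 0.03    # approximate share of each carrier's genome

genome_equivalents = carriers * neandertal_fraction
print(f"{genome_equivalents:,.0f} Neandertal genome equivalents")
```

    Note that this counts copies, not distinct sequence: many of those genome equivalents are the same Neandertal segments carried by different people.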

  • Mailbag: Neandertal ancestry and founder effects

    Sun, 2013-04-07 13:22 -- John Hawks

    I am writing a paper regarding the hybridization of Neanderthals into AMH population and there are a few things I just cannot understand, I know you must be incredibly busy, and there is little chance you are going to answer this email, or even receive it at all, but I shall take my chances. I love Anthropology although my major is Philosophy, I am nowhere near a scientist and genetics is rather obscure within the realms of my mind, so my question might sound basic. In your research to find the lets call it statistical number or amount of sexual encounters between these two species, it is mentioned that if these encounters were in fact "occasional" or "sporadic" all non-African humans would show the same small percentage of DNA in their genome... how can--genetically speaking--you tell from the percentage of Neanderthal DNA in each person if the encounters between these species were a few or significantly larger? in other words, why if the encounters were few all humans now would show the same small amount of neanderthal DNA? and viceversa if the encounters were frequent and Neanderthals were absorbed into our population why would some regions of the world show more of their DNA present?

    Small groups inevitably carry genes that are not a perfect representation of the larger population from which they came. This is called the "founder effect" in biology -- when you have a very small group, they really substantially overrepresent some rare genes, and lose many common genes entirely, giving rise to a rapid genetic differentiation.

    If there had been a very small number of Neandertal-modern contacts (like a dozen or so) then everybody who carries Neandertal genes today should have the same small set of them, all inherited from the founder effect of those few Neandertal ancestors. So your Neandertal genes and my Neandertal genes should all be pretty much the same. And most of the genome should have no Neandertal genes today at all.

    Now, we don't have a definitive answer yet about this, but so far the data don't look like that pattern. Your Neandertal genes and mine are mostly different, and we have a large fraction of the Neandertal genome represented today in one person or another. There haven't been a tremendous number of Neandertal genes lost entirely from humans. That suggests that the number of contacts was not very small -- more like the low thousands or high hundreds than dozens. Remember that the entire human population from that era acted like a breeding population of fewer than 100,000 people, so 3000 Neandertal ancestors are quite a large fraction of that.
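
    The logic of the founder-effect argument can be sketched with a toy simulation. Everything here is an assumption made for illustration, not a model anyone has fit to data: the genome is chopped into 10,000 segments, each Neandertal ancestor leaves surviving DNA at a random 2% of them, and each living person carries Neandertal DNA at about 3% of segments drawn from whatever survives.

```python
import random

N_SEGMENTS = 10_000  # toy genome size (assumption)

def surviving_pool(n_founders, per_founder=0.02, rng=None):
    """Segments where at least one founder's Neandertal DNA survives.
    The 2% per-founder figure is an illustrative assumption."""
    rng = rng or random.Random()
    k = round(N_SEGMENTS * per_founder)
    pool = set()
    for _ in range(n_founders):
        pool.update(rng.sample(range(N_SEGMENTS), k))
    return pool

def person(pool, carried=0.03, rng=None):
    """A living person: Neandertal DNA at ~3% of segments,
    drawn at random from the surviving pool."""
    rng = rng or random.Random()
    k = min(len(pool), round(N_SEGMENTS * carried))
    return set(rng.sample(sorted(pool), k))

rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n_founders in (12, 3000):
    pool = surviving_pool(n_founders, rng=rng)
    you, me = person(pool, rng=rng), person(pool, rng=rng)
    coverage = len(pool) / N_SEGMENTS   # share of genome with any Neandertal DNA left
    shared = len(you & me) / len(you)   # share of your segments that I also carry
    results[n_founders] = (coverage, shared)
    print(f"{n_founders:>5} founders: {coverage:.0%} of genome in pool, "
          f"{shared:.0%} of your segments shared with mine")
```

    With a dozen founders most of the toy genome has no Neandertal DNA left and two people overlap heavily; with thousands, nearly every segment survives somewhere and two people mostly carry different pieces.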

  • Geno2 users showing unexpected Denisovan ancestry

    Sun, 2013-01-06 23:08 -- John Hawks

    I have been excited to hear in the last few days from several readers who have gotten results from the new Genographic Geno2 genotyping chip. One aspect of the result reporting is a person's estimated proportion of Neandertal ancestry, which is a simple percentage. This is like the report from 23andMe, and should be a pretty straightforward estimate given a model of Neandertal-human genetic similarity from complete genomes.

    Another aspect of the Genographic results is an estimated proportion of Denisovan ancestry. This might seem a bit surprising, as for most participants in the project who lack Polynesian or Melanesian ancestry this proportion should be extremely low. I've written about Denisovan DNA similarity with living peoples a few times ("Denisovan DNA in the islands, and an Australian genome", "How widespread is Denisovan ancestry today?"). Based on the science published to date, I would have expected the Geno2 calculations just to confirm the very low ancestry estimation found in last year's research based on genotyping Asian and Australasian populations.

    So I have been extraordinarily surprised to see that people are getting Geno2 results with up to 6% Denisovan ancestry!

    What gives? None of my correspondents so far has anything other than European self-reported ancestry, making it seem very unlikely that they have substantial Denisovan ancestry.

    The first time I heard from a reader with this result, my immediate reaction was that there must be some problem with the algorithm. This one in particular wouldn't be too hard to get wrong considering the rarity of whole genome evidence from populations known to have substantial Denisovan ancestry. Or possibly, some problem with an individual's genotype chip data might trigger the algorithm to look more Denisovan. For many loci that vary among humans, Denisovans are very unlikely to have the derived human variant; so an individual with an unusual proportion of ancestral homozygote loci might look Denisovan in a human-Denisovan comparison.

    However, this is all speculation without knowing the details of the Genographic analysis. And as I hear from more people with varied results, I am having trouble thinking of how data errors could be patterned. It's a tough one to think about because of the unique aspects of the Geno2 chip, and until I've gotten a feel for results from that platform compared to other datasets I probably won't have a solid idea.

    I should point out that if there is a problem with the algorithm underlying ancestry prediction from Denisova, it almost certainly affects the Neandertal ancestry estimate also. The estimation from both these ancient genomes involves the same procedure, although with Denisovan DNA it requires subtracting out the DNA similarity with Neandertals first.

    So I would be interested in hearing from anyone who is surprised to find that they have Denisovan ancestry. My preliminary assumption is that the result is spurious, but I'll try to figure out if there is a possibility of some Denisovan fraction beyond what has been shown in published work.

    Synopsis: 
    Trying to diagnose the odd results from a new genotyping project
  • The North African Neandertal descendants

    Thu, 2012-10-18 16:25 -- John Hawks

    A new paper by Federico Sánchez-Quinto and colleagues reports on comparisons of North African population samples with the Neandertal DNA project data [1]. The paper shows that North African populations also carry a substantial trace of Neandertal ancestry, like living populations outside of Africa, much more than populations of sub-Saharan Africa.

    One of the main findings derived from the analysis of the Neandertal genome was the evidence for admixture between Neandertals and non-African modern humans. An alternative scenario is that the ancestral population of non-Africans was closer to Neandertals than to Africans because of ancient population substructure. Thus, the study of North African populations is crucial for testing both hypotheses. We analyzed a total of 780,000 SNPs in 125 individuals representing seven different North African locations and searched for their ancestral/derived state in comparison to different human populations and Neandertals. We found that North African populations have a significant excess of derived alleles shared with Neandertals, when compared to sub-Saharan Africans. This excess is similar to that found in non-African humans, a fact that can be interpreted as a sign of Neandertal admixture. Furthermore, the Neandertal's genetic signal is higher in populations with a local, pre-Neolithic North African ancestry. Therefore, the detected ancient admixture is not due to recent Near Eastern or European migrations. Sub-Saharan populations are the only ones not affected by the admixture event with Neandertals.

    The interesting aspect of the paper is that the authors attempted to separate the ancestry of North African samples into a pre-Neolithic indigenous African component, and a residual component that represents more recent gene flow into North Africa, from all sources. The historic movement into North Africa has been fairly cosmopolitan, involving sub-Saharan Africans, Arabs, Medieval Europeans, Romans, Carthaginians and many other peoples. Sánchez-Quinto and colleagues used the ADMIXTURE program to try to sort out a pre-Neolithic indigenous component and analyze that specifically for Neandertal similarity.

    Unsurprisingly, the fraction of estimated sub-Saharan African ancestry in each population sample was inversely correlated with the estimated Neandertal ancestry. That is, the more a population looks like sub-Saharan Africans, the less Neandertal ancestry it has.

    Here's what's surprising: When they sorted out parts of the genome in Tunisians that ADMIXTURE determines to be most likely from pre-Neolithic North Africans, they found these parts of the genome had more Neandertal ancestry than typical of the CEU sample of northern European ancestry. Is it possible that ancient North Africans had more Neandertal similarity than today's Europeans?

    Sánchez-Quinto and colleagues suggest that the Neandertal ancestry in this population came in Upper Paleolithic times from the Near East. That is possible, or some of the Neandertal similarity may reflect ancient African population structure. Really I think we will have to do a finer analysis of chromosome blocks to examine the subset of shared Neandertal derived alleles that reflect introgression versus incomplete sorting from the ancestral African population. It will be very interesting to examine more closely the mixture of population history within Egypt, through which most Near Eastern pre-Neolithic population movement must have come.

    The authors note that the distribution of Neandertal similarity outside Africa increases with distance from Africa.

    A previous study [26] observed that the similarity to Neandertals increases with distance from Africa and suggested this could be explained by SNP ascertainment bias plus a strong genetic drift in East Asian populations. Nonetheless more complex, population-biased, ascertainment schemes might have additional effects (i.e bottlenecks), but these are not expected to significantly increase the rate of false positives in admixture tests [31]. The Tunisian population has been reported to be a genetic isolate [17] so it is plausible that part of the signal detected is actually due to genetic drift. However, this should not affect the other North African groups in our study. Finally, given that SNP arrays are based on common alleles and probably the relevant admixture information is encoded within the rare and very rare alleles, the potential bias, if anything, will underestimate ancient hominid admixture signals, as shown in previous studies [2],[3].

    This pattern was also observed by Meyer and colleagues earlier this year [2], and I discussed it in my post on that paper ("Denisova at high coverage"). Both papers note that ascertainment bias may contribute to this pattern. I added that Meyer and colleagues had assumed that genes found in sub-Saharan African populations could not have come from Neandertals, which greatly biased their estimates against Europe and West Asia, considering historical and prehistoric gene flow across the Sahara and along the Indian Ocean coast. So I'm not yet accepting the relative numbers of Neandertal ancestry from different populations, as we don't know that they have all come from consistent assumptions. In particular, an elevated amount of Neandertal ancestry in China -- this paper puts it at almost double the amount of Neandertal ancestry in northern Europeans -- is unlikely. There is no pattern of bottlenecks that can give rise to that excess without additional population mixture, and it is hard to see where such population mixture would have happened without also affecting the ancestors of Europeans. Instead, we have some work to do in reducing the biases on these comparisons.



    Synopsis: 
    A study of North African genetic variation shows that Neandertal genes were widespread in the area before the Neolithic.
  • Denisova at high coverage

    Thu, 2012-08-30 15:25 -- John Hawks

    Science today has released the new paper on the Denisova high-coverage genome by Matthias Meyer and colleagues from Svante Pääbo's group [1]. There is a lot of material in the supplements of the new paper, and it will take some time to work through implications.

    The basics are quite simple: The paper confirms the initial interpretation of the genome by David Reich and colleagues [2] in most respects. The mixture with a whole-genome sample from Papua New Guinea is estimated at 6% Denisovan ancestry. Confirming the later paper by Reich and colleagues [3], the new analysis finds no significant evidence of Denisovan ancestry in a mainland south Chinese (Dai) individual, and can exclude it down to a very small fraction:

    However, in contrast to a recent study proposing more allele sharing between Denisova and populations from southern China, such as the Dai, than with populations from northern China, such as the Han (17), we find less Denisovan allele sharing with the Dai than with the Han (although non-significantly so, Z = –0.9) (Fig. 4B) (table S25). Further analysis shows that if Denisovans contributed any DNA to the Dai, it represents less than 0.1% of their genomes today (table S26).

    That is a mystery to be explained. How did Asians end up lacking any evidence of Denisovan ancestry, when the peoples of Sahul (Australia and New Guinea) have six percent? It's nutty! The early modern humans who were the ancestors of present Sahulian peoples surely came from Asia, and they surely mixed with Denisovans there somewhere, right? But today there's no sign that present Asian peoples descended from those early Asian peoples.

    We must, I think, conclude that there was at least one episode, and possibly several, of massive population movement across South and Southeast Asia.

    I have recently completed a review of the analogous problem for Neandertals in Europe -- late and early Neandertals themselves appear to have been a dynamic population. I'm now working on a review of the situation in Southeast Asia. We may fundamentally have to look at the archaeological record in a new, and much more dynamic, way than has been the case.

    Neandertal gene flow

    To me at the moment, this is the most interesting paragraph of the new paper:

    Interestingly, we find that Denisovans share more alleles with the three populations from eastern Asia and South America (Dai, Han, and Karitiana) than with the two European populations (French and Sardinian) (Z = 5.3). However, this does not appear to be due to Denisovan gene flow into the ancestors of present-day Asians, since the excess archaic material is more closely related to Neandertals than to Denisovans (table S27). We estimate that the proportion of Neandertal ancestry in Europe is 24% lower than in eastern Asia and South America (95% C.I. 12–36%). One possible explanation is that there were at least two independent Neandertal gene flow events into modern humans (18). An alternative explanation is a single Neandertal gene flow event followed by dilution of the Neandertal proportion in the ancestors of Europeans due to later migration out of Africa. However, this would require about 24% of the present-day European gene pool to be derived from African migrations subsequent to the Neandertal admixture.

    This is a very interesting result, partially because it is the opposite of what we are finding. As I explained earlier this year, we are finding Europeans to share more Neandertal alleles than Asians do. The difference in our results has been much smaller than 24%; really only an increase of less than 0.5% on the whole genome, or maybe 10% relative to the overall amount in Europe (which is on the order of 3%).
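
    Two bits of arithmetic are packed into these paragraphs, and a short sketch makes them explicit. The first is the single-pulse dilution scenario from the quote, under the assumption that the later migrants carried no Neandertal ancestry at all. The second is the difference between a relative and an absolute change. The 3% baseline is the order-of-magnitude figure mentioned above; the function names are mine.

```python
def diluted_proportion(p_source, m):
    """Neandertal proportion after a fraction m of the gene pool is replaced
    by migrants assumed to carry no Neandertal ancestry."""
    return p_source * (1 - m)

def migrant_fraction(p_source, p_diluted):
    """Invert the dilution: what migrant fraction m explains the drop?"""
    return 1 - p_diluted / p_source

# Dilution scenario: Europe 24% lower than eastern Asia (relative difference).
p_asia = 0.03                               # placeholder; the answer does not depend on it
p_europe = diluted_proportion(p_asia, 0.24)
m = migrant_fraction(p_asia, p_europe)
print(f"migrant share of European gene pool needed: {m:.0%}")

# Relative versus absolute: a 10% relative excess on a ~3% baseline.
baseline_europe = 0.03
relative_excess = 0.10
absolute_excess = baseline_europe * relative_excess
print(f"absolute difference: {absolute_excess * 100:.1f} percentage points of the genome")
```

    The required migrant share equals the relative deficit whatever the starting proportion, which is why the quote's 24% appears twice. A 10% relative excess on a 3% baseline is about 0.3 percentage points, consistent with "less than 0.5%" of the whole genome.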

    My initial reaction to this difference is that it reflects the sharing of Neandertal genes in Africa. Meyer and colleagues filtered out alleles found in Africa, as a way of decreasing the effect of incomplete lineage sorting compared to introgression in their comparison. But if Africans have some gene flow from Neandertals, eliminating alleles found in Africans will create a bias in the comparison. If (as we think) some African populations have Neandertal gene flow, that probably came from West Asia or southern Europe. So as long as the present European and Asian (and Native American) samples have undergone a history of genetic drift, or if (as mentioned in the quote) they mixed with slightly different Neandertal populations, this bias will tend to make Asians look more Neandertal and Europeans less so.
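
    A toy example shows how this kind of filter can flip a comparison. The allele IDs and counts below are invented purely for illustration and are not drawn from any dataset. The setup assumes, as suggested above, that the Neandertal alleles present in Africa arrived mostly by way of populations related to Europeans.

```python
# Toy allele IDs; every count here is invented for illustration only.
european = set(range(0, 1000))    # Neandertal-derived alleles seen in Europeans
asian = set(range(500, 1450))     # those seen in Asians (partly overlapping)

# Alleles also seen in Africans: mostly of the "European" kind, a few "Asian".
african = set(range(0, 300)) | set(range(1400, 1450))

# The filter: drop anything observed in Africans before comparing.
european_filtered = european - african
asian_filtered = asian - african

print("before filtering:", len(european), "European vs", len(asian), "Asian")
print("after filtering: ", len(european_filtered), "European vs", len(asian_filtered), "Asian")
```

    Before filtering the European set is slightly larger; after filtering the Asian set is larger, even though nothing about either population changed.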

    Anyway, this demands further investigation. The Denisova genome makes a more compelling outgroup for these kinds of comparisons, because it is much closer to us than chimpanzees are. But it isn't really an outgroup because it shares alleles by descent with Neandertals. So it takes some clever genetics to compare the distributions of derived alleles in these genomes in terms of introgression versus incomplete lineage sorting.
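
    For readers who want to see what such a comparison looks like in its simplest form, here is a sketch of one widely used summary, the ABBA-BABA or D statistic. I am not claiming it is the exact calculation behind any number discussed here. It assumes a clean outgroup such as a chimpanzee, alleles coded 0 for ancestral and 1 for derived, and one sampled allele per population per site. The sign convention (positive when the second population shares more derived alleles with the archaic genome) is my choice.

```python
def d_statistic(sites):
    """sites: iterable of (p1, p2, archaic, outgroup) allele states,
    each 0 (ancestral) or 1 (derived).
    Returns D = (ABBA - BABA) / (ABBA + BABA), or None if no site is informative."""
    abba = baba = 0
    for p1, p2, archaic, outgroup in sites:
        # Only sites where the archaic genome is derived and the outgroup
        # is ancestral can distinguish the two human populations.
        if outgroup != 0 or archaic != 1:
            continue
        if (p1, p2) == (0, 1):
            abba += 1      # p2 shares the derived allele with the archaic genome
        elif (p1, p2) == (1, 0):
            baba += 1      # p1 shares it instead
    total = abba + baba
    return (abba - baba) / total if total else None

# Made-up site counts, purely to exercise the function.
toy_sites = ([(0, 1, 1, 0)] * 60      # ABBA
             + [(1, 0, 1, 0)] * 40    # BABA
             + [(1, 1, 1, 0)] * 25    # uninformative: both populations derived
             + [(0, 0, 1, 0)] * 25)   # uninformative: both populations ancestral
d = d_statistic(toy_sites)
print(f"D = {d:.2f}")
```

    Incomplete lineage sorting alone is expected to produce ABBA and BABA sites in equal numbers, so D near zero; a consistent excess in one direction is the kind of signal attributed to introgression. Real analyses also need a significance test, typically a block jackknife over the genome, which this sketch omits.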

    Denisovan demography

    It has become possible to make some good estimates of demographic history using only a single diploid genome, using a technique developed by Li and Durbin [4]. Meyer and colleagues applied this technique to the Denisova genome, finding that its genetic history contrasts with that of living human populations:

    To estimate how Denisovan and modern human population sizes have changed over time we applied a Markovian coalescent model (22) to all genomes analyzed. This shows that present-day human genomes share similar population size changes, in particular a more than two-fold increase in size before 125,000–250,000 years ago (depending on the mutation rates assumed (23), Fig. 5B). Denisovans, in contrast, show a drastic decline in size at the time when the modern human population began to expand.

    There is not yet enough data from Neandertal genomes to apply the same method, but to the extent that we understand their diversity, they show a similar picture. These archaic humans in Eurasia had much, much smaller effective population sizes than the ancient population of Africa. That's not surprising, given what we understand about ancient hunter-gatherer population dynamics.

    What may be a bit more surprising is the geography. We know that Neandertals of Europe and Central Asia lived in an environment that was relatively marginal for their technology and subsistence pattern. The Denisovan population could well have lived in parts of South or Southeast Asia -- subtropical and tropical areas comparable to Africa in their ecological diversity and resource richness.

    We might have imagined that the Denisovan population would be more diverse than Neandertals -- that it might have been comparable in diversity to part of Africa, if not the entirety of Africa. The genome is inconsistent with that picture.

    How can we explain the apparent contrast?

    1. Maybe Denisovans didn't live in South or Southeast Asia at all. If not, that demands that we explain how Australians got their genes.

    2. Maybe the population was geographically extensive and diverse, but the genome from Denisova Cave doesn't represent it well. If so, we might discover that Sahulians actually have even more ancestry from this group. Alternatively, we might find that the early history of the population was widely shared, but the recent history diverged between Siberian and other branches of the Denisovan-inhabited region.

    3. Maybe African diversity emerged from a much more complex series of interactions than we now appreciate. The demographic model of Li and Durbin doesn't encompass admixture, just the probability of gene coalescence across time. We have recently begun to appreciate the reality of ancient African population structure. If those initial African populations were more divergent from each other than Neandertals and Denisovans, their later mixture would give rise to a picture of early population expansion, even if each of them had relatively low (Denisovan-like) diversity.

    This picture is already complicated. It will get more so. We have a long way to go before the archaeology of MSA and Middle Paleolithic peoples will be reconciled with these genetic models.

    The "modern human" catalog

    I think it's tremendously interesting that the authors have compiled a list of gene variants shared by living humans that are absent from this high-coverage archaic human genome. It's a first step to identifying networks of genes that have been subject to recent evolutionary change in human ancestors.

    That being said, the list of genes itself doesn't lend itself to concrete conclusions:

    One way to identify changes that may have functional consequences is to focus on sites that are highly conserved among primates and that have changed on the modern human lineage after separation from Denisovan ancestors. We note that among the 23 most conserved positions affected by amino acid changes (primate conservation score ≥ 0.95), eight affect genes that are associated with brain function or nervous system development (NOVA1, SLITRK1, KATNA1, LUZP1, ARHGAP32, ADSL, HTR2B, CNTNAP2). Four of these are involved in axonal and dendritic growth (SLITRK1, KATNA1) and synaptic transmission (ARHGAP32, HTR2B) and two have been implicated in autism (ADSL, CNTNAP2). CNTNAP2 is also associated with susceptibility to language disorders (27) and is particularly noteworthy as it is one of the few genes known to be regulated by FOXP2, a transcription factor involved in language and speech development as well as synaptic plasticity (28). It is thus tempting to speculate that crucial aspects of synaptic transmission may have changed in modern humans.

    Interesting. I can imagine a Ph.D. dissertation looking into the function of each of those genes. It is surely true that in the last 300,000 years, human brains have been evolving. But why these genes as opposed to others? And how many regulatory changes (as opposed to amino acid changes) may have been further involved?

    Maybe even more interesting: How many times will the human alleles be found in some other Denisovan (or Neandertal) genomes, and how often will the "archaic" allele be found in anyone living now?

    A limited series of comparisons is too small to exclude the possibility that the ranges of variation overlap, as fossil analysts have known for a long time. So we will need to work on extending our knowledge of the range of variation within living people, by increasing the sample of genomes representing populations around the world, particularly in Africa.

    The technology

    Of course, the most exciting thing about the new paper is the proof of concept for future high-coverage archaic genomes. The lab was able to generate the high-coverage sequence using its existing samples, by sequencing single-strand DNA instead of requiring double-strand DNA. This is a massive advantage when working with ancient DNA, because damage to the sequence often prevents double-stranded DNA from being amplified.

    The paper makes explicit that the Denisova phalanx simply has better endogenous DNA preservation than any other specimen known. That being said, the new sequencing method has greatly increased the sequence yield from the sample:

    We applied this method to aliquots of the two DNA extracts (as well as side fractions) that were previously generated from the 40 mg of bone that comprised the entire inner part of the phalanx (2, 8). Comparisons of these newly generated libraries to the two libraries generated in the previous study (2) show at least a 6-fold and 22-fold increase in the recovery of library molecules (8), which is particularly pronounced for longer molecules (fig. S4).

    It would be too soon to say that a similar increase in yield will happen for other specimens, but obviously, this may bring higher coverage into reach for several specimens that are currently only sequenced at very low coverage, including the Vindija, Mezmaiskaya, and El Sidrón Neandertals. We will have to wait and see how the new technique affects ancient DNA recovery going forward.

    I keep telling people that I think it's exciting that research into human evolution is now pushing technology forward. Paleoanthropology has often borrowed technological advances from other fields. But with ancient DNA, we really see an organic growth of technology along with research questions about our evolution. In our work on the ancient genomes, we're making some progress pushing forward knowledge about human biology by understanding human evolution. Evolution really is the fundamental principle of biology, but using evolution to learn about biology sometimes requires traveling through time. Ancient DNA gives us a time machine bringing new insights into reach.



    Synopsis: 
    A technological advance in library preparation gives rise to much better knowledge of the ancient Denisovans
  • Neandertal ancestry "Iced"

    Wed, 2012-08-15 15:24 -- John Hawks

    I've been mobbed with e-mails from readers asking about my reaction to the new paper by Anders Eriksson and Andrea Manica in PNAS, titled "Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins" [1]. The paper asserts that Neandertal similarity in the genomes of living people outside Africa can be explained entirely by incomplete lineage sorting from the shared human-Neandertal common ancestral population in Africa. If the paper's assertions were accurate, we could go back to thinking that all the genetic heritage of people today traces back to Africa, although we would still need to abandon the idea that the African population had undergone a small bottleneck.

    I have not been posting as frequently the last month or two because I have been out of the country doing science.

    The new paper's press release has given rise to quite a lot of media attention, much of which unfortunately misrepresents our current knowledge of human and Neandertal genomes. Razib Khan summarized the situation on Monday, in a post titled, "Why you shouldn't publish in PNAS". I agree with his criticism, although I have a perspective coming out soon in PNAS. In fact, I suppose this episode shows why everyone should publish in PNAS, because so many journalists will just parrot press releases instead of asking relevant experts. Ewen Callaway did a great job on this story by putting it into the broader context ("Neandertal sex debate highlights benefits of pre-publication"). You will notice how no other science writers with any Neandertal knowledge picked up this press release...

    Paleoanthropology is a field where data are rare and precious, and we do a lot of arguing about the validity of models. I love arguing about the validity of models (Cliff Notes version: All models are wrong).

    Genomics is not such a field. We have abundant data today to compare with Neandertal genomes. Yet puzzlingly, the idea of Neandertal ancestry has been challenged by several papers that haven't performed any new empirical comparisons at all. I'm struggling to figure this out. We have an unparalleled ability to explore the genomes of humans and Neandertals, and we should believe a computer model with no empirical data?

    I've been assessing the Neandertal similarity of 1000 Genomes Project samples here on my blog (e.g., "Which population in the 1000 Genomes Project samples has the most Neandertal similarity?"). This is ongoing research here in my group, but we've been making it open because it tells us immediately that some hypotheses about Neandertal similarity must be wrong. Modeling is a lot of work. We're trying to avoid putting a lot of investment into modeling that will be easily refuted by the next piece of genomic data. Data are flowing now so rapidly that we can afford to be naive empiricists.

    For example, our comparisons quickly refute the hypothesis that Neandertal similarity comes only from ancient population structure in Africa. That hypothesis predicts much more heterogeneity within Africans in Neandertal similarity than exists today. We've shown that the heterogeneity in Africans is basically the same as within Europeans or Asians, and that the variance among African populations so far is quite small. Those are very simple observations, which are consistent with what Yang and colleagues [2] concluded on the basis of the frequency spectrum of Neandertal alleles in large samples of living people. Even though many Neandertal-shared SNP alleles came from incomplete lineage sorting, the signature of excess Neandertal sharing outside Africa must come mostly from recent introgression. In Ewen Callaway's article about this research, David Reich dismissed the new paper by Eriksson and Manica as "obsolete". I agree. The paper describes a model without carrying out any new empirical comparisons, and so has fallen behind where the science has gone.

    Another example is the proportion of Neandertal ancestry. Initially, the proportion of ancestry from Neandertals in living people was argued to be between 1 and 4 percent [3]. That was a model-based estimate that was the best possible under the assumption that Africans have no Neandertal ancestry. We now have a lot more human comparisons, which would make possible a more precise estimate of the mean. I hesitate to provide a new estimate, because we have shown that some Africans have substantial evidence of Neandertal similarity, which throws the baseline for any estimate into question. How much Neandertal ancestry is present in living people must depend on a more complex model of mixture among later populations. The result will still be small (probably less than 6 percent) but understanding this proportion will help us to evaluate when and where Neandertal genes flowed into our populations.

    Here's a third example. I haven't written about it here yet, but I have been lecturing about it quite widely over the past few months. Earlier this year, the genome of Ötzi the Tyrolean Iceman was reported by Andreas Keller and colleagues [4]. Aaron Sams and I downloaded the data and have been carrying out several different kinds of comparisons. A picture:

    [Figure: Ötzi 1000 Genomes Neandertal comparison]

    I'd like to see the model of African population structure that could explain this result...

    If you'll remember my earlier posts on the 1000 Genomes Project samples, this chart is a histogram of the number of shared Neandertal derived SNP alleles in different samples. The European and Asian samples are substantially greater than either African sample (here, Luhya and Yoruba colored differently). If we took as a baseline that Europeans have an average of 3.5 percent Neandertal, Ötzi would have around 5.5 percent (again, the actual percentage would be highly model-dependent). He has substantially greater sharing with Neandertals than any other recent person we have ever examined.

    As you can imagine, we have carried out just about every comparison we can think of that could explain this result as anything other than greater Neandertal ancestry. Aaron and I will be putting our manuscript on the arXiv as soon as we've both signed off on all the text and figures, hopefully this week. This is simple stuff, and I see no reason not to be open about it -- anybody with the Ötzi data can immediately do the same thing.

    We think that showing and sharing these comparisons will save people a lot of useless effort. Personally, I can't believe that these people spending effort on population models for Neandertals aren't talking to those of us who have already carried out these comparisons and have already presented them in public. I guess we'll find out if secrecy or openness leads to better science.

    Meanwhile, I can share the abstract of the conference paper I'll be presenting in September at the meeting of the European Society of Human Evolution in Bordeaux:

    Evaluating recent evolution, migration and Neandertal ancestry in the Tyrolean Iceman

    Paleogenetic evidence from Neandertals, the Neolithic and other eras has the potential to transform our knowledge of human population dynamics. Previous work has established the level of contribution of Neandertals to living human populations. Here, I consider data from the Tyrolean Iceman. The genome of this Neolithic-era individual shows a substantially higher degree of Neandertal ancestry than living Europeans. This comparison suggests that early Upper Paleolithic Europeans may have mixed with Neandertals to a greater degree than other modern human populations. I also use this genome to evaluate the pattern of selection in post-Neolithic Europeans. In large part, the evidence of selection from living people’s genetic data is confirmed by this specimen, but in some cases selection may be disproved by the Iceman’s genotypes. Neolithic-living human comparisons provide information about migration and diffusion of genes into Europe. I compare these data to the situation within Neandertals, and the transition of Neandertals to Upper Paleolithic populations – three demographic transitions in Europe that generated strong genetic disequilibria in successive populations.


    References

  • Neandertal similarity in the HapMap samples

    Mon, 2012-06-25 11:36 -- John Hawks

    In my last installment on Neandertal introgression in present-day human samples, I covered whole genome data from the 1000 Genomes Project ("Which population in the 1000 Genomes Project samples has the most Neandertal similarity?"). For the next few weeks I'll be releasing more of these comparisons, made with the help of my Ph.D. student, Aaron Sams.

    As a reminder about our methods for comparing genomes: we examine every base reported as a single nucleotide polymorphism by the 1000 Genomes Project. If the sequencing data had no errors, then this would be an account of every point mutation in the human genome. However, the data are imperfect in various ways, as I'll note below. Likewise, the Neandertal sequence data are imperfect in various ways.
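
    To make the counting step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the actual pipeline: it assumes each SNP comes with a known ancestral allele, a single Neandertal allele call, and a diploid genotype for the living individual. The `Site` record and the function name are hypothetical, and counting sites (rather than allele copies) is a choice made here only for simplicity.

```python
from typing import Iterable, NamedTuple, Tuple


class Site(NamedTuple):
    ancestral: str             # assumed ancestral allele (e.g. from an outgroup)
    neandertal: str            # allele called in the Neandertal genome
    genotype: Tuple[str, str]  # the living individual's two alleles


def shared_derived_count(sites: Iterable[Site]) -> int:
    """Count SNPs where the Neandertal allele is derived and the individual carries it."""
    count = 0
    for site in sites:
        neandertal_is_derived = site.neandertal != site.ancestral
        if neandertal_is_derived and site.neandertal in site.genotype:
            count += 1
    return count


# Toy, made-up sites purely for illustration.
sites = [
    Site("A", "G", ("G", "A")),  # Neandertal derived, individual carries one copy
    Site("C", "C", ("C", "T")),  # Neandertal ancestral: not counted
    Site("T", "C", ("T", "T")),  # Neandertal derived, individual lacks it
    Site("G", "A", ("A", "A")),  # Neandertal derived, individual homozygous
]
print(shared_derived_count(sites))  # prints 2
```

    Each individual genome ends up with one such count, and those per-genome counts are what the charts below summarize.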

    Here's one of the 1000 Genomes Project comparisons, showing the histogram for pooled European, African, and Chinese samples. In this chart, the number of shared Neandertal derived SNP alleles is the x-axis, divided into bins of around 500. The y-axis is the number of individual genomes in the sample found in each bin. So on this chart, the largest number of European genomes (nearly 120) share very approximately 645,000 derived SNP alleles with the Vindija 33.16 genome.

    Comparison of shared Neandertal derived variants in African, Chinese and European samples

    I find it necessary to be very explicit about these charts, because after showing them to many people I know how easily they can be misinterpreted. It's natural to assume that they are bar charts, where higher y values mean more Neandertal. But with more than 2000 genomes to compare, a bar chart is really just noise. These histograms are much like bell curves, in which the shape of the distribution on the y-axis indicates the dispersion within the population of Neandertal shared alleles.
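
    For readers who want to see how such a histogram is put together, here is a small sketch. The per-genome counts are made-up numbers chosen only to land near the scale of the chart, and `histogram` is a hypothetical helper; the bin width of 500 follows the description above.

```python
from collections import Counter


def histogram(counts, bin_width=500):
    """Map each bin's lower edge to the number of genomes falling in that bin."""
    bins = Counter((c // bin_width) * bin_width for c in counts)
    return dict(sorted(bins.items()))


# Hypothetical counts of shared derived alleles, one number per genome.
toy_counts = [644_800, 645_100, 645_300, 645_450, 646_020, 643_990]

for lower_edge, n_genomes in histogram(toy_counts).items():
    # x-axis: the bin of shared-allele counts; y-axis: how many genomes fall in it
    print(f"{lower_edge}-{lower_edge + 499}: {'#' * n_genomes}")
```

    The tall bar is the bin holding the most genomes, not the genome with the most Neandertal similarity. A population with more Neandertal similarity has its whole distribution shifted to the right.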

    Percentages

    Everyone is excited to find out what percentage of Neandertal ancestry people have. I'm hesitant to report percentages, because I think they are misleading on these data. There is some filtering hiding beneath the data. In particular SNP alleles that are found only in one individual ("singletons") are likely to be undersampled by the project's sequence analysis. Because gene variants that have introgressed from Neandertal populations tend to be rare in present-day samples, when we miss some rare alleles, this tends to reduce our estimate of Neandertal similarity. This bias in resequencing data should affect populations roughly in proportion to their Neandertal ancestry. Our comparisons of different populations are therefore likely to give the right order of Neandertal ancestry (e.g., Europeans more than Asians) but may underestimate the total fraction of ancestry by some amount. We are counting human SNP variants and not every base pair in the Neandertal genome data, so the effect of sequencing error in the Neandertals will be minimal, but nevertheless present in a small fraction of comparisons. These errors should be randomly distributed with respect to human population differences, but they also add noise that should decrease the accuracy of percentage estimates.

    For another thing, we don't know where the zero point may be. Europeans have around 3 percent more than Yoruba; Yoruba (as I showed in the last post) have around a half percent more Neandertal similarity than Luhya in the 1000 Genomes Project sample. The Luhya are almost certainly not minimal for living people; in fact, I would put some money against it. Since some Neandertal alleles have proceeded right up to high frequencies outside Africa, there has been ample opportunity during the last 30,000 years or more for other alleles to have spread into Africa.

    Our conservative approach is to rely on comparisons of large samples of people, ideally hundreds, and to trust a comparison only when it achieves statistical significance in these samples. That still allows us to detect very slight excesses of Neandertal ancestry in some populations, because the data from hundreds of individuals is very strong evidence. But the overlap among populations is sometimes very extensive even if their means differ significantly.
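
    The point about overlap can be shown with a toy calculation. This sketch uses two entirely made-up samples of 210 counts each, whose means differ by 30, and computes a Welch t statistic alongside the fraction of the lower group that still exceeds the higher group's mean. The numbers and names are hypothetical and the choice of Welch's t is an assumption for illustration, not a description of the tests we actually ran.

```python
from math import sqrt
from statistics import mean, variance

# Made-up data: values spread evenly from 900 to 1100, each repeated 10 times.
group_a = [1000 + (i % 21 - 10) * 10 for i in range(210)]
group_b = [value + 30 for value in group_a]  # same spread, mean shifted by 30


def welch_t(a, b):
    """Welch's t statistic for the difference in means (b minus a)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / standard_error


t_stat = welch_t(group_a, group_b)
overlap = sum(1 for value in group_a if value > mean(group_b)) / len(group_a)

print(f"t = {t_stat:.2f}")
print(f"share of group A above group B's mean: {overlap:.0%}")
```

    With a couple of hundred individuals per group, a shift that is small relative to the spread gives a large t statistic, yet a sizable share of the "lower" group still sits above the "higher" group's mean. That is why a significant difference in means says little about any one individual.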

    Incomplete lineage sorting (ILS) is one pattern by which living people share alleles with Neandertals. ILS should be equally distributed among populations today, under the assumption that Neandertals and ancestral Africans stem from a single unstructured population. Obviously, Europeans and Asians share more derived SNP alleles with Neandertals than do Africans today, so we can strongly reject the hypothesis of isolation between African and Neandertal populations.

    Given that, three patterns of evolution could have caused some populations to share more derived alleles with Neandertals than others.

    1. Population structure in the ancestors of Africans and Neandertals may have caused some populations to share more ILS with Neandertals than others.

    2. Continued gene flow between Neandertals and Africans could have spread Neandertal alleles into Africa and vice-versa.

    3. Recent introgression from Neandertal populations into the ancestors of today's populations may have transferred new Neandertal alleles into recent humans.

    These three processes actually overlap with each other. Very likely all three of them happened -- although to date, the descriptions of Neandertal genome data have accentuated the last and argued that the first two are relatively less important [1] [2]. A "new" allele in a Neandertal may actually have originated from a mutation more than a half million years ago, have been lost within ancient Africans, and transferred into today's Europeans when they encountered and mixed with Neandertals. We cannot tell these processes apart from the standpoint of any single SNP allele. Only by comparing many SNP alleles across many populations can we sort out their relative importance.

    To this end, we have been comparing populations with each other and ancient Neandertals in many different ways. The 1000 Genomes Project has continued to sample and resequence many of the same samples that were initially amassed for the International HapMap Project. The HapMap was a project based on genotyping individuals with microarray technology. Genotypes are just as informative in many cases as whole-genome sequences. If you already know which genetic variations you want to examine, a microarray can save a substantial amount of wasted effort.

    With Neandertal comparisons, we don't start out knowing in advance which genotypes will be useful. For this reason, genotyping data yields a potential bias when comparing to Neandertal or other human genomes. The microarray was designed to include genotypes that were already known to vary in some human population. With the HapMap, this bias tends to overrepresent the genetic variations in the initial HapMap samples -- generally, Utah residents of northern European descent, ethnic Yoruba people from Nigeria, ethnic Han Chinese from Beijing, and Japanese people from Tokyo. If these samples share some common derived SNP alleles with Neandertals, they will very likely be represented in the genotyping array. But very rare alleles won't be represented. And alleles that are uniquely in other populations -- such as East Africans or South Asians -- may not be represented, either. The bias is called "ascertainment bias" because it comes from the "ascertainment" of SNPs, or their initial discovery in some populations but not others.
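
    A back-of-the-envelope sketch shows why rare alleles drop out. It assumes, as a simplification, that a SNP makes it onto the array only if both alleles turn up in a discovery panel of randomly sampled chromosomes. The function name and the panel size of 120 chromosomes are hypothetical.

```python
def prob_ascertained(freq, n_chromosomes):
    """Chance that a discovery panel of n chromosomes contains both alleles.

    freq is the derived allele's frequency in the discovery population;
    chromosomes are assumed to be sampled independently at random.
    """
    p_all_ancestral = (1 - freq) ** n_chromosomes
    p_all_derived = freq ** n_chromosomes
    return 1 - p_all_ancestral - p_all_derived


panel_size = 120  # hypothetical number of chromosomes in the discovery panel
for freq in (0.0, 0.001, 0.01, 0.1, 0.3):
    print(f"frequency {freq:>5}: P(on array) = {prob_ascertained(freq, panel_size):.3f}")
```

    Under these assumptions, an allele that is absent from the discovery populations has no chance of being on the array, however common it may be somewhere else, and a very rare allele has only a small chance.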

    It is possible now to find sets of SNP markers that have been statistically chosen to minimize ascertainment biases. The filters used in such comparisons are complex, and in some cases actually rely on the Neandertal genotype, so I haven't used them here. For our first paper we have focused on the whole-genome sequence comparisons, but here I'll give the same comparisons on some HapMap samples to show approximately where they fit. I will focus here on raw comparisons instead of standardizing them in terms of the predictive ability of informative SNPs on whole genome data. Finding the most informative SNPs is part of the process of sorting introgression from earlier population structure, and is rather more complex; I prefer to start with something very simple and visually easy to interpret.

    South Asia

    One interesting place is India. The HapMap includes a sample of Indian-Americans with origins in Gujarat, in western India. Here's a plot comparing the Gujarat ancestry (GIH) sample with the CEU and LWK samples:

    Comparison of shared Neandertal derived variants in CEU, LWK and GIH samples

    The GIH sample has substantially fewer shared Neandertal derived SNP alleles than the CEU sample. What may be more curious is that the GIH sample also has fewer than East Asians on average. The JPT+CHB samples, for example, exceed the GIH mean by around 100 derived SNPs.

    Comparison of shared Neandertal derived variants in JPT+CHB, LWK and GIH samples

    On a mean of more than 43,000, 100 is around a fourth of a percent, so it's not much -- and it may fall within the amount expected from ascertainment bias. It will be much more enlightening to have GIH whole genome data. In the meantime, we can probably confirm the picture from sequence data that indicates Europeans today have the highest degree of Neandertal ancestry.
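
    The arithmetic behind "around a fourth of a percent" is simple enough to check. The sketch takes 43,000 as a round stand-in for the mean, since the text gives only "more than 43,000".

```python
mean_count = 43_000  # round stand-in for "more than 43,000"
difference = 100     # approximate JPT+CHB excess over the GIH mean

relative = difference / mean_count
print(f"{relative:.2%}")  # prints 0.23%
```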

    East Africa

    The situation within Africa is potentially very complex also. From sequence data, we were able to show that Yoruba (YRI) and Luhya (LWK) population samples have different numbers of shared derived Neandertal SNP alleles. The YRI sample in West Africa has significantly more Neandertal similarity than the LWK sample in East Africa. We speculate that this relation may reflect trans-Saharan gene flow, which has continued throughout history and prehistory.

    Is this a question of east versus west in Africa? That might seem unlikely considering the extent of population movements into northeastern Africa and continued trade along the East African coast throughout historic time.

    The HapMap includes a sample of ethnic Maasai people from Kenya, which allows us to provide another perspective on African variation. Here is the chart, compared to LWK and CEU:

    Comparison of shared Neandertal derived variants in CEU, LWK and MKK samples

    The Maasai have substantially more Neandertal similarity than Luhya, despite their present geographic proximity. In fact, the mean amount of Neandertal similarity in the Maasai is approximately the same as that in the ASW sample, which is composed of African-American ancestry people in the Southwest U.S.:

    Comparison of shared Neandertal derived variants in CEU, LWK and ASW samples

    You see immediately more dispersion in the African-American ancestry sample, because the mixture between African and European ancestors is more variable and much more recent than the events that gave rise to the Neandertal ancestry of Maasai people.

    We speculate that there may have been a substantial amount of interaction in northeast Africa. Obviously this has been true in historic times, but the Maasai suggest that it may go back long before the origins of the present ethnic groups and their movements into this area. The present heterogeneity of Neandertal similarity in these populations suggests a really complex population history. Some of the present Neandertal similarity may derive from ILS within the ancient African population.

    Probing assumptions

    Of course my lab is not the only one presently engaged in comparing the archaic human genomes with recent populations. One of the reasons why we're pursuing a more open science strategy in our reporting is that different groups using different methodologies ought to converge on the same population history. Where we see different results, it's often an indication that the alternative approaches involve substantially different assumptions about the way ancient humans interacted. As we've probed more deeply into the data, we have confronted the reality that long-term population mixture between Neandertal and African ancestral populations is extremely difficult to rule out. Assuming that long-term interactions were impossible because Neandertals and Africans were completely isolated will probably lead to erroneous results. That makes it harder for us to clearly identify gene variants that came from Neandertals within the last hundred thousand years, as opposed to those shared with Neandertals via more ancient gene flow.

    What makes long-term interactions seem more likely is that some of the Neandertal genomes seem to be more closely related to living people than others. More on that in my next installment.


    References

    Synopsis: 
    I examine the pattern of Neandertal ancestry in India and East Africa.
  • Butterfly genetic theft

    Thu, 2012-05-17 00:39 -- John Hawks

    The Heliconius butterfly genome paper [1] is supercool for many reasons. Most important from my point of view is the attention to introgression among the different species of these South American butterflies.

    The Heliconius reference genome allowed us to perform rigorous tests for introgression among melpomene–silvaniform clade species. We used RAD resequencing to reconstruct a robust phylogenetic tree based on 84 individuals of H. melpomene and its relatives, sampling on average 12 Mb, or 4%, of the genome (Fig. 1a and Supplementary Information, sections 12–18). We then tested for introgression between the sympatric co-mimetic postman butterfly races of Heliconius melpomene amaryllis and H. timareta ssp. nov. (Fig. 1) in Peru, using ‘ABBA/BABA’ single nucleotide sites and Patterson’s D-statistics (Fig. 3a), originally developed to test for admixture between Neanderthals and modern humans 21, 22 (Supplementary Information, section 12). Genome-wide, we found an excess of ABBA sites, giving a significantly positive Patterson’s D of 0.037 ± 0.003 (two-tailed Z-test for D = 0, P = 1 × 10^−40), indicating greater genome-wide introgression between the sympatric mimetic taxa H. melpomene amaryllis and H. timareta ssp. nov. than between H. melpomene aglaope and H. timareta ssp. nov., which do not overlap spatially (Fig. 1b). On the basis of these D-statistics, we estimate that 2–5% of the genome was exchanged between H. timareta and H. melpomene amaryllis, to the exclusion of H. melpomene aglaope. (Supplementary Information, section 12). Exchange was not random. Of the 21 chromosomes, 11 have significantly positive D-statistics, and the strongest signals of introgression were found on the two chromosomes containing known mimicry loci B/D and N/Yb (Fig. 3b and Supplementary Information, section 15).

    The paper goes on to demonstrate that color patterning genes have introgressed preferentially in cases where one geographically variable species mimics the local variants of another. Mimicry in these butterflies amounts to genetic theft, pure and simple.

    I'll point out that the introgression of 2% of the genome is not a small amount. In the case of these butterflies, introgressed regions are clustered in particular areas, and some of them appear to have happened under the influence of selection (adaptive introgression). Still, there must be some strong reinforcement selection keeping the "species" reproductively separate enough to maintain their gene pools in the face of large-scale sympatric hybridization. Either that, or the current pattern is really a temporary snapshot of a longer, dynamic process of population dispersal and introgression.
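
    For anyone who hasn't met the ABBA/BABA test from the quoted passage, here is a minimal sketch of Patterson's D. It is simplified to one allele per taxon per site and biallelic sites, where real analyses typically work from allele frequencies and attach a block-jackknife standard error. The taxa are ordered (P1, P2, P3, outgroup); in the butterfly case P1, P2 and P3 would presumably be H. melpomene aglaope, H. melpomene amaryllis and H. timareta. The function name and toy sites are hypothetical.

```python
def patterson_d(sites):
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA).

    Each site is a tuple (p1, p2, p3, outgroup) of single alleles.
    The outgroup allele is treated as ancestral ("A"); anything else is derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if p3 == outgroup:
            continue  # P3 must carry the derived allele to be informative
        if p1 == outgroup and p2 == p3:
            abba += 1  # derived allele shared by P2 and P3
        elif p2 == outgroup and p1 == p3:
            baba += 1  # derived allele shared by P1 and P3
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Made-up sites: three ABBA, one BABA, two uninformative.
toy_sites = [
    ("A", "G", "G", "A"),
    ("A", "G", "G", "A"),
    ("C", "T", "T", "C"),
    ("T", "C", "T", "C"),
    ("G", "G", "G", "A"),
    ("A", "A", "A", "A"),
]
print(patterson_d(toy_sites))  # prints 0.5
```

    Without gene flow, incomplete lineage sorting should produce ABBA and BABA sites in equal numbers, so D should hover around zero. A positive D means P2 shares more derived alleles with P3 than P1 does.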

    There's also a section describing the extent of the chemosensory genes in butterflies, which have more than moths (34 compared to 23) despite their diurnality and greater reliance on visual cues. Funny to read of these being the most complicated insect olfaction systems yet known, considering the hundreds of olfactory receptors in mammalian genomes (UPDATE 2012-05-17: the paper refers to odorant-binding and chemosensory families, which are a subset of the total olfaction system [2]).


    References

  • Blond as a window to ancient pigmentation variation

    Sat, 2012-05-05 13:57 -- John Hawks

    Blond hair is relatively common in island Melanesia, even though the skin pigmentation of Melanesian peoples is relatively dark. Eimear Kenny and colleagues report in this week's Science that one SNP variant in the gene TYRP1 explains a high proportion of the variance in hair color in this population [1].

    Resequencing of TYRP1 exons detected a single previously unknown polymorphism, a C-to-T transition at chr9:12,694,273 (GRCh37/hg19), that corresponds to a predicted arginine-to-cysteine mutation (R93C) in exon 2 of TYRP1 at amino acid position 93 (TT in blond- and CT or CC in dark-haired individuals)...[more on assessing effect in a GWA panel].

    We genotyped R93C in 918 Solomon Islanders for whom we had measured hair pigmentation with spectrometry. A recessive model provided the best fit for the data, and R93C genotypes accounted for 46.4% of the variance in hair color (linear regression; P = 2.19 × 10^−90; Fig. 1D and table S2). The frequency of the 93C allele in the Solomon Islands is 0.26, and genotyping of R93C in an additional 941 individuals from 52 worldwide populations revealed that the 93C allele is rare or absent outside of Oceania (table S3). Furthermore, we found no evidence for recent gene flow from Europe (i.e., admixture) (figs. S5 and S6) nor a strong signature of recent positive selection for the 93C allele (figs. S9 to S11).

    This paper is very short, only a few paragraphs. When I read through it, I got one impression of the results, and that impression changed greatly when I looked into the supplement.

    Some underreported facts:

    1. The blondness allele is present in all the samples from the Solomon Islands, at a frequency as high as 49% in a large sample from Malaita. In this study, the authors found it at its lowest frequency in "Polynesian outlier" islands near the Solomons.

    2. The allele was not found in any of the HGDP samples, even when they were genotyped specifically for this study. That includes the "Melanesian" and "Papuan" samples. These two are relatively small in HGDP (n=14 and n=16 in this study) but even so would probably present this allele were it present at anything like the frequency in the Solomon Islands.

    3. The text of the paper reports that a recessive effect model is the best explanation for the relation of hair pigmentation and TYRP1 genotypes. The supplement shows that the recessive model is only very slightly better than a "codominant" model, as it only explains an additional 3 percent of the variance. In the best case considering this allele along with age and geographic origin of the individuals, only 48% of the variation of hair pigmentation can be explained. That leaves 52% to be explained by other genetic and nongenetic causes. There may be a lot of genetic background, which may include more alleles of large effect.

    4. Skin pigmentation varies greatly among these Solomon Islands samples, with more than a third of the overall variance in skin pigmentation explained by geography. The tables don't make it clear how pigmentation is patterned by geography. The TYRP1 allele that is the subject of this paper does not explain much variation in skin pigmentation.

    5. Sex and age have strong effects on hair pigmentation in this sample, but not on skin pigmentation. Again, these point to background genetic factors. Many populations have sex and age effects on hair pigmentation, so some of the additional causal factors may be widely shared.
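
    To unpack point 3, the difference between a "recessive" and a "codominant" (additive) model is just a difference in how the genotype is coded before the regression. The sketch below uses a tiny made-up data set that deliberately exaggerates the recessive pattern, so the gap in variance explained is far larger than the roughly 3 percent in the supplement. The scores, names and helper are hypothetical.

```python
def r_squared(x, y):
    """Fraction of variance in y explained by a simple linear regression on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up data: copies of the T (93C) allele and a hair lightness score.
t_copies = [0, 0, 0, 1, 1, 1, 2, 2]
lightness = [10, 12, 11, 11, 13, 12, 30, 32]

additive_code = t_copies                               # CC=0, CT=1, TT=2
recessive_code = [1 if g == 2 else 0 for g in t_copies]  # only TT differs

r2_additive = r_squared(additive_code, lightness)
r2_recessive = r_squared(recessive_code, lightness)
print(f"additive R^2 = {r2_additive:.2f}, recessive R^2 = {r2_recessive:.2f}")
```

    When heterozygotes look like the dark-haired homozygotes, the recessive coding fits better. When the two codings explain nearly the same share of the variance, as in the supplement, the data do not strongly distinguish them.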

    I began looking more deeply into TYRP1 R93C for a couple of reasons. The prehistory of human populations in the Solomon Islands goes back more than 30,000 years. That this allele is not present in mainland Asian populations, as far as we know, but is present throughout the Solomons suggests that it may have become common at or near the initial founding of this population. The LD pattern around the mutation likewise suggests that it has been segregating in this population for a long time. The data are consistent with the idea that blond phenotypes were present in the Solomon Islands as early as the initial colonists who founded the population.

    It will be interesting to look further into nearby populations to see if it characterized early colonists more broadly. Blond phenotypes occur very commonly in Aboriginal Australians, and are also age-dependent in expression, as many children have blond hair that darkens with age. Other Melanesian islands, such as Vanuatu and Fiji, also have a high incidence of blondness. For the islands, I expect that the same allele will be responsible for a similar fraction of the variance. For Australia, I would guess that this allele is also present, but with 40,000 years of evolution, there could well be a more diverse genetic explanation.

    Pigmentation variation in Eurasia is clearly a phenotype that has been affected both by recent positive selection and selection on old, standing genetic variants. Europe and East Asia today each have a dozen or more alleles that individually have strong effects on skin, hair, or eye pigmentation. Many of the alleles common in one region are rare in the other. These are well explained by recent selection on pigmentation; if there had been no selection on pigmentation, the populations would not show as extensive a pattern of differences, and new alleles would not have reached high frequencies. But if we had only a single mutation at 30 percent distinguishing one of these populations, which had arisen as early as 30,000 years ago, we would not have a strong case for selection.

    In Melanesia, we have just the opening sketch of pigmentation variation. We know that there is substantial variation in skin and hair pigmentation, and that one mutation unique to this part of the world explains a large fraction (but still a minority) of the variance in hair pigment. The other genes that contribute to variation in hair and skin pigmentation are not known. Possibly, skin pigmentation variation among the geographic regions in this study may reflect late prehistoric migration of people through this region, as agriculture moved into the area and Polynesia was settled. But the genetic part of this story remains to be demonstrated.

    Both Asia and Europe have a similar pattern of selection which has favored new alleles along with some old, standing alleles. Across the temperate regions of Europe, East Asia, and the Americas, it is plausible that the disadvantages of dark pigment for vitamin D production manifested themselves. It is also plausible across these regions that the advantages of dark pigment as protection from UV radiation would have been relaxed, allowing sexual selection on pigmentation to play an important role.

    The evidence here suggests that this allele in Melanesia has not been recently selected from a new mutation. Additionally weighing against recent selection is the observation that the mutation acts recessively on hair pigmentation -- recent selection is much more likely for mutations with dominant or additive effects.
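
    The reason recessive effects weigh against recent selection can be sketched with the standard deterministic one-locus selection model. Here the new allele has fitness 1 + s when homozygous and 1 + hs when heterozygous, with h = 0 for a recessive allele and h = 0.5 for an additive one. The starting frequency, the selection coefficient and the function names are hypothetical choices, and the model ignores drift entirely.

```python
def next_freq(p, s, h):
    """One generation of selection on an allele at frequency p.

    Fitnesses: new-allele homozygote 1 + s, heterozygote 1 + h*s, other homozygote 1.
    """
    q = 1.0 - p
    w_hom, w_het = 1.0 + s, 1.0 + h * s
    mean_fitness = p * p * w_hom + 2 * p * q * w_het + q * q
    return (p * p * w_hom + p * q * w_het) / mean_fitness


def generations_to_reach(p0, target, s, h, max_generations=10_000_000):
    """Generations needed for the allele to rise from p0 to the target frequency."""
    p, generations = p0, 0
    while p < target and generations < max_generations:
        p = next_freq(p, s, h)
        generations += 1
    return generations


# Hypothetical scenario: start at 0.1%, rise to 26%, 1% advantage for homozygotes.
additive = generations_to_reach(0.001, 0.26, s=0.01, h=0.5)
recessive = generations_to_reach(0.001, 0.26, s=0.01, h=0.0)
print(f"additive: {additive} generations, recessive: {recessive} generations")
```

    While a recessive allele is rare, nearly all copies sit in heterozygotes where selection cannot see them, so its rise is far slower than for an additive allele with the same homozygous advantage. In a real, finite population a rare recessive allele is also very likely to be lost by drift before selection gets any grip on it.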

    Together, these observations suggest that variation in human pigmentation emerged in stages. Some genes, such as ASIP, have old alleles that explain some of the variation in pigmentation today and are geographically ubiquitous, in Africa, Eurasia, and the Americas. This genetic variation was older than the Late Pleistocene. Such genes (ASIP is probably an example) today have alleles associated with darker pigment that are common in sub-Saharan Africa. Probably many other genes have variation within Africa that is part of the ancestral pigment variation of humanity. As people dispersed throughout the world, mixing with archaic humans, they carried some of these pigmentation variants along with them.

    What's interesting is that even though some of these ancient alleles lighten skin pigmentation, they remain segregating in today's light-pigmented populations. They were not selected to fixation, even though there was plenty of time for them to increase toward fixation, and even though strong selection on pigmentation appears to have been present in many high-latitude populations. Later mutations that lighten pigmentation were strongly selected in these same populations, some reaching very high frequencies, while the old mutations still were not selected to fixation.

    The story is of course more complex than a simple count of standing and new mutations. Some genetic changes that lighten pigmentation may have countervailing negative effects. Solving the problem of becoming light pigmented in just the right way may really be a different problem in different populations. Founder effects may have shifted the genetic background of early Eurasian populations just enough to create strong path-dependence for later mutations, allowing some to proceed rapidly and blocking the rise of others.

    The story of TYRP1 gives a new perspective on the early evolution of pigmentation outside Africa. Here is a novel allele that originated within the earliest colonists to Oceania, which affects hair pigmentation strongly, in a population that was always low-latitude. It did not come from earlier archaic humans as far as we know so far (not in the Denisova genome). It may have become common by a founder effect. We cannot rule out selection, such as social or sexual selection, as a cause of its initial spread or current geographic distribution, but we have no genetic evidence in favor of such selection. We know from the data that there must be many other loci that affect pigmentation in this population.

    This may have been much like the original pigmentation genetics of early modern human populations. It may also be much like the pattern that accounts for pigmentation variation within Africa today. It is not a simple story in which a few loci of large effect explain the evolutionary pattern. It is a story in which a substantial store of segregating variation persists within populations for tens of thousands of years.

    Why does that matter? Here's one reason: We're looking at possible pigmentation variants in archaic humans, and we have counted many of them. Anyone might begin this project with the presumption that Neandertals and Denisovans had pigmentation variants that were fixed relative to living people. In that context, it would be surprising to find that they had not introgressed.

    But if all these ancient populations had a large store of small-effect variants affecting pigmentation, a mutation that we find in one individual might have been rare in the population. The TYRP1 R93C allele varies from 5 to 50 percent in the Solomon Islands samples. We already know that the MC1R coding variant in some Neandertals is not found in the Vindija genomes. Variation in pigmentation loci may have been ubiquitous in human populations, with few fixed alleles separating populations. The ancient landscape was more like ASIP than SLC24A5.


    References

    Synopsis: 
    Pigmentation genetics in the Solomon Islands gives some perspective on the process of phenotype evolution
