john hawks weblog

paleoanthropology, genetics and evolution

introgression

  • Olivia Judson: Museum collections in the DNA age

    Wed, 2009-08-05 12:32 -- John Hawks

    In "Dawn at the museum", Olivia Judson points to the huge potential of ancient DNA techniques to wring new answers out of old taxidermied specimens.

    I think this is one thing among many -- consider also chemical/isotopic analysis, new microscopic techniques to examine histology or look for hidden pathogens, CT scanning specimens to study internal structures -- that are revitalizing museum science. Those of us who work in museums recognize the huge activity behind the scenes, enabling and advancing scientific inquiry. It's not like the Relic back there.

    Judson ends on an example that no doubt excited her inner Dr. Tatiana:

    Or take the American black duck. During the 19th century, black ducks were the most common duck to the east of the Appalachians. That changed in the 1940s, when mallards started to arrive in large numbers; by 1969, mallards had become more common than black ducks. Moreover, genetic analysis of modern specimens shows that the two species are close — so close that they might as well be considered one.

    Again, it wasn’t always thus. DNA analysis of museum specimens collected before 1940 show that black ducks and mallards used to differ much more markedly. So what has caused the change? Hanky panky. Yes, members of the two species have been interbreeding. There are even hints that the female black ducks prefer to mate with male mallards.

    Using museum specimens to establish the historical course of genetic introgression breaking down species barriers. Cool.

  • A new study of genetic introgression and human ancestry

    Fri, 2009-05-08 00:49 -- John Hawks

    Fed up with hobbit news? Well, I'm going to do my best this week to scoop the science journalists, covering stories in paleoanthropology that ought to get some more attention but might be drowned out by otherwise hobbitrocious stories.

    I'll start with a story in which I have a special interest -- a new paper by Jeff Wall, Kirk Lohmueller, and Vincent Plagnol, titled, "Detecting ancient admixture and estimating demographic parameters in multiple human populations."

    A couple of years ago, Plagnol and Wall (2006) looked at a sample of genes in the "Environmental Genome Project." At that time, the sample consisted of 135 genes in 12 Yoruba and 22 CEPH individuals. It's not a large sample by today's 3.9-million genotype standards. But the EGP sample has one important thing going for it -- with resequencing data, we have access to a much larger number of mutational differences at very small map distances from each other. Tight linkage between sites means that we can use the genealogical properties of samples to examine much more ancient events. The HapMap gives us a vast number of genotypes from a large sample of individuals, but the density of loci is quite low -- an average of nearly 1000 base pairs between loci. The EGP doesn't sample as many loci, but it gives a denser representation of the variation at each locus. Only this kind of sample is sufficient to test for genetic ancestry of modern human populations in ancient populations of the Middle Pleistocene.

    Plagnol and Wall applied a simple admixture model to these data, and found that the complete out-of-Africa replacement model did not adequately explain the variation in the European-derived sample. Instead, they found that a model with 5 percent admixture of some non-African Middle Pleistocene ancestral population was a much better fit for the current diversity of European gene trees. In other words, the low variation of recent humans cannot be explained by the small size of a single ancient population; instead, there must have been several populations, partly isolated from each other, one or more of which gave regionally-specific alleles to modern Europeans. Multiregional evolution fits those observations very well -- this is not one or two introgressive genes, and there is no specific evidence of selection on them (although selection may be responsible).

    A number of people picked up on that study in the course of later work. Gregory Cochran and I discussed it in our own 2006 paper about genetic introgression. In late 2005, Dan Garrigan and colleagues had published their own analysis of a pseudogene region on the X chromosome, called RRM2P4. Garrigan reviewed this work together with Mike Hammer (2006) and again with Sarah Kingan (2007). Early last year, I also reviewed the evidence together with Cochran, Henry Harpending and Bruce Lahn (2008).

    We and many other people are following up on this research, trying to discover the ancestry of human populations beyond the simple out-of-Africa replacement scenario. In the new study, Wall and colleagues extend their analysis to a more recent release of the EGP, including 222 genes, and adding 24 Chinese individuals to the 12 Yoruba and 22 CEPH individuals. It's a simple paper and relatively short. In short, they find that their data reject the simple out-of-Africa replacement scenario, and that the genetic variation of coding genes in their sample must be explained in part by long-standing population structure.

    It's not proof that the Neandertals, or any other particular group of ancient humans, survived and passed their genes on to more recent people. This is a study of the genes of recent human populations, and it merely concludes that their ancestors could not have lived in a single small population. Maybe every Neandertal became extinct, and present-day Europeans got this genetic variation from somewhere else. But it is logical to figure that non-African populations may have been among the contributors to present non-African peoples -- particularly since the statistical test focuses on region-specific gene frequencies. The study also finds evidence that today's African population has a complex ancestry -- a kind of multiregional scenario playing out inside Africa (or potentially involving gene flow back into Africa from elsewhere).

    Testing for admixture

    Wall and colleagues reasoned that an allele coming in from an ancient, partially isolated human population would vary in a distinctive pattern. Because of the long history of partial isolation in an ancient subpopulation, they expected that such an allele would come in with multiple mutational differences from the non-introgressive allele. And if it came in from some non-African population, it ought to show relatively strong differences in frequency between populations. So they devised a statistic, mathematically combining FST and a linkage measure -- the idea being to detect alleles that differentiate populations and that are surrounded by large sets of tightly linked polymorphisms.

    This kind of pattern might also occur under positive selection. But a new mutation under positive selection would start out weakly linked to nearby polymorphisms, each of which already exists at some substantial frequency in the population. An introgressive allele might be linked to several other unique mutations that happened during the long period of limited gene flow between ancient populations. And a new mutation would not tend to be surrounded by high FST polymorphisms, until it got to be very common in the population -- up above 50 percent. In contrast, an introgressive allele coming into the population with several nearby mutations would generate a cluster of relatively high FST polymorphisms even at low frequencies. It may not be a perfect test for any individual locus -- there's a lot of uncertainty. But applied to more than 200 loci, it should be possible to test the hypothesis that "archaic admixture" is zero.
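
    To make the FST half of that statistic concrete, here is a minimal Python sketch of Wright's FST for a single biallelic site in two equally weighted populations. It is not the statistic used by Wall and colleagues, which also folds in a linkage measure; the function name fst_two_pops and the example frequencies are invented for illustration.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic site, two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    FST = (HT - HS) / HT, where HT is the expected heterozygosity of the
    pooled population and HS is the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic overall; FST is undefined, so report 0
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
for freqs in [(0.5, 0.5), (0.2, 0.0), (1.0, 0.0)]:
    print(freqs, round(fst_two_pops(*freqs), 3))
```

    A statistic in the spirit of the paper would then ask whether high-FST sites sit in tightly linked clusters, a step this sketch leaves out.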

    Wall and colleagues do test that hypothesis, and they are able to refute it strongly for each of the three groups. Living European and Chinese samples refute the out-of-Africa replacement model with p<0.01. The Yoruba sample refutes the hypothesis of panmixia in ancient Africans at p<0.0000001.

    The authors also provide a supplementary table with a list of genes that may be candidates for introgression. I didn't see any really obvious genes on the list, but each of them bears some examination. I expect that we will be able to use more detailed analytical techniques to look at the regions around these genes and see what is going on. Or at least, in the next couple of years more and more resequencing data will become available, allowing us to test the same hypotheses with larger samples.

    It's worth pointing out that nothing in the approach of Wall and colleagues implies that any of the putative introgression occurred under natural selection. I've argued that introgression may have occurred under selection in ancient humans, but so far few other people have looked at the question with the idea of ancient selection in mind. No doubt we can improve a bit on the methods in the paper if we are willing to make some assumptions about the evolutionary dynamics involved in Late Pleistocene populations.

    Lingering uncertainty

    So what's not to like about this study? After all, here we have what appears to be strong evidence against an exclusive out-of-Africa replacement. It suggests that the ancestry of recent Europeans and Asians owes something to the Middle Pleistocene populations of those regions, and gives an estimate of that contribution consistent with what we know so far about the Neandertal genome.

    But I have to approach this study as critically as I would any other piece of population genetics. In this case, there is a clear weakness to their model. The authors tested for significance of a single parameter, which they call "archaic admixture." Consider their Figure 1, a schematic of their population model:

    Population model schematic from Wall et al. 2009

    Is "archaic admixture" significantly different from zero? Well, you can see that must depend on the values of no fewer than six other parameters. When did the European population start growing significantly -- was it after the Last Glacial Maximum? During the Neolithic? The Aurignacian? How about the African population? Was there really a long bottleneck in the ancestry of Europeans?

    The reason why I'm so critical of population models used in genetics is simple. The authors of studies almost never make even the simplest effort to justify these kinds of parameters against the archaeological or fossil record. Their conclusions -- in this case, the significant finding of ancient admixture -- depend on some range of values for these other parameters.

    Now, Wall and colleagues take a fundamentally different approach than I would use. I would draw upon non-genetic sources of information about these parameters, to increase confidence about the others. In contrast, they performed a broader range of simulations, attempting to find maximum likelihood estimates for all the parameters simultaneously.

    The problem with that approach is that it's hard to rule out the possibility that some other parameters were more important. Consider recent positive selection. As I mentioned above, a recent positively selected mutation could in principle create a pattern like that described for an introgressive allele -- at least under the statistics used in this paper. The chances are low for any randomly chosen mutation under positive selection, because a new positively selected mutation isn't likely to be linked to other rare mutations -- it's much more likely to be linked to common polymorphisms. But if we actually have many hundreds, or even thousands, of recently selected alleles (as we do in humans), then there is a pretty good chance that some of them will look like introgression under the test used here. Another scenario that could mimic introgression under this statistical approach is long-standing balancing selection.
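
    The arithmetic behind that worry is easy to sketch. Assume, purely for illustration, that each recently selected allele independently has some small probability of mimicking introgression under the test. The chance that at least one does so then grows quickly with the number of selected alleles. The per-allele probability below is a made-up placeholder, not an estimate from the paper, and the independence assumption is mine.

```python
def prob_at_least_one(per_allele_prob: float, n_alleles: int) -> float:
    """P(at least one of n independent alleles mimics introgression)."""
    return 1.0 - (1.0 - per_allele_prob) ** n_alleles


# Hypothetical per-allele probability of 1 in 1000; not taken from any data.
for n in (1, 100, 1000, 3000):
    print(n, round(prob_at_least_one(0.001, n), 3))
```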

    There are probably too many genes on these lists for all of them to reflect selective balances or recent positive selection -- there are a lot of recently selected genes, but few of them will have the specific kind of linkage that would show up as significant in this study. But I think the authors could do more to validate the demographic model against non-genetic evidence. Besides that, there is plenty of morphological evidence for gene flow among these ancient human populations. The authors would be well-served to work more directly with the morphological record of human evolution -- when they write that:

    To our knowledge, the question of ancient admixture in other parts of the world has been relatively neglected by the evolutionary genetics community

    it is both true and sad. There is abundant anatomical evidence addressing the issue of genetic continuity or gene flow in parts of the world other than Europe.

    UPDATE (2009-05-08): Dienekes also looks at the paper, and suggests that finding evidence for ancient population structure in Europe and East Asia may be no big deal, because it may simply derive from population structure within Africa before the putative out-of-Africa migration. I'd have to review the data to be sure, but it seems to me there are two arguments against that explanation:

    1. The East Asian and European comparisons come up with different genes showing evidence of putative introgression. There's not a lot of overlap between the sets. If this were merely ancient East African genes, we'd expect the populations outside Africa to have the same ones. And if the numbers had actually been cut down by the serial founder effect scenario (Chinese having undergone more and larger bottlenecks), then we'd expect China to have a subset of the European introgressive genes. I wouldn't go out on a limb about this without looking at the actual frequencies of the supposed ancient alleles, but the pattern isn't consistent with Europe and China being drawn randomly from the same ancient African population.

    2. The entire point of the out-of-Africa replacement idea is to draw humans from an unstructured ancient population. Humans have to be inbred to explain the low genetic variation today. A long bottleneck in Africa is one explanation for this inbreeding -- but the bottleneck has to have been severe, down to an effective size around 10,000, and it has to be very long. A long history of population structure within Africa works against that bottleneck -- population structure featuring several partially isolated populations would prevent the kind of inbreeding that a long bottleneck could create. If Wall and colleagues are correct, we would have to scrap the long bottleneck idea and come up with some other explanation for high inbreeding. There are some others, as I've pointed out before.
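
    The link between bottleneck size, duration and inbreeding can be sketched with the textbook result for an idealized, unstructured (Wright-Fisher) population of constant effective size N: ignoring new mutation, expected heterozygosity shrinks by a factor of 1 - 1/(2N) each generation. The function name and the generation counts below are arbitrary illustrations, not estimates from any study.

```python
def heterozygosity_retained(effective_size: float, generations: int) -> float:
    """Fraction of starting heterozygosity left after drift alone.

    Idealized Wright-Fisher population of constant effective size,
    with no mutation, no migration and no population structure.
    """
    return (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Effective size of 10,000 as in the text; the durations are hypothetical.
for gens in (1_000, 10_000, 40_000):
    print(gens, round(heterozygosity_retained(10_000, gens), 3))
```

    Under these assumptions, a thousand generations at an effective size of 10,000 removes only about 5 percent of the variation, which is why the bottleneck has to be very long to do the job.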

    There are other arguments against exclusive continuity outside Africa, and in favor of some significant -- perhaps overwhelming -- gene flow from Africa into the rest of the world during the late Pleistocene. But no other argument is exclusive of some continuity outside Africa. And if we don't need the bottleneck anymore, accepting some continuity is the reasonable explanation for the facts that don't fit, including the observations in this paper and the morphological and archaeological evidence suggesting continuity.

    References:

    Evans PD, Mekel-Bobrov N, Vallender EJ, Hudson RR, Lahn BT. 2006. Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage. Proc Nat Acad Sci doi:10.1073/pnas.0606966103

    Garrigan D, Kingan SB. 2007. Archaic human admixture: A view from the genome. Curr Anthropol 48:895-902. doi:10.1086/523014

    Garrigan D, Mobasher Z, Severson T, Wilder JA, Hammer MF. 2005. Evidence for archaic Asian ancestry on the human X chromosome. Mol Biol Evol 22:189-192. doi:10.1093/molbev/msi013

    Hardy J, Pittman A, Myers A, Gwinn-Hardy K, Fung HC, de Silva R, Hutton M, Duckworth J. 2005. Evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens. Biochemical Society Transactions 33:582-585.

    Hawks J, Cochran G. 2006. Dynamics of adaptive introgression from archaic to modern humans. PaleoAnthropology 2006:101-115. Open access

    Hawks J, Cochran G, Harpending HC, Lahn BT. 2007. A genetic legacy from archaic Homo. Trends Genet doi:10.1016/j.tig.2007.10.003

    Plagnol V, Wall JD. 2006. Possible ancestral structure in human populations. PLoS Genet 2:e105. doi:10.1371/journal.pgen.0020105

    Wall JD, Lohmueller KE, Plagnol V. 2009. Detecting ancient admixture and estimating demographic parameters in multiple human populations. Mol Biol Evol (early online) doi:10.1093/molbev/msp096

    Zietkiewicz E, Yotova V, Gehl D, Wambach T, Arrieta I, Batzer M, Cole DE, Hechtman P, Kaplan F, Modiano D, Moisan JP, Michalski R, Labuda D. 2003. Haplotypes in the dystrophin DNA segment point to a mosaic origin of modern human diversity. Am J Hum Genet 73:994-1015.

  • The 10,000 Year Explosion

    Thu, 2009-02-19 00:22 -- John Hawks

    I want to point people interested in recent human evolution to a new book, The 10,000 Year Explosion: How Civilization Accelerated Human Evolution.

    The authors, Gregory Cochran and Henry Harpending, are good friends of mine, and I have worked with them on some of the material covered in their book. So, you can hardly expect me to give an unbiased review!

    However, I have now heard from a number of people not connected to the authors, who have read the book and enjoyed it. So it's not just me.

    T. J. Kelleher reviewed the book in SEED, bringing out several interesting points:

    Cochran and Harpending also find value in such work [as the Genographic Project], but they argue for a fuller appreciation of the geographic distributions of genes, and in doing so, they herald a new era not only in biological anthropology, but also for history. They do not stop with what information about human history can be found in the genes, precisely because many gene variants are not neutral. Where the usual geographical analysis treats the distribution of genes as an effect of history, in Cochran and Harpending's view, the genes themselves are a cause: Two variants in the same gene do not necessarily have the same effect, and the relative selective advantages and disadvantages of them will — not surprisingly, to anyone versed in evolutionary biology — influence the movements of genes through populations over both space and time.

    That's a very ambitious agenda. On the way there, the book covers several topics of great interest to me. Naturally, recent evolution by natural selection, particularly in post-agricultural populations, comes to the fore. The possible introgression of genes from Neandertals, as another source of possible adaptive variation in recent human evolution, also gets a chapter.

    With this background in place, Cochran and Harpending explore some hypotheses that may link the distinctive histories of human groups to recent genetic changes and exchanges. One is the expansion and dispersal of Indo-European languages, a series of events that anthropologists have tried to connect to a jumble of different factors, ranging from conquering hordes of steppe nomads to conquering hordes of Anatolian farmers. Cochran and Harpending suggest that pastoralism and the resulting population growth connected to milk consumption was the prime mover.

    Another hypothesis connects the psychometric literature on Ashkenazi Jewish people to some of the distinctive genetic disorders common in that population, such as Tay-Sachs disease, Gaucher disease, torsion dystonia and others. In a nutshell, Cochran and Harpending suggest that natural selection for general intelligence has occurred during Ashkenazi history, resulting in a distribution of IQ between 0.5 and 1 standard deviation above the European mean.

    I can just imagine many readers twisting in their chairs when reading this chapter. And they should: relying upon both documentary evidence and whole-genome surveys of variation, Cochran and Harpending puncture several myths about Jewish history, psychometrics, and admixture of populations. In the past, human geneticists have been all too willing to believe completely bogus scenarios of population history. The idea that Ashkenazim underwent a severe and prolonged population bottleneck, completely isolated from the surrounding European population, is one of the most pernicious of these scenarios. Cochran and Harpending's hypothesis, that alleles causing sphingolipid storage disorders were positively selected in Ashkenazi populations of the last 1500 years, is plausible and certainly testable. Plausibly, these alleles may have been selected for their roles in some other function, although none suggest themselves. The bottleneck theory, on the other hand, is not plausible, refuted by both the historical record and the genomic variation of living people of Ashkenazi descent.

    I found the book to have a good combination of humor, interesting anecdotes, and description of new science. I've read most of the recent popular books about human evolution or genetics. To me, this one stands above the others. Maybe that's because I'm already thinking hard about the central proposition -- indeed the subtitle -- that "civilization accelerated human evolution." Like I said, I'm hardly unbiased.

    But I think it's mostly more fun than most other genetics books. Some science writers cover their tentative approach to genetics by using dark, brooding prose. This book doesn't suffer from ponderosity, and its organization helps -- divided into dozens of little stories with odd historical facts, it's the kind of book you can stash in your bag for bus rides.

    UPDATE (2009-02-19): I wanted to mention that Cochran gave a wide-ranging interview about the book to 2blowhards.

    I had fun reading the interview because of Cochran's suffer-no-fools attitude toward purely speculative ideas about recent evolution. He's looking for testable ideas, not mere generalizations. My favorite quote, with reference to an idea about behavior and mythological characters:

    I think this line of analysis is about as sound and solid as Citibank.

    Harpending also makes an appearance worth quoting, referring to the question of how much of the book is scientifically established and how much is speculative:

    The basics are secure -- population genetics, demography, history, etc. But there are certainly a number of hypotheses we have that are not solidly established. But that is the way science works. If something were rock solid it would be widely known and would be too boring to talk about in the book. We don't spend a whole lot of time for example on malaria defense polymorphisms.

    We really hope to see our hypotheses tested, maybe modified, maybe falsified, or not. We don't believe them in any strong sense. Whenever you read a scientist who is deeply committed to his or her ideas, hang on to your wallet!

  • The Neandertal genome FAQ, February 2009 edition

    Tue, 2009-02-17 16:38 -- John Hawks

    I was out of town last week when the Max-Planck Institute made its announcement about the completion of 1x coverage of the Neandertal genome. It was an exciting day for me. Already, I had scheduled a number of radio shows and a public lecture to commemorate Darwin Day. Several press interviews regarding the news of the Neandertal sequencing project added to the hectic nature of the day, so I didn't get a chance at the time to sit down and write my reactions.

    So, nearly a week later, I've finally caught up. I've answered many questions about the Neandertal genome before, so I'm focusing these on the current announcement.

    For answers to other kinds of questions, try these posts:

    And now, some new questions arising this week:

    Has the Neandertal genome now been reconstructed?

    No. This announcement is a milestone, not an endpoint.

    Much remains to be done to put together an entire genome sequence. The ongoing work represents a massive technical achievement, and is well worth celebrating. But we are not yet at the point where we can talk about structural variants in the Neandertal genome compared to humans, length polymorphisms, or a number of other things. Plus, as noted below, only 63 percent of the nucleotides have been sequenced once -- leaving a lot of basic sequencing left to get even a single pass over the whole genome.

    Some stories have used the term "decoded" -- that also would be a misstatement. We don't know the import of the variations that might so far have been found. That is, we cannot yet convert the information that Neandertal sequences provide to us about their genome into information about their phenotypes. Keeping that in mind, "decoding" the human genome is an ongoing process. With the Neandertals, we have barely begun.

    I heard that this was a whole Neandertal genome, but then the fine print says that it's only 60 percent completed. What gives?

    They set up an announcement when they knew they would be past sequencing 3 billion bases. And in fact they've reached 3.7 billion.

    That would be more than the whole genome, if they could pick out exactly which parts they are sequencing. But the shotgun sequencing approach they are using means that some parts of the genome are represented several times in their 3.7 billion bases, while others are not represented at all.

    It's sort of like painting your house. You could calculate how many gallons it would take for "full coverage" with a paintbrush, but if you shoot that many gallons out of a paint gun there are going to be a lot of gaps that didn't get painted.

    For the Neandertal sequence right now, the gaps add up to around 36 percent of the whole genome. Which is an awful lot of missing data.
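
    The paint-gun arithmetic can be sketched with the standard idealized model of shotgun sequencing (Lander-Waterman). Reads are assumed to land uniformly at random, so the expected fraction of the genome hit at least once at average depth c is 1 - e^(-c). This is a simplification: it ignores read length, mapping failures, and the uneven preservation of ancient DNA, and the function name is made up for illustration.

```python
import math


def expected_fraction_covered(mean_depth: float) -> float:
    """Expected fraction of bases covered at least once (Poisson model)."""
    return 1.0 - math.exp(-mean_depth)


# At exactly 1x average depth, roughly 63 percent is covered and 37 percent is gaps.
covered = expected_fraction_covered(1.0)
print(f"covered: {covered:.1%}, gaps: {1.0 - covered:.1%}")
```

    Under those assumptions, 1x average depth leaves a gap fraction close to the figure given here, so the missing third is what random sampling predicts rather than a sign that something went wrong.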

    So why make an announcement now? I dunno. Darwin's birthday makes a good occasion? They could easily have published last year or the year before on many different genes, just as they published the whole mtDNA last year. It seems likely to me that they've been holding off announcing or publishing until they were sure they had worked out a solution to the contamination problems they were having.

    I think they deserve to pop some champagne bottles and celebrate. When there is a public data release, we can all celebrate!

    What about those contamination problems?

    If you've been around a while, you'll remember that I thought the initial report of contamination was a bit overblown. Nevertheless, the possibility of substantial contamination, documented by comparisons between sequencing methods, stopped almost all work on the publicly available data. It was a serious problem, and the research groups responded seriously to the presence of contamination in the samples. Few details of this response were made public, but clearly there was a concern that the longer fragments coming out of the 454 machine didn't originate in the Neandertal sample.

    According to the Max-Planck press release, they've taken a number of steps to eliminate contamination. I'll quote the relevant sections:

    One essential element developed by Pääbo’s group was the production of sequencing libraries under “clean-room” conditions to avoid contamination of experiments by human DNA. They also designed DNA sequence tags that carry unique identifiers and are attached to the ancient DNA molecules in the clean room. This makes it possible to avoid contamination from other sources of DNA during the sequencing procedure, which was a problem in the initial proof-of-principle experiments in 2006. They also used minute amounts of radioactively labeled DNA to identify and modify those steps in the sequencing procedure where losses occur. Together with other advances implemented during the project, these innovations drastically reduced the need for precious fossil material so that less than half a gram of bone was used to produce the draft sequence of 3 billion base pairs.

    In order to reliably compare the Neandertal DNA sequences to those of humans and chimpanzees, the Leipzig group has performed detailed studies of where chemical damage tends to occur in the ancient DNA and how it causes errors in the DNA sequences. The researchers found that such errors occur most frequently towards the ends of molecules and that the vast majority of them are due to a particular modification of one of the bases in the DNA that occurs over time in fossil remains. They then applied this knowledge to identify which of the DNA fragments from the fossils come from the Neandertal genome and which from microorganisms that have colonized the bones during the thousands of years they lay buried in the caves. They have also developed novel and more sensitive computer algorithms to put the Neandertal DNA fragments in order and compare them to the human genome.

    I'm satisfied that they've done everything possible to eliminate contaminants. The examination of the chain of events from extraction to the final sequence is especially important. In many ancient bones, the steps taken to sterilize and extract from deep within the bone somehow still don't eliminate contamination in the final sequence data. Most of that contamination must arise during the processing and sequencing steps, despite the oft-quoted "clean room" conditions in ancient DNA labs. So the methodological advances toward understanding the sources of contamination are very scientifically significant.

    There's a hint in some of the earlier press coverage that the pace of sequencing has vastly sped up in the last few months. For example, in December, Ewen Callaway reported that the genome was halfway done:

    Half the Neanderthal genome has been decoded and the rest should be sequenced by year's end, a scientist involved in the project told a human evolution conference last week.

    Researchers will roll out a rough draft of the Neanderthal nuclear genome after their sequencers have read every letter in the genome on average once - "1x coverage" in genomics speak.

    Callaway is a careful reporter, but we should keep in mind that the comments in the story might not quite have conveyed the full situation. Still, if we take that assessment at face value, we can speculate that the process of working out the contamination issue took a long time during which sequencing was relatively slow or paused. If they actually had only sequenced half the 3 billion bases by December, that's pretty fast work since then (a perception that was echoed in some press reports prior to the announcement).

    The switch to the Illumina platform seems like an underreported aspect of the story. The press release claims that a billion reads were done on the Solexa, compared to only 100 million from 454 -- that also suggests a switch later in the process, since we know that they were using 454 initially and through early 2008. The press release doesn't explain why they moved from the 454 machine to Illumina. Maybe it's just efficiency of the current platform, but there must be a story there.

    What was the most boring aspect of the announcement?

    I was talking to a reporter on Tuesday before the press conference, and I said,

    "They're no doubt going to give us a list of some genes, with well-known variations in living people, that they've genotyped in Neandertals. And, aside from FoxP2, which we already know about, and microcephalin, I don't know what those will be. I think it would be the most boring possible outcome if they told us that the lactase persistence allele wasn't there. Because there's no news there."

    Well, I gave a big belly laugh when I saw the press release. Gee, Neandertals didn't have lactase persistence. Big surprise there! What did they think, they were secretly milking goats?

    OK, I admit, that's overly snarky. I mean, what if they'd found the opposite? It would be contamination, of course. So finding the wild-type lactase allele is worth something.

    But it's sort of like if your friend was looking through a telescope on Christmas Eve and caught the first-known glimpse of Santa and his reindeer. And you asked her, "What does he look like?" And she says, "He's wearing a red coat!"

    It's like being trapped in a Laurel and Hardy routine. And I'm Hardy.

    Does the Neandertal genome show that they were "distinct from us"?

    Experts on Neandertal bone morphology can readily distinguish them from later Europeans, assuming that the correct parts of the skeleton have survived. So from that perspective, Neandertals were clearly a "distinct" population. They had a morphological configuration no longer found anywhere in the world, and not found in the Europeans who immediately followed them.

    On the other hand, the bones of early Upper Paleolithic Europeans share some interesting similarities with the Neandertals. You wouldn't call the Oase 1 cranium a Neandertal. It lacks nearly all of the features that set Neandertals apart. But it has a mandibular foramen shaped like a small horizontal oval -- like a bit over half of Neandertals, and nearly a quarter of early Upper Paleolithic mandibles. This is a very rare morphology today, and it is rare elsewhere in the human fossil record, although it has been found in the very early Homo erectus sample from Dmanisi. There are two hypotheses for why this feature and others should be most common in two populations living in the same place in adjacent time periods: descent or parallel evolution.

    Looking only at the morphology, we have only our personal limit of credulity to argue one way or the other. How many features does it take to be convinced that descent must explain some of the similarities? Sadly, the answer to this question is different for different researchers.

    I think that the most reasonable explanation for the morphology is gene flow between Neandertal and other populations. But I have to say that others disagree.

    Genetic evidence may be most useful because we are much more likely to agree on the score. A unique gene sequence is unlikely to arise twice in parallel, and in any event the probability of such parallelism can be calculated in real numbers, not shopworn guesses. With 3 billion base pairs to compare between our populations, we have a good chance of finding and quantifying even low levels of genetic exchange.

    However, these conclusions still depend on assumptions and models that not all anthropologists agree about. At the moment, the state of the science is such that the meaningful distinction is not whether Neandertals and humans may have interbred, but instead whether such interbreeding was common enough to be evolutionarily important, or to establish Neandertals as a "distinct" population. Since "important" and "distinct" do not have quantifiable meanings in evolutionary theory, you can see that we have a long way to go before paleoanthropology agrees on testable models of Neandertal population history.

    I think the science will be lively for the next few years, as the focus goes away from details of morphological characters and toward details of evolutionary models. The morphology will still remain important -- particularly as the observable evidence of variation within ancient populations. It will take many years before we have a good picture of genetic variability within these samples. But questions of "distinctness," which depend on shared characters and levels of interbreeding, must be answered at the level of models, not features.

    What about microcephalin?

    According to the press conference, the human-derived allele of MCPH1 was not found in the Vindija sequence. Bruce Lahn and colleagues had suggested that this allele might have come into the recent human population from Neandertals, based on its present pattern of variability. This allele is quite divergent from the rest of human variation at the locus, it is common outside of Africa but rare inside of Africa, and it appears to have been under positive natural selection for around 30,000 years. I have an FAQ on MCPH1 and introgression, and I've published on the topic. If the human-derived allele is not in the Neandertal genome, that obviously weakens the argument for introgression of this gene from Neandertals.

    We have interpreted this gene cautiously from the beginning. Neandertals are one likely source for such introgression, but not the only one. In my FAQ, I wrote this:

    Well, the D haplogroup [of MCPH1] is common in many areas outside of Africa in addition to Europe. So it isn't possible to really specify in what archaic population it may have originated. There is some chance that it may be found in the Neandertal genome sequence, when that becomes available. In fact, that would be the ultimate test for many candidate introgressive alleles.

    But there is a good chance that it won't be found in the Neandertal sequence. After all, Neandertals were probably pretty thin on the ground -- especially in Europe. A sampling of their genes would be sort of unlikely to yield a high proportion of archaic alleles that may have survived to the present day. So there is hope that we will find and document such alleles, but the best evidence for many of them may remain their current pattern of variation in living people.

    I think those points are important. There were not many Neandertals, and it may be much more likely for present-day humans to have genetic variation that originated in South or West Asia, or even multiple regions of Africa (a hypothesis suggested for some other gene loci).

    But I still think it very likely that out of the 20,000 genes in the human genome, some will have derived variants that were also present in the Neandertal genome. Human evolution over the last 50,000 or more years was driven by new variation, and multiple human populations would have been one of the largest potential reservoirs of adaptive variation for selection to work upon.

    What is the most important aspect of this announcement?

    Paleoanthropology is a science that generates huge public interest. But it gives very few chances for public participation. Those of us who are close to paleoanthropology know how much our science is driven by good ideas from many other fields. The pathways by which those insights enter our science tend to be highly constrained -- radiocarbon dating, scanning electron microscopy, isotopic analysis of enamel, and now genetics have all been brought into paleoanthropology by extremely skilled scientists from outside the field. I think that the Neandertal genome has the potential of breaking new ground.

    One year from now, there will be high school students working with sequences from the Neandertal genome. Who knows what they will discover?

    I just think that is tremendously exciting. For the first time, the primary data of paleoanthropology will be available to everyone.

  • Adaptive introgression of coat color in wolves

    Thu, 2009-02-05 15:22 -- John Hawks

    Mark Derr of the NY Times reports on a new study showing that black North American wolves got their melanism from dogs:

    In a bit of genetic sleuthing, a team of researchers has determined that black wolves and coyotes in North America got their distinctive color from dogs that carried a gene mutation to the New World.

    The finding presents a rare instance in which a genetic mutation from a domesticated animal has benefited wild animals by enriching their “genetic legacy,” the scientists write in Thursday’s Science Express, the online edition of the journal Science. Since black wolves are more common in forested areas than on the tundra, the researchers concluded that melanism — the pigmentation that came from the mutation — must give those animals an adaptive advantage.

    There are so many examples of this phenomenon in mammals now! This one is interesting because it would have been carried in by early dogs brought in via Beringia -- so it's another case where an intercontinental migration has brought a new adaptive allele that introgressed into a natural population.

    There is also a date:

    Comparing large sections of wolf, dog and coyote genomes, Dr. Barsh and his colleagues concluded that the mutation arose in dogs 12,779 to 121,182 years ago, with a preferred date of 46,886 years ago. Since the first domesticated dogs are estimated to date back just 15,000 to 40,000 years ago in East Asia, the researchers said that they could not determine with certainty whether the mutation arose first in wolves that predate that time, or in dogs at an early date in their domestication.

    This could have been selected in the very earliest domesticated dogs, based on that date. It would be useful to have a number of genomes from ancient wolves to screen against variation present in the wild population around the time of domestication.

    The really cool thing is that we will probably have samples like that within the next several years...

  • Dutch aurochsen 600 AD

    Sun, 2008-12-21 15:56 -- John Hawks

    Not a big story, but nice reminder that some extinct megafauna were still with us in historic times:

    Archaeological researchers at the University of Groningen have discovered that the aurochs, the predecessor of our present-day cow, lived in the Netherlands for longer than originally assumed. Remains of bones recently retrieved from a horn core found in Holwerd (Friesland, Netherlands), show that the aurochs became extinct in around AD 600 and not in the fourth century.

    Locally extinct, that is, since they survived longer to the east:

    The last aurochs died in Poland in 1627.

    So close.

  • An MAPT review

    Thu, 2008-11-06 23:26 -- John Hawks

    Elizabeth Pennisi has a News Focus piece in Science this week about the genome region labeled 17q21.31. I'm probably one of the few people who would recognize that address right away: a recently selected inversion in this region is one of the best candidates for introgression of Neandertal genes into recent Europeans.

    Eichler suspects that when H1 appeared, it somehow provided a strong fitness bonus and became much more common over time at the expense of H2. In Africans, H2 almost disappeared, except in the relatively few people who migrated to Europe 50,000 to 100,000 years ago. Then, for as-yet-unknown reasons, H2 provided its own advantage in the European population--as Stefánsson's data show--and the pendulum has begun to swing in the other direction.

    Hardy and, to a lesser extent, Stefánsson give credence to a more extreme explanation for the distribution of H2. Hardy thinks that H2 had disappeared from the modern humans moving out of Africa to populate the Northern Hemisphere but not from Neandertals, who reintroduced the inversion into the European gene pool through interbreeding with Homo sapiens 28,000 to 40,000 years ago. This view is not supported by the genetic evidence emerging from sequencing Neandertal DNA, and "I realize it's an off-the-wall idea," says Hardy. But he nonetheless thinks it's plausible.

    We covered the locus in our Trends in Genetics review earlier this year. Unless there are new data I don't know about, there is not yet any confirmation or test possible from the Neandertal genome. I'm not at all confident that there will be one, since detecting structural variants like inversions in the fragmented ancient DNA will not be trivial.

    This is very interesting also:

    The sequence comparisons also reveal that independently in humans, chimps, and orangutans, this 900,000-base region has reoriented itself into the H1 orientation, which explains why Eichler found both orientations in these primates. "This bit of DNA has been flip-flopping up and down. There must be an evolutionary reason for that, but we don't know what it is," says Hardy.

    Inversions shouldn't be trivial; on the average they should be deleterious. So finding inversion parallelism is curious. I wonder if there is some strategy variant here that might explain the flipping -- sort of like MHC alleles that can emerge in parallel?

    Pennisi spends much of the article describing the current medical research linking rare deletions in 17q21.31 with mental retardation:

    Now they have joined forces to describe 22 patients in molecular and clinical detail in a paper published online 15 July by the Journal of Medical Genetics. They calculate the prevalence of this new genomic disorder to be 1 in 16,000 newborns, and it may account for up to 0.64% of unexplained mental retardation in Europeans. "This is the first novel microdeletion syndrome identified and one of the most frequent ones," says collaborator Joris Veltman, a molecular geneticist at RUNMC.

    There are also links between the inversion polymorphism and schizophrenia and Alzheimer's, although these remain "enigmatic" because neither a biochemical nor a clear mutational explanation for the correlations has been found.

    Anyway, it's a good story and well worth reading as an example of how a single genetic region can fall subject to population genomics, medical genetics, contrasts of rare and common variants, structural variability, primate comparisons, and all the rest.

    It also shows how scientific attention can dogpile onto single genomic locations. That doesn't necessarily mean these are the best or only relevant candidates. Maybe more than anything, it means that grant reviewers recognize loci that have been interesting in prior studies, and reward further attention with more research dollars.

    UPDATE (2008-11-08): According to a reader, there's a rumor that the 17q21.31 region has been found in the Neandertal genome, and that the inversion (the putative selected version) is not present. That would weigh against the selected allele having been ubiquitous in Neandertals, although it cannot exclude that it may have been present. In any event, the practical effect would be to remove this as a likely case of introgression.

    The recurrence of inversions in this region in other hominoids remains very interesting...

    References:

    Pennisi E. 2008. 17q21.31: Not your average genomic address. Science 322:842-845. doi:10.1126/science.322.5903.842

  • Substitution rates and ancestral population sizes

    Thu, 2008-05-15 14:20 -- John Hawks

    The rate of neutral mutations varies across the genome. When studying a single gene, this variation in rates is not especially important -- it is generally possible to obtain an estimate of the neutral rate for a single locus by comparing just that locus among closely related species.

    But some comparisons involve looking at the pattern of variation among different loci. For instance, testing hypotheses about the ancestral populations leading to living species (like the common ancestor of humans and chimpanzees) involves comparing the amount of divergence among many independent loci. The variance in divergence times among loci gives an estimate of inbreeding in the ancestral population.

    I discussed this particular example two years ago this week, after the paper that proposed extended hybridization between ancestral hominids and chimpanzees. The conclusion of the paper was that the X chromosome displays much less divergence between humans and chimpanzees than the autosomes, and this might reflect a late introgression of the X chromosome into hominids from another population that (mostly) was ancestral to chimpanzees. The autosomes, by contrast, averaged very old genetic divergences, although there was substantial variance. As I concluded then, the data look consistent with a large population size in the human-chimpanzee ancestor species, coupled with greater selection on the X chromosome. The interpretation of large population size (or alternatively, the interpretation of long-term population structure) comes from the low inferred inbreeding in that ancestral population -- which caused the variance in divergence dates among loci.

    But there is another reason for a large variance in divergence dates: variance in mutation rates. Whenever mutation rates vary among loci, this variance adds to the variance among loci in their between-species genetic differences -- that is, the substitution rate. And as long as we are excluding selected sites (as we always try to do for these kinds of comparisons) we will overestimate the genetic diversity in ancestral species whenever the mutation rate varies among loci.
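
    To make the inflation concrete, here is a toy simulation. Every number in it is an illustrative assumption (split time, ancestral size, mean rate, a gamma spread of rates with a 20 percent coefficient of variation), and it ignores the sampling noise in counting substitutions. Each locus gets a divergence of 2 * mu * (T + t), where t is the coalescence time in the ancestor, exponential with mean 2N generations. A naive estimator that assumes one rate for all loci reads the spread among loci as ancestral population size.

```python
import random
import statistics

def estimate_ancestral_n(divergences, mean_mu):
    """Naive estimator that assumes a single mutation rate for all loci.

    With d = 2 * mu * (T + t) and t exponential with mean 2N, the standard
    deviation of d across loci is 2 * mu * 2N, so N = sd(d) / (4 * mu).
    """
    return statistics.stdev(divergences) / (4.0 * mean_mu)

def simulate(n_loci=20000, split_time=250_000, true_n=50_000,
             mean_mu=2e-8, rate_cv=0.2, seed=1):
    """Return (estimate with uniform rates, estimate with varying rates)."""
    rng = random.Random(seed)
    # Coalescence time within the ancestral population, in generations.
    coal = [rng.expovariate(1.0 / (2 * true_n)) for _ in range(n_loci)]
    # Per-locus rates: gamma-distributed around mean_mu with the given CV.
    shape = 1.0 / rate_cv ** 2
    rates = [rng.gammavariate(shape, mean_mu / shape) for _ in range(n_loci)]
    uniform = [2 * mean_mu * (split_time + t) for t in coal]
    varying = [2 * mu * (split_time + t) for mu, t in zip(rates, coal)]
    return (estimate_ancestral_n(uniform, mean_mu),
            estimate_ancestral_n(varying, mean_mu))

n_uniform, n_varying = simulate()
print(round(n_uniform), round(n_varying))
```

    With these made-up settings the second estimate comes out larger than the first, even though the true ancestral size is identical in both cases. Only the rate variation differs.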

    A new paper by Svitlana Tyekucheva and colleagues looks at human and macaque genomes to find patterns underlying the variance in mutation rates among regions of the genome. They find that a number of factors may cause such variations, including chemical factors like the CG content of the genome, functional causes such as male versus female rates of recombination, and large-scale structural causes such as telomeric proximity:

    While a complete understanding of all biological mechanisms leading to variation in neutral substitution rates across the genome remains elusive, it is plausible that at least some of these mechanisms are conserved over relatively long evolutionary distances. For instance, both mouse-specific and rat-specific substitution rates are positively correlated with rodent-primate substitution rates [14], suggesting shared mechanisms persisting over ca. 90 million years [15]. Additionally, a positive correlation exists in substitution rates of homologous X- and Y-chromosomal introns that diverged from each other ca. 100 million years ago [16] (Tyekucheva et al. 2008: R76).

    Their finding that male recombination is an important contributor to mutation rate heterogeneity puts the focus on the X chromosome -- which has little recombination in males -- as unusual. X versus autosomal position did not explain a large fraction of the variance in this study (only around 2 percent, controlling for other factors) but the deviation was in the right direction to help account for the low X chromosome divergence between humans and chimpanzees.

    Altogether in this study, a large fraction of the variability in human-macaque substitution rates could be explained by phenomena that affect the rate of mutations, including the structural and functional factors listed above as well as the corresponding homologous variability between mice and rats, and between dogs and cattle. If these variations were instead explained by inbreeding in the human-macaque ancestral species, they would be random with respect to the dog-cow or mouse-rat divergences, and with respect to structural causes. So current estimates of the effective sizes of human-chimpanzee and other ancestral populations are almost certainly inflated. The amount of inflation is not clear, but a good estimate will require correcting for a large number of factors -- a complicated analysis.

    Since the date of the human-chimpanzee divergence depends on our assessment of the diversity within the human-chimpanzee ancestral population, it may be a while before we can settle the issue of human-chimpanzee divergence time. That may or may not provide hope for Sahelanthropus, Orrorin, and Ardipithecus kadabba -- all supposed hominids that would predate 5 million years ago, the current best genetic estimate of the human-chimpanzee divergence time. To be sure, if the date is simply in error, that error might encompass older dates consistent with a 7-million-year divergence. But I'm not sure we should believe that the error is biased toward an older divergence -- "error" might lean in either direction, and a younger species divergence remains possible.

    References:

    Tyekucheva S, Makova KD, Karro JE, Hardison RC, Miller W, Chiaromonte F. 2008. Human-macaque comparisons illuminate variation in neutral substitution rates. Genome Biol 9:R76. doi:10.1186/gb-2008-9-4-r76

  • FOXP2 is really recent, it really did introgress (if it's not contamination)

    Fri, 2008-04-18 10:34 -- John Hawks

    That's the thrust of a technical comment by Graham Coop and colleagues, now online in Molecular Biology and Evolution. The letter refers to the extraction of FOXP2 from two Neandertal specimens from El Sidrón, by Johannes Krause and colleagues, reported last year (I wrote about the paper here).

    First, the bad news. The current letter raises the prospect of contamination. Notably, the controls applied by Krause et al. (2007) may be relatively weak evidence against contamination, because of polymorphism within large human comparative samples. The tests rely on the assumption that there is little DNA from living humans in the samples. But if we cannot distinguish Neandertal from human DNA with great accuracy, then we will be mistaken some proportion of the time. Krause et al.'s test, based on derived human alleles absent from the Neandertal genome draft, can still go wrong if the human contaminants happen to have all the ancestral (non-derived) human alleles.

    Well, that seems to be the story these days with Neandertal DNA extraction. No test of contamination is good enough. (And remember that every "test" of contamination is really a procedure for excluding the hypothesis that ancient sequences are identical to recent ones.)

    Now, the more interesting news. Coop and colleagues verify that the selective sweep affecting human FOXP2 was indeed recent -- they estimate 42,000 years ago:

    To demonstrate this, we estimated the time of the most recent common ancestor (tMRCA) of the selected haplotype (see Figure 1), using an approach sometimes called phylogenetic dating (Thomson et al. 2000; Hudson 2007). This method does not make assumptions about demography and selection, but only requires that the mutations in the intron be neutral or nearly neutral. Taking this approach, we obtained a mean tMRCA of 42 Kya (see SOM for details). While there is considerable uncertainty associated with this estimate, it is surprisingly recent if selection took place over 300 Kya (see SOM). In other words, the selective scenario proposed by the authors cannot account readily for patterns of variation in modern humans. Given that we have no power to detect a beneficial substitution that occurred over 250 Kya, (cf. Sabeti et al. 2006) yet we see a footprint of positive selection at FOXP2, the conclusion of a recent selective sweep at FOXP2 is not surprising (Coop et al. 2008:3-4).

    FOXP2 is in one of the ENCODE regions, so its variation is pretty well known. This is not a problematic case: it has a very limited amount of variation around it, and has a strong excess of rare alleles, both signs of a recent sweep.
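
    The logic of phylogenetic dating is simple enough to sketch. As I understand the Thomson-style estimator, you count the mutations separating each sampled copy of the haplotype from the inferred ancestral sequence, average them, and divide by the mutation rate for the whole stretch of sequence. The inputs below are hypothetical placeholders, not values from Coop et al., and the function name is my own.

```python
def tmrca_years(mutations_from_root, mu_per_site_per_year, sequence_length):
    """Thomson-style tMRCA estimate.

    Mean number of mutations per sampled lineage since the common ancestor,
    divided by the per-year mutation rate of the whole sequence. Assumes the
    counted mutations are neutral or nearly so.
    """
    mean_mutations = sum(mutations_from_root) / len(mutations_from_root)
    rate_per_sequence = mu_per_site_per_year * sequence_length
    return mean_mutations / rate_per_sequence

# Hypothetical inputs, NOT data from the paper: four sampled haplotypes,
# a rate of 1e-9 per site per year, and 10,000 sites surveyed.
counts = [1, 0, 2, 1]
estimate = tmrca_years(counts, 1e-9, 10_000)
print(round(estimate))
```

    Those placeholder numbers give an estimate of about 100,000 years. The point is only that few mutations on the selected haplotype translate directly into a recent common ancestor, without any model of demography.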

    Coop and colleagues suggest that the beneficial human allele spread into Neandertals (or vice versa) by low levels of gene flow coupled with its selective advantage -- in other words, introgression.

    They do allow for an alternative -- perhaps the two amino-acid-coding mutations were not the target of selection, but instead some linked locus. This would not erase the necessity of gene flow from Neandertals, but would question whether this gene flow had involved the FOXP2-language scenario, since it might be some linked gene unrelated to language.

    (CORRECTION (2008/04/18): If selection were on a linked site, then Neandertals might share the human-derived amino acids as a result of ancient shared ancestry with humans, while the linked selected sweep might be absent in Neandertals, not necessitating any gene flow.)

    I doubt this hypothesis of a linked sweep, since the two sites with human-derived substitutions are otherwise very strongly conserved among mammals. This looks like a credible target for recent selection. But the hypothesis of selection on a linked site cannot presently be tested.

    So that's the story. It seems very likely that Neandertals got the language gene from us, or us from them, long after many other genes in the two populations diverged. I write "many" rather than "most" because we haven't really been able to assess the proportion of derived alleles shared by humans and Neandertals. The completion of the draft sequence may help, but I'm afraid that the specter of contamination is going to keep on being raised whenever a part of the Neandertal draft genome looks humanlike.

    (via Dienekes)

    References:

    Coop G, Bullaughey K, Luca F, Przeworski M. 2008. The timing of selection at the human FOXP2 gene. Mol Biol Evol (in press) doi:10.1093/molbev/msn091

  • Chicken introgression

    Tue, 2008-03-04 22:31 -- John Hawks

    Bees, dogs, and cattle have all provided interesting evolutionary stories this week. Now it goes to the chickens: A study by Jonas Eriksson and colleagues finds that introgression from grey junglefowl contributed to the gene pool of domesticated chickens:

    This study contradicts the assumption that the red junglefowl is the sole wild ancestor of the domestic chicken [5] and provides the first conclusive evidence that other species have contributed to the domestic chicken genome. We therefore propose that the taxonomy of the domestic chicken should be changed from Gallus gallus domesticus to Gallus domesticus to reflect the polyphyletic origin of chicken [27]. The emerging technologies for total genome resequencing can be readily employed to determine if other parts of the chicken genome also originate from other species of junglefowls. Such regions are expected to be enriched for functionally important variants, like yellow skin, because neutral sequences should have been diluted out during the extensive back-crossing that must have taken place after introgression. It is possible that the introgression of yellow skin was facilitated by the fact that it resides on a microchromosome (only 6.4 Mb in size) with a high recombination rate, which reduces the amount of genetic material affected by linkage drag.

    The need to reduce linkage to possibly disadvantageous genes around an introgressive allele is an important thing to consider, although breaking such an allele down to a 6 Mb block wouldn't take a terribly long time. The real question is why this trait in particular was brought into chickens -- whether it was linked to desirable plumage characters, or whether it may have had other advantages in survival or productivity under domestication.
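
    A back-of-envelope sketch of "not a terribly long time": under repeated backcrossing with the introgressed allele retained each generation, the donor segment flanking it shrinks to an expected length of roughly 2/t Morgans after t generations. That approximation ignores chromosome ends and assumes a uniform recombination rate. The recombination rates used below are hypothetical placeholders, not measured chicken values.

```python
import math

def expected_tract_mb(generations, cm_per_mb):
    """Expected donor tract length (Mb) around a retained allele.

    Uses the approximation 2/t Morgans after t backcross generations,
    converted to megabases with a uniform rate in cM per Mb.
    """
    morgans = 2.0 / generations
    return morgans * 100.0 / cm_per_mb

def generations_to_reach(target_mb, cm_per_mb):
    """Smallest whole number of generations with expected tract <= target."""
    return math.ceil(200.0 / (cm_per_mb * target_mb))

# Hypothetical recombination rates; 6.4 Mb is the microchromosome size quoted above.
for rate in (1.0, 5.0):
    t = generations_to_reach(6.4, rate)
    print(rate, t, round(expected_tract_mb(t, rate), 2))
```

    At an assumed 1 cM per Mb this comes to a few dozen generations, and a higher recombination rate shortens it further, which is the sense in which a microchromosome eases linkage drag.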

    These two species are not known to hybridize in the wild.

    (via Blog Around the Clock)

    (also Greg Laden)

    References:

    Eriksson J, Larson G, Gunnarsson U, Bed'hom B, Tixier-Boichard M, Strömstedt L, Wright D, Jungerius A, Vereijken A, Randi E, Jensen P, Andersson L. 2008. Identification of the Yellow Skin gene reveals a hybrid origin of the domestic chicken. PLoS Genet 4:e1000010. doi:10.1371/journal.pgen.1000010
