john hawks weblog

paleoanthropology, genetics and evolution

introgression

  • Breeding nutritional Neanderwheat

    Fri, 2006-11-24 23:33 -- John Hawks

    On the topic of introgression, this article by Reuters' Will Dunham is a good illustration:

    A team led by University of California at Davis researcher Jorge Dubcovsky identified a gene in wild wheat that raises the grain's nutritional content. The gene became nonfunctional for unknown reasons during humankind's domestication of wheat.

    Writing in the journal Science on Thursday, the researchers said they used conventional breeding methods to bring the gene into cultivated wheat varieties, enhancing the protein, zinc and iron value in the grain. The wild plant involved is known as wild emmer wheat, an ancestor of some cultivated wheat.

    Introgression between domesticated crops and their wild relatives is one of the most common ways to introduce desirable traits into agricultural production. For the most part -- even when they are classified as different species -- domesticated crops can be crossed with wild progenitors.

    Emmer wheat is itself a tetraploid, presumed to be a hybrid of two wild diploid grasses. Tetraploidy (and other -ploidies) comes to mind when talking about plant hybrids, because it often happens that new plant species originate from such crosses. There's nothing so exotic about emmer-domesticated wheat introgression, since durum wheat is also tetraploid (bread wheat is hexaploid, but it carries both of emmer's genomes). The ability to breed in characteristics is simple Mendelism:

    "We didn't do it by genetic modification. The normal wheat crosses perfectly well with the wild wheat. So we just crossed it after normal breeding," Dubcovsky said.

    Breeders can take the selection coefficient all the way up to 100 percent if they want, and so ensure the fixation of a desirable allele like this one. Natural populations usually don't have this privilege -- and advantageous alleles are often lost, despite being favored by selection. A process carried out methodically by breeders reduces to a scattershot in nature. But it is an orderly scattershot in one way: the most strongly selected alleles have the highest chance of making it.
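
    The point about favored alleles being lost can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. This is only a sketch under textbook assumptions that the post does not state: a single new copy in a diploid population of constant size, effective size equal to census size, and an additive selective advantage s. The function name and the example values are illustrative.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for one new additive allele.

    s: selective advantage of the allele (assumed small)
    n: diploid population size (assumed equal to the effective size)
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: just the starting frequency
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n * s))

# Stronger selection raises the chance of fixation, but never guarantees it.
for s in (0.001, 0.01, 0.05, 0.1):
    p = fixation_probability(s, 10_000)
    print(f"s = {s:<5}  P(fix) = {p:.4f}  (rule of thumb 2s = {2 * s:.3f})")
```

    Under these assumptions even a 5 percent advantage leaves a new allele with roughly a 90 percent chance of being lost, while the ranking by strength of selection is preserved.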

  • Neandertal genome FAQ

    Fri, 2006-11-17 20:04 -- John Hawks

    With the release of the initial two papers describing chromosomal DNA sequences from a Neandertal, I thought I would put together some frequently asked questions and answers to them. I actually have been frequently asked most of these questions this week -- mostly by journalists -- so I think this is a good list.

    I'll be following up over the next few weeks with additional details, particularly as some of our own work moves forward. I've left some loose ends dangling here deliberately -- sometimes for the sake of brevity, in other cases because they await further developments.

    UPDATE (11/17/2006): I'm editing through this, making changes here and there to make things clearer. So as this progresses, it won't be identical to the initial version, although changes will be minor.

    There are two papers in two journals, by two different teams of people. What's the difference?

    Both teams used samples from the same specimen, Vindija (Vi) 80 -- so in principle, they are sequencing the same genome. The difference between the two comes from their methods of sequencing the DNA.

    The Rubin group (Noonan et al. 2006) is using a metagenomics method based on the creation of a clone library from the ancient DNA. To make a clone library, DNA from a sample is cut with a restriction enzyme, which cuts the DNA at every place that it displays the same short sequence (usually 4- or 6-bp sequences, such as "ATTA"). The short fragments of DNA are mixed together and bound to vectors that can be maintained and replicated in cells. This is the "cloning" process, and the "library" consists of all the short fragments, which (hopefully) overlap each other so that they can be reconstructed.

    People have made libraries for a long time. For example, the entire mRNA complement in a given tissue type may be made into a library of complementary DNA (cDNA). Once the library is made, it can be probed with short, labeled DNA sequences to assess whether a given gene is expressed in that tissue type. Or contrariwise, after cDNA from the library is sequenced, it can be used to design probes to find where in the genome it came from.

    The unique aspect of the metagenomic approach is that all DNA sequences from a sample will be included in the library, potentially sequenced, and ultimately reconstructed with computers into separate genomes. Usually cloning is preceded by an amplification step (generally using PCR), which selects and amplifies DNA of particular interest for cloning. But metagenomic methods skip this amplification -- because they cannot predict in advance what they are looking for. One of the most important early applications of metagenomics has been to reconstruct the genomes of microbes that cannot be cultured. Even though these organisms are not amenable to keeping in laboratory colonies, their genomes can be reconstructed by sampling their environments -- for example, soil or pondwater.

    Or fossils. For the Vindija 80 fossil, the extract includes only around 6 percent identifiable "primate" DNA sequences. Out of the roughly 20 percent that are identifiable at all, over half are microbial.

    I suppose if you were interested in the long-term microbial decomposition of fossil bone, you could do your dissertation on those. For the rest of us, the final step is to let the computer spit out the humanlike sequences, which are assumed to be the Neandertal DNA plus some proportion of human contamination.

    In contrast, the 454 group (Green et al. 2006) used a method called bead-based emulsion PCR. That is a mouthful, so it bears some explanation (for which I'm paraphrasing material from Margulies 2005 and Ronaghi 2001).

    The "polymerase chain reaction," or PCR, is a method of replicating many copies of a DNA sequence from a single template. Usually to do PCR, you design a "primer," which is a short sequence of DNA that causes the target sequence to be preferentially replicated by the DNA polymerase. With a number of heat cycles and sufficient primer, you end up with a whole lot of copies of just the piece of DNA that you want.

    This is, of course, exactly why standard PCR is so problematic for ancient sequences. There, you can't get exactly what you want, because it is broken into tiny bits and damaged. You would be happy to get anything. But if you amplify everything together in one giant vat, then the less damaged sequences will be the ones that amplify preferentially, and these are going to be worthless to you because they all represent contaminants of various kinds, like microbial DNA or modern human sequences.

    The 454 method attaches all the tiny bits of sequence to tiny beads and separates these beads into water droplets within an oil suspension. The water droplets are the "emulsion" part, and by separating them in this way, the process can employ PCR while keeping all the tiny sequences separate from each other. Because they are kept separate, one good sequence can't swamp out all the others in the solution. The PCR products all stick to the bead so that after they come out of the emulsion the copies of different sequences are still separate.

    After PCR, the DNA is broken down into single strands, still attached to their beads, and the beads are deposited on a fiber-optic slide assembly. The slide has tiny wells that are optically connected to a light-sensing CCD, which is essential for the "pyrosequencing" step. Nucleotides flow across the slide and into these wells one after another (T, A, C, then G). When the DNA polymerase connects one of these nucleotides to the single-strand DNA in a well, it releases a molecule of pyrophosphate (PPi).

    That's when the magic happens. The solution also contains luciferase -- the enzyme that makes fireflies glow. With some additional chemistry, the PPi gives a burst of energy to the luciferase, which then emits a spark of light. The CCD picks up the light, which is a signal that the nucleotide was incorporated into the sequence.

    Since nucleotides are added only every few seconds, a clever person with a notebook could reconstruct the sequence of the DNA fragment in each well. The real trick is that the fiber-optic slide contains well over a million wells, all being sequenced simultaneously. As the CCD picks up the series of flashes from every well, the system is tracking many megabases of DNA in every run.
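
    The notebook exercise can be sketched in a few lines. This is an idealized toy, not the 454 software: it models the strand being synthesized directly (rather than the complementary template), assumes a noise-free signal equal to the number of bases incorporated in each flow, and uses made-up function names and a made-up fragment.

```python
FLOW_ORDER = "TACG"  # nucleotide flow order given in the text

def flowgram(strand: str, n_cycles: int) -> list[tuple[str, int]]:
    """Simulate idealized light signals for the strand being synthesized.

    Each flow offers one nucleotide. The signal is the number of bases
    incorporated during that flow; 0 means no flash.
    """
    signals = []
    pos = 0
    for _ in range(n_cycles):
        for base in FLOW_ORDER:
            run = 0
            # A run of identical bases is incorporated in a single flow.
            while pos < len(strand) and strand[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals

def read_sequence(signals: list[tuple[str, int]]) -> str:
    """Turn the recorded flashes back into a base sequence."""
    return "".join(base * count for base, count in signals)

fragment = "TTAGGCAT"  # hypothetical fragment in one well
signals = flowgram(fragment, n_cycles=4)
print(signals[:4])
print(read_sequence(signals))
```

    Each well produces its own list of flashes, so the same decoding step runs independently for every bead on the slide.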

    At present, this is the fastest method of DNA sequencing on the planet. It can construct the complete genome of a microbe in a couple of hours.

    If the 454 sequencing method is so much faster, then why would anybody ever want to build clone libraries?

    The claim is that the library approach is superior as a way to probe for specific genetic loci. For instance, here's a passage from p. 1071 of the Pennisi article:

    [Rubin] envisions several libraries, each from a different Neandertal. Researchers would pull out the same fragment from each library to compare with each other and with living people. A pilot project has already demonstrated probes that ferret out specific target sequences, so the team needn't analyze the billions of bases shared by Neandertals and living humans, or among different Neandertals. "We will be able to identify and confirm sequence changes in more than one Neandertal without having to sequence several Neandertals to completion," Rubin says. "Seeing the same change in multiple Neandertals will give us confidence that we got [the sequence] right."

    This sounds similar to the study earlier this year that found Mc1r variants in different mammoths, but in fact that study used direct PCR rather than cloning (I suppose because they have a heck of a lot more mammoth tissue to work with!).

    It's not obvious to me that this is really that much of an advantage. I mean, it's certainly true that we really want to sample some genes (like MCPH1) from several different Neandertal fossils. But I don't see any point to drilling into fossils for this purpose without also sequencing their full genomes.

    Now, somebody will say, "Well, sequencing the full genome of every fossil is just too expensive. We can limit the work to just a few genes much more cheaply, and we can use the same samples later to sequence other genes, or whole genomes."

    Personally, I don't see the rush. These fossils were in the ground for 40,000 years, and they're not going anywhere. If we can sequence whole genomes cheaply in 10 or 20 years, and additionally have better means of dealing with contamination, I don't see why we just shouldn't wait. Training graduate students in metagenomics is not a good enough reason to work on these rare fossils.

    One may say that the same samples will be sufficient for later sequencing of whole genomes, or other genes, or Neandertal athlete's foot fungus, or whatever, but in my experience it somehow never works out that way. Somebody is always coming back to grind up, dissolve, or laser ablate more bone.

    In fact, if I were looking to make the next advance in metagenomics, I would take some of that mammoth flesh, mix in some elephant blood, and find ways to resolve the parts of the resulting mix. That would be something.

    Are you saying you are against destructive sampling of these fossils?

    Not at all. In fact, I think that genomics gives the most compelling reason ever for grinding up more bones.

    There is just a huge quantity of information from DNA sequences; far more than from the morphology -- especially for samples like bone fragments or isolated teeth.

    Heck, if the devil came to me and said I could have the full genome sequence of every fossil if I would agree to their destruction, I think that would be a good bargain!

    But it's pretty clear that we're not in that situation. We can have our cake and eat it too -- and the longer we wait, the cheaper and less destructive this is likely to be. And frankly, just one Neandertal genome is going to give us plenty to work on for a long time.

    But then, I was trained as a fossil guy, and I'm used to working with a few bits and pieces. It gives me a natural advantage!

    They say there's no significant evidence of interbreeding. Yet you told us last week that there is significant evidence of interbreeding. What gives?

    A few years ago I gave a talk where I laid out what I saw as the problems interpreting nuclear DNA sequences from Neandertals. Now, this was long before we had any reasonable prospect of getting such sequences, so it was purely based on knowledge about human genetic variation. As I saw it then, there were two problems:

    1. Human mtDNA is really variable, with greater than 1 percent sequence divergence between people, and much higher in some places. In contrast, human nuclear DNA has less than one base pair in a thousand different between copies. To get a reasonable picture of variation among people, you need long nuclear sequences so that you will find polymorphisms. But ancient DNA is broken into short little sequences that are very difficult to reconstruct. With mtDNA, this is less of a problem because it is clonal and a person basically has one sequence in many copies. But most nuclear DNA (all autosomal DNA) exists in two, possibly different copies. So reconstructing long enough sequences to study polymorphisms is very difficult.
    2. The coalescence age of human mtDNA is only a couple hundred thousand years, so sampling ancient humans is sort of likely to result in sequences that lie outside this range of variation -- and with Neandertals, that is precisely what happened. But nuclear loci have coalescence ages on the order of 600,000 to 2 million years or older. With these dates, the diversity among living people must significantly predate any divergence of archaic humans for most nuclear genetic loci. This means that Neandertals ought to have shared a high proportion of polymorphisms that are still variable in humans. Since we can expect that Neandertals will not be very genetically divergent for these nuclear genes, compared to the genetic differences among living people, we can conclude that no gene is likely to tell us very much about the phylogenetic relationships of an ancient Neandertal with living people.
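
    The gap between the two coalescence ages in point 2 follows from a standard neutral expectation, sketched here under assumptions the post does not spell out: a constant-size, randomly mating population with an equal sex ratio, an effective size of 10,000 (the long-term human figure cited later in this post), and a 25-year generation time, which is my own assumption. The function names are illustrative.

```python
GENERATION_YEARS = 25    # assumed generation length, not from the post
EFFECTIVE_SIZE = 10_000  # long-term human effective size cited later in the post

def autosomal_tmrca_years(ne: int, gen: float = GENERATION_YEARS) -> float:
    # A large sample coalesces in about 2 * (number of gene copies) generations.
    # Autosomes: 2 * ne copies, so about 4 * ne generations.
    return 4 * ne * gen

def mtdna_tmrca_years(ne: int, gen: float = GENERATION_YEARS) -> float:
    # mtDNA: one copy per female, so about ne / 2 copies and ne generations.
    return ne * gen

print(f"mtDNA:     {mtdna_tmrca_years(EFFECTIVE_SIZE):,.0f} years")
print(f"autosomal: {autosomal_tmrca_years(EFFECTIVE_SIZE):,.0f} years")
```

    With these inputs the mtDNA expectation is a few hundred thousand years and the autosomal expectation about a million, the same orders of magnitude as the ranges given above. Individual loci scatter widely around these averages.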

    These two problems are still stumbling blocks for interpreting Neandertal sequences. But the research teams found a very clever way to circumvent them, by using genomics approaches instead of genetic approaches.

    If you've been scratching your head wondering exactly why "genomics" has a buzz, then this is a good example.

    Because of projects like the HapMap and the chimpanzee genome project, we know a lot (not everything, but a lot) about human genetic polymorphisms and our genetic differences from chimpanzees. In fact, we have databases of human single nucleotide polymorphisms (SNPs), and human-chimpanzee comparisons. For each SNP, some humans have an ancestral nucleotide -- generally the one that chimpanzees have. Other humans have a derived nucleotide -- the one that appeared in some ancient human, and different from chimpanzees.

    For the most part, derived SNP alleles are recent. A few of them are very old, and these tend to be found at high frequencies (because the person who originated them had lots of descendants in that time). But many more of them are recent, found in a relatively small number of people today, who descend from a common ancestor during the past couple hundred thousand years.

    If Neandertals diverged from humans over 200,000 years ago, and they didn't interbreed after that time, then the Neandertal genome should have relatively few derived human SNPs. In contrast, if the two populations continued to interbreed after 200,000 years ago, they might share fairly many of these derived SNPs.

    Hence, we have a potential test for Neandertal-human genetic interactions.
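
    In outline, the test amounts to counting. The sketch below is a toy version, not either team's pipeline: the data layout, the function name, and the four example sites are invented for illustration. The chimpanzee allele stands in for the ancestral state, and sites where the Neandertal base matches neither human allele are skipped.

```python
def derived_fraction(sites) -> float:
    """Share of informative human SNPs where the Neandertal is derived.

    sites: iterable of (chimp_allele, human_alleles, neandertal_allele)
    """
    derived = informative = 0
    for chimp, human_alleles, neandertal in sites:
        if chimp not in human_alleles:
            continue  # ancestral state unclear, skip
        if neandertal not in human_alleles:
            continue  # Neandertal-specific change or damaged base, skip
        informative += 1
        if neandertal != chimp:
            derived += 1
    return derived / informative if informative else float("nan")

# Made-up toy data: (chimp, alleles segregating in humans, Neandertal)
toy_sites = [
    ("A", {"A", "G"}, "G"),  # Neandertal carries the derived allele
    ("C", {"C", "T"}, "C"),  # ancestral
    ("G", {"G", "A"}, "G"),  # ancestral
    ("T", {"T", "C"}, "A"),  # matches neither human allele: skipped
]
print(f"{derived_fraction(toy_sites):.2f}")
```

    A low fraction points toward an old split with little later contact. A high fraction is harder to explain without gene flow, subject to how the SNPs were ascertained.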

    Noonan et al. (2006) looked for these derived SNPs and found very few of them. They concluded that there was no significant evidence of Neandertal-human interbreeding, although their statistical test couldn't rule out as much as 25 percent admixture (for reference, Plagnol and Wall 2006 estimated only 5 percent ancestry from all archaic humans, not only Neandertals).

    Green et al. (2006) also looked for derived SNPs. They had a much bigger sample of DNA to work with, so they ought to have a stronger test. Here's what they wrote (p. 334):

    Using the SNPs that overlap with our data from two large genome-wide data sets (HapMap, 786 SNPs and Perlegen, 318 SNPs), we find that the Neanderthal sample has the derived allele in 30% of all SNPs. This number is presumably an overestimate since the SNPs analysed were ascertained to be of high frequency in present-day humans and hence are more likely to be old. Nevertheless, this high level of derived alleles in the Neanderthal is incompatible with the simple population split model estimated in the previous section, given split times inferred from the fossil record. This may suggest gene flow between modern humans and Neanderthals. Given that the Neanderthal X chromosome shows a higher level of divergence than the autosomes (R.E.G., unpublished observation), gene flow may have occurred predominantly from modern human males into Neanderthals. More extensive sequencing of the Neanderthal genome is necessary to address this possibility.

    If this observation holds (i.e., if it is not influenced by contamination, and the ascertainment function does indeed show this to be an excess of derived SNPs), then it is one of the strongest pieces of evidence for genetic intermixture of Neandertals and modern humans. Note that there are two avenues for this gene flow -- either from the ancient ancestors of modern humans into Neandertals, or out of Neandertals into early modern humans. I'm sure we will hear more about this when they have more sequence.

    In the meantime, the other source of evidence about Neandertal-human genetic interaction is the genomic variation of living people. Last week's paper on MCPH1 (discussed here) is a good example of what that evidence looks like. The key feature is that if you troll through the genome, you begin to notice some loci with interesting genealogies. The interestingness is a combined signature of recent selection and ancient population structure.

    Looking for genes like MCPH1 in the Neandertal genome is a no-brainer. We probably won't find a lot of them, because the Neandertals were a small subset of the ancient human population.

    There is one further problem. We can recognize these interesting loci in living people because they lie on relatively long haplotypes with little recombination. The inference is that such an allele must have begun from a very low copy number around 30,000 years ago, presumably because it was introduced from some archaic population. But the SNPs that are presently linked to the selected site were probably polymorphic within the archaic population, not fixed on a long haplotype. Unless we know exactly which SNP is the selected site on a human allelic variant, we may have some trouble telling whether an archaic genome has the allele. And as I note below, a large proportion of SNPs are going to be missing from the draft Neandertal genome even when it reaches an average 1x coverage.

    This just means that evidence from the genomics of living people and from the Neandertal genome won't mesh together seamlessly. There remains some complexity interpreting these relationships.

    The divergence date of Neandertal and human sequences is estimated at around 520,000 years ago. What does that mean?

    First, what it doesn't mean. It doesn't mean that the human and Neandertal populations diverged 520,000 years ago. As I explain below, the estimate of the genetic divergence time comes from the proportion of chimpanzee-human differences for which the Neandertal shares the human allele. But of course, some living humans have the ancestral, chimpanzee-like allele for many polymorphisms, so this comparison of polymorphisms is not saying that Neandertals were like chimps. Instead, we are just disregarding the Neandertal-specific evolutionary events.

    I'm sticking with the 520,000 year genetic divergence estimate from Green et al. (2006), instead of the older estimate from Noonan et al. (2006), because of the vastly larger sample in the Green paper. Still, most of the discussion does not hang too critically on the precise date, although the date changes the interpretation by degrees.

    The really interesting observation is the Neandertal-human genome draft difference compared to the human-human difference. Here's a passage from p. 354 of Green et al. (2006):

    We analysed the DNA sequences generated from a contemporary human using the same sequencing protocol as was used for the Neanderthal. Although ancient DNA is degraded and damaged, this comparison controls for many of the aspects of the analysis including sequencing and alignment methodology. In this case, 7.1% of the divergence along the human lineage is assigned to the time subsequent to the divergence of the two human sequences. The average divergence time between alleles within humans is thus 459,000 years with a 95% confidence interval between 419,000 and 498,000 years. As expected, this estimate of the average human diversity is less than the divergence seen between the human and the Neanderthal sequences, but constitutes a large fraction of it because much of the human sequence diversity is expected to predate the human-Neanderthal split. Neanderthal genetic differences to humans must therefore be interpreted within the context of human diversity.

    They don't specify where this "contemporary human" was from. The draft human genome is a chimera made up of anonymous people from different populations. That means that wherever the "contemporary human" is from, it will be the same region as represented by some part of the draft genome, but not all. So the divergence between these two mystery sequences is likely to be greater than average within a single population, and less than average between different populations.

    Keeping that in mind, the human-Neandertal difference is startlingly close to this human-human difference measurement. The Neandertal is only 10 percent more different from the draft human genome than these two human sequences are from each other.

    It seems very likely that we will find pairs of living human populations where the average genetic divergence is older -- maybe much older -- than this human-Neandertal divergence. For instance, it seems almost certain that the great genetic variability among living African groups will exceed this human-Neandertal difference.

    Some geneticists have noted that European and Asian populations seem to be a genetic "subset" of African populations, at least for many genetic loci. With these kinds of numbers, it looks like Neandertals may be a subset of living human diversity in the same sense. I've never much liked that formulation, because "subset" is never really an accurate description of the genetic relationships. But if the seat of living human diversity is Africa, adding Neandertals to the mix may not change that pattern at all.

    As Green and colleagues note, most of the genetic divergence between humans and Neandertals, and between humans and other living humans, is actually much older than the divergence of these populations from each other.

    At one limit (that is, assuming complete isolation of humans and Neandertals after some date), the population divergence time depends on the effective size of the population that was ancestral to living humans and Neandertals. It is basically not possible to obtain a good estimate of this ancestral effective population size from the current Neandertal data -- mainly because good estimates depend on heterogeneity in divergence times among loci, which we can't infer for the short Neandertal sequences.

    Both papers assume that this ancestral effective population size was small -- even smaller than the long-term human effective population size of around 10,000 individuals. A smaller effective size for the human-Neandertal ancestral population is fairly unlikely, though, since it must have been distributed across large parts of Europe and Africa at a minimum. More likely, the effective size was close to 10,000, just as in humans, since the human effective size is inferred to have been that small over at least the past million years.

    If you're reading the term "effective population size" for the first time, don't worry. It doesn't mean "population size", and it has mainly a technical genetic meaning. It is sort of important that the Neandertal sequence supports this particular effective size over the long term, but it will take another post to explain why.

    As noted above, the populations may never have been isolated. The derived SNP evidence might suggest that there was never any population divergence, or at least no long period of complete isolation, between humans and Neandertals. We'll have to wait and see.

    Why does this bone have such a low level of contamination compared to other Neandertals?

    I should start by pointing out that "contamination" here means "modern human sequence". All fossil bones are loaded with exogenous DNA, like bacterial and fungal genomes that invaded after the animal died. From a certain point of view, those exogenous genes are contaminants -- we are generally not interested in their sequences, and sorting them out from the endogenous Neandertal DNA is a real nuisance. But because we have a reference genome from humans to compare with the sequences from the ancient bone, we can sort out these bacterial and other exogenous sequences. So although they do "contaminate" the bone, they don't distort our picture of the sequence.

    The real problem is that there are contaminating sequences from recent humans in the ancient bones. These sequences come from excavators, anthropologists who studied the bones, museum personnel, graduate students who cleaned and prepared the bones for sequencing, other samples from the labs doing the work, and who knows where else.

    I have been asked many times why they can't eliminate this contamination. For example, why can't they just clean the bone, or take samples from deep inside the bone, or take samples from deep inside of teeth, or use a clean room, yada yada yada.

    The answer is that they do wash the bones, and they do eliminate the outer surface, and they do take samples from deep inside of bones, and they do work in a clean room, with ultraviolet lights and positive air pressure so that DNA can't get sucked into the room, and rubber gloves and bunny suits, and the whole nine yards. And the bones are still contaminated, deep inside them.

    Now, you may imagine anthropologists picking their noses with the bones, and using them as chopsticks, and putting them up to their ears to hear them breathing, and all manner of other things. The truth is, I have no idea how the contamination gets in there, and neither does anybody else. It's just there, and apparently we can't avoid it.

    The extraction team looked at lots of Neandertal specimens, with one question in mind: How much human contamination does this bone have? To answer this question, they amplified mtDNA sequences, and assessed what proportion of the amplified sequences were Neandertal-like and what proportion were human-like. Vindija 80 stood out as having a very low proportion of human-like sequences -- less than 2 percent. So they inferred that there was little contamination of the sample by recent human DNA, and are working under the assumption that the nuclear genome is contaminated in a similar low proportion.

    As for why this particular bone has such low contamination, well, nobody really knows that either. Svante Pääbo speculates that it is because Vi 80 was originally identified as fauna and hasn't been handled much. He might well be right. Which would bring us back to the nose-picking chopstick bone theory, I suppose.

    If Vindija 80 was put in a box with fauna, it can't be very diagnostic. This high preservation seems very unusual. How do they know it was a Neandertal?

    The radiocarbon date is 38,310 +/- 2130, and they found very high preservation of a Neandertal-like mtDNA sequence. If you think that fails to answer the question, well...

    How can they deal with the damage to ancient DNA sequences?

    One of the things that has become clear about ancient DNA research is that DNA from ancient fossils undergoes various kinds of damage. The most obvious is the fragmentation of the DNA into very small pieces, a problem that both the sequencing approaches have been designed to circumvent.

    But a more serious problem is that some bases become degraded over time in ways that cause the sequencing methods to misidentify them. For example, cytosine (the "C" base) can be chemically modified over time into a base called uracil, which sequencing methods misidentify as a thymine (the "T" base).

    There seems to be no way to tell which base pair changes are diagenetic (i.e. DNA damage-induced) and which are genuine Neandertal changes.

    So, the teams took a radical approach: just ignore all the changes that are possibly damage. Instead of analyzing Neandertal-specific changes, they decided to assess the status of human polymorphisms and human-chimpanzee differences in the Neandertal sequence. This method is how they estimated the Neandertal-human genetic divergence time, for example -- because the Neandertals have approximately 96 percent similarity with humans for human-chimpanzee genetic differences, it is possible to infer that their genes diverged from the average human gene only 4 percent of the evolutionary time separating humans and chimpanzees. The research teams assumed that humans and chimpanzees are separated by 13 million years of evolution -- this includes the time on both the human and chimpanzee lineages since their common ancestor, assumed to be 6.5 million years ago. These dates and genetic differences produce an estimate of around 520,000 years ago for human-Neandertal genetic divergences.
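
    The arithmetic behind that estimate is short enough to write out. The sketch uses only the rounded figures given in this post (96 percent sharing, a 6.5 million year human-chimpanzee ancestor, and the 459,000 year human-human figure quoted earlier from Green et al.). The variable and function names are mine.

```python
HUMAN_CHIMP_ANCESTOR_YEARS = 6_500_000  # date assumed by the research teams
TOTAL_SEPARATION_YEARS = 2 * HUMAN_CHIMP_ANCESTOR_YEARS  # both lineages: 13 million

def divergence_years(shared_fraction: float,
                     total_years: float = TOTAL_SEPARATION_YEARS) -> float:
    """Divergence time implied by the share of human-chimp differences
    at which the Neandertal carries the human state."""
    return (1 - shared_fraction) * total_years

neandertal_human = divergence_years(0.96)
human_human = 459_000  # quoted from Green et al. (2006) earlier in the post

print(round(neandertal_human))                   # about 520,000 years
print(round(neandertal_human / human_human, 2))  # ratio of the two divergences
```

    With these rounded inputs the Neandertal-human figure comes out a little over a tenth larger than the human-human one, which is the comparison made earlier in this post.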

    In the long run, it should be possible to sequence the genome with multiple coverage, which would allow damage to be resolved. With many copies, the damage to any individual DNA sequence will be unique, while changes that are evident in multiple copies are probably real.
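
    Here is a minimal sketch of how multiple coverage resolves damage, assuming the reads have already been aligned to equal length, with "-" marking positions a read does not cover. The majority-vote rule, the `min_depth` parameter, and the example reads are illustrative choices, not a description of either team's method.

```python
from collections import Counter

def consensus(aligned_reads: list[str], min_depth: int = 2) -> str:
    """Majority base per column; 'N' where depth is too low or the vote ties."""
    calls = []
    for i in range(len(aligned_reads[0])):
        column = [read[i] for read in aligned_reads if read[i] != "-"]
        if len(column) < min_depth:
            calls.append("N")
            continue
        (top, top_n), *rest = Counter(column).most_common()
        tied = bool(rest) and rest[0][1] == top_n
        calls.append("N" if tied else top)
    return "".join(calls)

reads = [
    "ACGTTCGA",
    "ACGTTTGA",  # C->T in one read only: looks like damage
    "ACATTCGA",  # G->A in one read only: looks like damage
    "ACGTTCGA",
]
print(consensus(reads))
```

    A change that appears in only one read is outvoted, while a genuine Neandertal difference would show up in every read covering that position.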

    But we are quite a ways from the long run, so for the time being we have to deal with DNA damage. For individual genes, it may be possible to reason exactly what effects changes would have and thereby arrive at a conclusion about which changes are diagenetic. For instance, only a minority of such changes will affect coding regions, and some of those will be synonymous changes, so only a small proportion will make amino acid changes, and if there are only a couple of these per gene the resulting protein structure may be able to be analyzed. So from a functional perspective, it should be possible to work with damaged sequence.

    The main problem is from the statistical perspective (i.e., assuming neutrality), and here I think the teams have taken a very reasonable approach by just throwing the changes out.

    Will they really be able to sequence the full Neandertal genome in two years?

    I got a lot of questions from journalists on this point. I really see no reason to doubt it -- they know their average sequence yield from a given amount of extract, and the proportion of that yield that is actually Neandertal DNA.

    The main caveat is a statistical one: 3 billion base pairs of sequence is -- on average -- one full coverage of the genome, but in practice some loci will be sequenced many times, while a fairly large proportion (a bit over 30 percent) won't be sequenced at all.

    A billion missing bases may not seem like a big deal, but there is a catch: the short average fragment size means that the missing patches will be distributed throughout every gene. Since the average gene covers a region of a few kilobases, complete gene sequences will be pretty rare -- most will have gaps in them amounting to around 30 percent of their length.

    Or to put it another way, a bit more than 30 percent of informative SNPs in humans will not be represented in the first Neandertal genome draft.
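
    Where does the "bit over 30 percent" come from? Here is a rough sketch, assuming reads land uniformly at random so that the number of reads covering any site is Poisson-distributed. That is an idealization of mine, not a claim about the actual sequencing runs.

```python
import math

def fraction_uncovered(mean_coverage):
    """Poisson probability that a given site is covered by zero reads."""
    return math.exp(-mean_coverage)

genome_size = 3e9  # base pairs, as in the text

for depth in (1, 10):
    missing = fraction_uncovered(depth)
    print(f"{depth}x coverage: {missing:.4%} of sites unsequenced "
          f"(~{missing * genome_size:,.0f} bases)")
```

    Under this model, 1x coverage leaves about 37 percent of sites untouched, or roughly a billion bases, while 10x coverage leaves only a tiny fraction missing.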

    A second issue is that the genome of Vindija 80 is not haploid -- there are two copies of most everything in that bone. Some of these copies were polymorphisms in Neandertals, and if these are reconstructed into a single sequence, there will be mixed-up haplotypes. This means that it will be difficult, if not impossible, to assess whether there were functional multi-SNP differences between the human and Neandertal sequences of particular genes.

    Anyway, that's probably getting ahead of ourselves. No doubt somebody will think of some way around these problems; and it will eventually become cheap enough to do 10x coverage instead of 1x coverage.

    They're already making plans to clone Neandertal super-soldiers, aren't they?

    Maybe unsurprisingly, this question about Neandertal cloning is the one most journalists so far have wanted to ask me. I'm sure they're asking everybody, hoping that somebody will slip them a really pithy quote.

    Since I have clones here at home, I can't bring myself to get too worked up about it. A Neandertal clone army would definitely be an improvement over a Neandertal Jar-Jar.

    Personally, I have another problematic scenario in mind, which I am developing elsewhere.

    References:

    Green RE, Krause J, Ptak SE, Briggs AW, Ronan MT, Simons JF, Du L, Egholm M, Rothberg JM, Paunovic M, Pääbo S. 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444:330-336. DOI link

    Lambert DM, Millar CD. 2006. Evolutionary biology: Ancient genomics is born. Nature 444:275-276. DOI link

    Margulies M and 55 others. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-380. DOI link

    Noonan JP, Coop G, Kudaravalli S, Smith D, Krause J, Alessi J, Chen F, Platt D, Pääbo S, Pritchard JK, Rubin EM. 2006. Sequencing and analysis of Neanderthal genomic DNA. Science 314:1113-1118. DOI link

    Pennisi E. 2006. The dawn of stone age genomics. Science 314:1068-1071.

    Römpler H and 8 others. 2006. Nuclear gene indicates coat-color polymorphism in mammoths. DOI link

    Ronaghi M. 2001. Pyrosequencing sheds light on DNA sequencing. Genome Res 11:3-11. Abstract

    Schloss PD, Handelsman J. 2003. Biotechnological prospects from metagenomics. Current Opinion in Biotechnology 14:303-310.

  • Introgression, Neandertals, and species concepts

    Sun, 2006-11-12 22:37 -- John Hawks

    A key issue (at least for some paleo folks) is whether the term "introgression" gives aid and comfort to the idea that Neandertals were a distinct species from us. To the extent that we rely on hybrid zones to account for the interaction, it sure looks like we are talking about the interaction of different species. If we are really talking about subspecific interactions, then we shouldn't really be using the term "hybrid".

    Even Wikipedia describes introgression as the movement of a gene "from one species into the gene pool of another" by backcrossing.

    Now, what do we know about whether Neandertals and modern humans were different species?

    1. Speciation in primates, from commencement of prezygotic isolation to full postzygotic isolation, has taken between 1 and 4 million years to occur, considering pairs of living primate sister taxa (Curnoe et al. 2006).
    2. Mitochondrial DNA suggests that modern humans and Neandertals derived from a single ancestral population at most 250,000 - 500,000 years ago (the population divergence time consistent with a 350,000 - 700,000 year genetic divergence).
    3. Craniometrics suggest that Neandertals and modern humans were more different than many primate subspecies pairs (Harvati et al. 2004).
    4. Nonmetrics suggest that archaic Homo populations were no more genetically differentiated than human races (Hawks and Wolpoff 2001).
    5. Early Upper Paleolithic Europeans had a relatively high proportion of traits otherwise common in Neandertals.

    I could go on with a few more, but you get the point: Despite their morphological idiosyncrasy, genes and comparisons with other primates reject the hypothesis that modern humans and Neandertals were reproductively isolated. In that context, the morphological differences among archaic humans are (presumably) largely adaptive, and the reason that modern humans don't look like archaic humans is a matter of their different adaptations.

    But if we aren't talking about different species of Homo, at least not in the sense of complete reproductive isolation, then why are we talking about introgression?

    The thing is, introgression and species boundaries have emerged as different problems in the literature on genetics and biogeography.

    For example, here's a passage from Dowling and Secor's (1997) review of introgression in animals:

    Hybridization is defined as "the interbreeding of individuals from two populations, or groups of populations, which are distinguishable on the basis of one or more heritable characters" (Harrison et al. 1993, p. 5), and introgression is "the permanent incorporation of genes from one set of differentiated populations into another, i.e., the incorporation of alien genes into a new, reproductively integrated population system" (Rieseberg and Wendel 1993, p. 71) (Dowling and Secor 1997:595).

    It is worth noting that this definition involves populations that could be defined as phylogenetic species -- populations differentiated by at least one morphological character. Of course, phylogenetic species are not evolutionary or biological species, but concerning the definition of fossil taxa like Neandertals, this is precisely the point at issue!

    Another passage from Rhymer and Simberloff (1996:84) approaches the question from the standpoint of conservation genetics:

    We define "hybridization" as interbreeding of individuals from what are believed to be genetically distinct populations, regardless of the taxonomic status of such populations. "Hybridization" most commonly refers to mating by heterospecific individuals but has been applied to mating by individuals of different subspecies and even of populations that, though not taxonomically distinguished, differ genetically. Arnold et al. (1991) suggest restricting "hybrid" to matings between species and using "intergrade" for matings between subspecies and "cross" or "interbreed" for matings between individuals of geographically distinct populations. Although such distinctions might clarify future discussions, all these terms seem so widely used in the literature for matings at every taxonomic level that they are unlikely to be restricted. Instead one must depend on accurate taxonomic description of the entities between which mating occurs.

    Introgression is gene flow between populations whose individuals hybridize, achieved when hybrids backcross to one or both parental populations. Beyond F1 hybrids, the point at which an individual is no longer viewed as a hybrid but rather as a member of one of the parental populations that has undergone introgression is arbitrary. A hybrid swarm is a population of individuals in which introgression has occurred to various degrees by varying numbers of generations of backcrossing to one or both parental taxa, in addition to mating among the hybrid individuals themselves. Hybridization need not be accompanied by introgression; for example, offspring of hybrid matings might all be sterile. Introgression can be unidirectional, with backcrossing to one parental population only (Rhymer and Simberloff 1996:84, citations omitted).

    From these passages, it becomes clear why "introgression" is used so broadly: Biologists still don't agree on what constitutes a species! This should be no surprise -- the species problem is one of the fundamental issues in biology. But it is useful to remember that fossil species are not an exceptional case.

    The problem is not with defining "hybrid" or "introgression." The problem is with defining species.

    The different definitions of the term "hybrid" evident in those passages also carry a lot of baggage. For the conservation geneticist, "hybridization" may mean something more or less undesirable -- something that ought to be avoided. From the point of view of defining species, "hybridization" ought to be unusual -- out of the ordinary. From the point of view of evolutionary genetics, "hybridization" may just mean reticulation -- a process making it possible for genes to move between populations that are more or less isolated. It is not just very common to talk about trans-subspecies matings as "hybridization" -- it is ubiquitous.

    And for that matter, the classical genetics definition of "hybrid" has nothing whatever to do with species. Remember hybrid corn? Mendel's peas? Hybridization is about crossing lines maintained by selection. And lest we forget the etymology of "hybrid", the original Latin hybrida was the offspring of a tame sow and a wild boar. In other words, all this disagreement about the relevant taxonomic level for "hybridization" is highly subject-specific, and emerges from the conservation literature rather than from genetic principles.

    I would make two observations. First, the threshold for "introgression" is arbitrary. For example, Ellstrand et al. (1999) define "introgression" as the gene flow between taxa (implying species), but discuss it mainly in connection with introgression from domesticated to wild plants, where the "species" distinction is based on the history of domestication. In the conservation literature, "introgression" concerns the detection of "alien genes", largely from invasive or cosmopolitan species (e.g., mallard genes entering American black duck populations). In the last several years of journals like Molecular Ecology there have been one or two papers per issue dealing with introgression between natural populations of animals -- mainly documenting the apparent movement of alleles between classical subspecies and morphospecies.

    References to introgression are accelerating in part because of the prominent role of mitochondrial systematics in the 1990s -- people are discovering that mtDNA phylogenies don't tell the whole story of gene flow between wild populations. This is no surprise at all from an evolutionary perspective, but it has pretty clear application to the systematics of Homo, where much (so far) has ridden on the proposition that mtDNA is an accurate guide to population histories.

    My second observation is that the movement of adaptive alleles from one population to another is especially likely to take the form of introgression. Genes under selection don't respond to population boundaries in the same way as neutral genes. The way that most people have framed the issue of the archaic-modern transition is in terms of neutral genes and population movements. But this is a poor model for the behavior of adaptive genes. This means that most people's notion of ancient population dynamics is different from the expectations of population genetics. Like the problem defining "hybrids", the mismatch of models and theory is deeply rooted in the species problem: If you think Neandertals were a different "species" from moderns, then you probably think it must follow that there was no "important" genetic interaction between the two populations.

    Genetics over the past couple of decades has shown that species "boundaries" are permeable, that postzygotic isolation in mammals takes millions of years, that the flow of adaptive alleles across species boundaries in mammals is ubiquitous, and that reticulate evolution between mammalian genera is far from rare.

    We could just conclude (as some of my readers have) that biology just got the species problem "wrong", and that we should be talking about subspecies instead of species. Maybe we should limit species to "really, really" isolated populations, or populations that "diverged at least 4.5 million years ago", or some other metric. There may be a lot of truth in that, but if wolves and coyotes are subspecies, cattle and bison are subspecies, and all baboons are subspecies, then I think we have to abandon the idea that species are a meaningful unit of adaptation! More to the point, most biologists use subspecies to mean "allopatric", or at least "peripatric" populations, yet hybridization and introgression commonly occur among sympatric (yet partially isolated) populations.

    (UPDATE: A reader let me know that it sounds like I am actually proposing that wolves and coyotes are subspecies here. Quite the opposite -- wolves and coyotes are good species for reasons of their clear adaptive differences in sympatry. My -- possibly botched -- point is that the problem is not that the species concept is wrongly applied here; the problem is that the correct application of the species concept still gives us species that interbreed a lot! If you try to fix the problem by applying a different species concept, then we end up with a lot of very strange looking "subspecies".)

    I take a different tack. There will never be any tidy solution to the species problem, because all species have unique evolutionary histories and constraints. Given these difficulties, the species status of archaic Homo populations is basically an intractable problem. That is, I am happy to suggest that archaic Homo populations correspond to classical subspecies, and as far as I know, no evidence strongly contradicts that position. But I can recognize that some people will never agree with this assignment. And from the perspective of their evolution, it just doesn't matter. Evolutionarily important gene flow occurs between mammal species, subspecies, and populations.

    As you can probably tell, I have become greatly disgusted by the species problem. My reasons for this extend beyond the present discussion, but in any event I think it is a hopeless task to build any kind of consensus about the nature of fossil species.

    So we have to begin by identifying patterns of interaction and gene flow. Introgressive gene flow is then a category of gene flow between differentiated populations. In particular, introgression is extensive (as opposed to merely local) and permanent (as opposed to ephemeral). Because of this, the pattern of introgression is fairly likely to involve adaptive alleles, but it need not do so. However, a widespread signature of interbreeding in neutral (or even deleterious) alleles is very likely to reflect a higher level of gene flow than would usually be indicated by "introgression". Is this a distinction without a difference? I think it's a pattern, and one that has now been replicated by several genes. It remains to be seen if it is the dominant pattern, or whether a broader pattern of genetic similarities will emerge -- but keep in mind that I think another pattern is also at play that will help to explain much.

    Finding evidence for introgression in genes like MCPH1 is basically the operational procedure by which people are now looking for introgression in natural populations -- with one exception: for extant populations, we can test the genes of both populations directly. For extinct archaic populations, we can have evidence of introgression only by inference, which means that we will likely miss many true instances of gene flow from archaic humans. This does raise the risk of valuing "introgression" more substantially than it may "deserve" -- in particular, that adaptive alleles like MCPH1 will get a lot more attention than other genes that may have more ambiguity.

    But I think that evidence of introgression reinforces the hypothesis that modern humans emerged in an adaptive context, making use of adaptive variation from a widespread (possibly pan-Old-World) population of archaic Homo. It's one of the two main patterns in the evolution of modern humans.

    References:

    Harrison RG. 1993. Hybrids and hybrid zones: historical perspective. In: Hybrid zones and the evolutionary process, ed. Harrison RG. pp. 3-12. Oxford University Press, Oxford UK.

    Rieseberg LH, Wendel JF. 1993. Introgression and its consequences in plants. In: Hybrid zones and the evolutionary process, ed. Harrison RG. pp. 70-109. Oxford University Press, Oxford UK.

    Dowling TE, Secor CL. 1997. The role of hybridization and introgression in the diversification of animals. Ann Rev Ecol Systemat 28:593-619.

    Ellstrand NC, Prentice HC, Hancock JF. 1999. Gene flow and introgression from domesticated plants into their wild relatives. Ann Rev Ecol Systemat 30:539-563.

    Rhymer JM, Simberloff D. 1996. Extinction by hybridization and introgression. Ann Rev Ecol Systemat 27:83-109.

    Synopsis: 
    I don't view Neandertals as a distinct species, yet still think "introgression" is a useful way to refer to gene flow from them into recent humans.
  • Why introgression?

    Fri, 2006-11-10 12:22 -- John Hawks

    I heard from a long-time correspondent this morning concerning introgression of microcephalin from archaic humans. I'm not sharing the whole message, but I thought it would be worth paraphrasing a key point for some thought.

    The basic point is this: Why are we talking about "introgression"? Why isn't this just gene flow?

    Let me start by saying this: "Introgression" is a useful term because it conveys a genetic reality, regardless of the taxonomic rank we are talking about. The literal meaning is "moving into", and what we are talking about is an allele moving into a new population. But more than that (and what distinguishes the term from gene flow) we are talking about an allele moving onto a new genetic background. The "genetic background" implies that there might be constraints on the movement of such an allele coming from epistasis or negative effects of linked alleles.

    I think it is especially useful in the case of MCPH1 because we are interested in the clear positive selection of this allele as a contrast to the clear decline in frequency of most archaic morphologies. The differential fates of different genes seem like a good example of some genes introgressing into a new genetic background.

    Now, one may object that "genetic background" isn't really a meaningful term. At the very least, it isn't very specific -- it might be better to have a list of genes that interact with each other and exert epistasis on potential introgressions. But it has the virtue of being empirically quantifiable. The overall genetic differences between archaic humans will eventually be measured, including their differentiation from the later modern population. As I mentioned in the FAQ, we can't narrow these values down right now, but more knowledge of genomics is going to make it quite possible. I think that the idea of archaic genes moving into a modern genetic background is going to describe some of the evolution of early modern humans -- and I think these are important because they are selected. In other words, it is their dynamics that makes them important, not the other way around.

    UPDATE (11/9/2006): Razib gets into the introgression-defining act:

    Gene flow is a generic term, and can correctly characterize a whole host of dynamics, while introgression is very specific and precise, a subset of gene flow rather than a synonym.

    The description that follows is worthwhile, but it is a little problematic. For instance, there is the introduction of a hybrid zone as a mediator through which introgressive genes move in the process of transfer from one population to another. From some points of view this seems to work. For example, cottonwoods in Utah have well-defined hybrid zones (determined by altitude), through which introgressive alleles are thought to have passed, although now they are distributed widely into the range of the opposite parental population.

    But lots of other populations don't have hybrid zones at all. Wolves and coyotes (and dogs) mate fairly extensively wherever they are sympatric. Bison had a time in history when they received lots of genes from cattle, and introgression has continued here and there. There was never any well-defined hybrid zone, unless we consider the entire surviving population of bison to have been the zone. Introgression from mountain hare into European hare in Spain seems to have been structured around ancient Pleistocene contact zones rather than current distributions.

    And the whole concept of a "hybrid zone" doesn't really apply well to subspecific interactions.

    More in my new post, "What about species?"

  • Introgression and microcephalin FAQ

    Wed, 2006-11-08 09:53 -- John Hawks

    Considering the paper by Evans and colleagues, I've come up with a list of questions and answers:

    What is introgression?

    Introgression is the transfer of alleles across species or subspecies boundaries. In other words, it describes gene flow between populations that are partially isolated. For archaic humans, there is no test of the strength or permeability of boundaries between populations; it is common to use the term "introgression" to describe gene flow in such situations, even if such gene flow is fairly common.

    The paper by Evans and colleagues describes a scenario of adaptive introgression. In such cases, an allele with a selective advantage moves from one population to another.

    Adaptive introgression must be a very unusual event, right? I mean, I've never heard of it before!

    If you haven't heard of adaptive introgression, you haven't been reading the literature. Adaptive introgression across species boundaries is very common in mammals, and is almost ubiquitous where closely related species are sympatric. It has long been known to happen on the basis of morphological characters that spread through hybrid zones into adjacent populations. But now that molecular surveys have become common, introgressive genes have been found moving out of current hybrid zones, and also in the areas where hybrid zones likely occurred long in the past.

    Hybrid zones themselves are often quite obvious. But introgression is not about hybrids. It occurs when backcrosses spread alleles into the other parental species. Hybrids may have a mixture of many genes and characters. Introgression involves a small number of genes, which are much more likely to spread if alleles are adaptive. Where different populations are in reproductive contact, adaptive introgression may often be the most important source of adaptive alleles -- it provides a way for a species or population to benefit from the adaptive evolution of neighboring species.

    There is one thing that impedes introgression: linkage to deleterious alleles. Species separated for longer times are more likely to have alleles that are bad on the genetic background of related species, and so potential adaptive alleles must have advantages outweighing all the deleterious alleles they are linked to. In these situations, adaptive introgression may only occur after enough recombination has broken the adaptive allele apart from some or all of its linked deleterious neighbors.

    But I thought that "species" means "no interbreeding!"

    Get with the times, man! Mammal species just don't establish reproductive barriers very quickly. Comparing mammals, postzygotic isolating mechanisms take between 2 and 10 million years to evolve. No primate species pairs have evolved postzygotic isolation on the timescale represented by the evolution of Homo. When archaic and modern humans were in contact, they certainly interbred.

    OK, but why is this gene introgression? Why couldn't it just have originated in ancient Africans?

    The current evidence for introgression comes from the mismatch between the ancient coalescence time for all haplogroups of the microcephalin gene, compared to the very recent selection on the D haplogroup. Now, recent selection on an ancient variant could occur within a single population, for example, if the allele was formerly neutral and gained a new advantage with some difference in the genetic background. And an ancient coalescence date would not be unusual in a single population -- several other loci match the 1.7 million years estimated for the microcephalin genealogy.

    Two things make this case especially persuasive. First, there is almost no evidence of recombination between the D and non-D haplogroups. If they existed within the same population for 1.7 million years, they should have recombined a lot with each other, and we should see some of those recombinants today. We don't. The best explanation is that the alleles were in different ancient populations, somewhat isolated from each other so that recombination was very rare.
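
    To get a feel for why "they should have recombined a lot", here is a rough sketch. Only the 1.7 million year figure comes from the discussion above; the generation time, crossover rate, region length, and haplogroup frequency are placeholder assumptions of mine, not values from Evans and colleagues. The model treats crossovers with the other haplogroup as a Poisson process along a single lineage in a randomly mating population.

```python
import math

def expected_crossovers(years, generation_years, rate_per_bp, region_bp, other_freq):
    """Expected crossovers with the other haplogroup along one lineage."""
    generations = years / generation_years
    return generations * rate_per_bp * region_bp * other_freq

n = expected_crossovers(
    years=1.7e6,          # coalescence time quoted in the text
    generation_years=25,  # assumed generation time
    rate_per_bp=1e-8,     # assumed crossover rate per base pair per generation
    region_bp=30_000,     # hypothetical length of the region examined
    other_freq=0.1,       # assumed frequency of the other haplogroup
)
p_unrecombined = math.exp(-n)  # Poisson chance that a lineage never recombined

print(f"expected crossovers per lineage: {n:.2f}")
print(f"chance a lineage escapes recombination: {p_unrecombined:.2f}")
```

    With placeholder values like these, a typical lineage picks up a couple of crossovers with the other haplogroup, so a sample of many chromosomes should include plenty of recombinants. Seeing almost none is what points toward the haplogroups having spent most of that time in separate populations.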

    Second, the D haplogroup is common in Europe and Asia, but is very rare in Africa. If it increased under selection from its origin in some ancient African population, then it ought to be most common in Africa now. We might also expect a deeper origin for the D haplogroup in Africa, similar to the structure of many other genetic loci. We observe neither.

    Hey, why should this gene be so unique? There's never been any evidence for archaic genes before!

    Now, this is clearly where I have let you down, by not blogging about these papers as they have been coming out. What can I say, I have to make a living somehow! If I give away all my research, how can I stay a step ahead?

    The most similar locus to microcephalin is the region around MAPT on chromosome 17. Hardy and colleagues (2005) suggested that this locus is a Neandertal introgression. Like microcephalin, the locus has an ancient coalescence (>2 million years), and like microcephalin, an allele is under selection, with its highest current frequency in Europe. Like microcephalin, MAPT is brain-active, with most research centered on its possible role in Alzheimer's and Parkinson's disease. Unlike microcephalin, there are no recombinants between the major (H1 and H2) haplogroups; this is due to a chromosomal inversion between them. Evans and colleagues (2006) note that balancing selection might not be statistically ruled out when there is such an inversion preventing recombination. Still, balancing selection doesn't easily explain the recent positive selection, nor the geographic distribution of variation.

    Garrigan et al. (2005b) found evidence for an ancient Asian allele being retained in living Asians. This allele was from a non-coding locus, so it seems unlikely that adaptive introgression is the cause, which might suggest even more widespread genetic survival of archaic DNA. Some loci suggest the survival of archaic lineages within Africa, including another X chromosome noncoding region (Garrigan et al. 2005a) and the dystrophin gene (Zietkiewicz et al. 2003). These would presumably be attributable to partial isolation of Middle Pleistocene African populations, with introgressive gene flow among them.

    The widest survey for introgression thus far was by Plagnol and Wall (2006), who conclude that around five percent of human genes show some evidence for introgression from archaic humans. Their statistical test was looking for loci with ancient divergence times and in particular divergent alleles centered in Eurasian (non-African) populations. So this is a kind of estimate under the assumption of relatively great genetic differentiation among archaic human populations.

    I'll end with Templeton (e.g., 2005), who found that human autosomal variation supports a broad ancestry of living humans among Eurasian and African archaics, with evidence of genetic dispersals from Africa several times during the Pleistocene. Under this model, intermixture among archaic populations would have been fairly common, at least intermittently. This is the argument that I made with Milford Wolpoff several years ago (Hawks and Wolpoff 2001) -- we just don't see a lot of evidence for genetic differentiation among archaic humans.

    This kind of model would imply that genes like microcephalin -- with strong evidence for some isolation of populations -- might be fairly rare. The fact that several of them have now cropped up (the 5 percent estimate from Plagnol and Wall, 2006, being the most informative on this score) means that we have a lot about archaic human population structure yet to discover.

    But notice the nature of this uncertainty. We have a difference between substantial introgression among populations structured like hominoid subspecies on one side, and ubiquitous genetic exchanges among populations structured like human races on the other side. Complete replacement is completely out. "Mostly" replacement, or "assimilation" is still in, but with the observation that archaic human genes had substantial evolutionary importance in the adaptation of modern humans.

    In other words, we have moved the ball down the field. Time to line up for the next play.

    What is all this about microcephalin possibly not being from Neandertals?

    Well, the D haplogroup is common in many areas outside of Africa in addition to Europe. So it isn't possible to really specify in what archaic population it may have originated. There is some chance that it may be found in the Neandertal genome sequence, when that becomes available. In fact, that would be the ultimate test for many candidate introgressive alleles.

    But there is a good chance that it won't be found in the Neandertal sequence. After all, Neandertals were probably pretty thin on the ground -- especially in Europe. A sampling of their genes would be sort of unlikely to yield a high proportion of archaic alleles that may have survived to the present day. So there is hope that we will find and document such alleles, but the best evidence for many of them may remain their current pattern of variation in living people.

    Now, bear with me here. Neandertals were stupid, right? So why would one of their brain genes be advantageous in modern humans?

    There are so many possibilities here.

    1. Late Neandertals certainly weren't stupid. Consider the Châtelperronian. And the European Mousterian includes basically all the elements that are thought to represent cognitive sophistication in MSA Africans.

    2. Neandertal brains were big, and their heat generation requirements mean that energetic constraints were very different from other archaic populations. The brain doesn't function in isolation -- its development, growth, and ongoing maintenance depend on metabolic constraints. So Neandertals might easily have had brain development alleles that had different responses to their high-energy lifestyles. Considering that early Upper Paleolithic people had much more effective foraging strategies than Neandertals, high-energy brain development may have had an even greater advantage than it had previously enjoyed.

    3. Modern humans are variable in brain morphology and cognition. That variability certainly includes alternative strategies (for example, personality types) that may be maintained by frequency-dependent selection. An archaic population that had particular constraints on its behavioral strategies might have given rise to strategies that worked within the modern human mix. In that context, Neandertals are fairly unusual in having a very strong dietary dependence on meat, and their means of hunting was both risky and dependent on cooperation. That adaptation may have led to behavioral strategies that succeeded in modern humans, even as Neandertal anatomies disappeared.

    Those are some possibilities we are working on. There are probably many others. The key is that we are looking at the function of some genes that survived, through our reconstruction of the total phenome of a population that no longer survives. We are limited by the evidence, but there are many suggestive hypotheses.

    Neandertals went extinct! Their features disappeared in later humans! How can any of their genes have survived?

    This is my favorite one to answer, because it invokes the true paradox of introgression. The features that we recognize as Neandertal features were defined as Neandertal features by virtue of the fact that they are mostly gone! That means that any alleles correlated with Neandertal morphological features were almost certainly selected against, or were at best neutral. That means that those recognizably Neandertal genes are gone!

    But here we have a gene that looks to have come from some archaic population. Adaptive introgression occurs when adaptive alleles are selected, and broken apart from their genetic background. So even as many (perhaps most) Neandertal alleles disappeared, some of their alleles began to increase in frequency -- slowly at first, then very rapidly.
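
    To make the "slowly at first, then very rapidly" shape concrete, here is a minimal sketch of the textbook deterministic model of genic selection. The starting frequency and the selection coefficient are hypothetical values chosen for illustration, not estimates for microcephalin or any other locus, and the model ignores drift, which is what eliminates most new advantageous alleles while they are still rare.

```python
def allele_trajectory(p0, s, generations):
    """Deterministic frequency of a favored allele under genic selection.

    Each generation the allele is reweighted by its relative fitness
    (1 + s) against a background fitness of 1.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        freqs.append(p)
    return freqs


def generations_to_reach(freqs, target):
    """First generation at which the frequency meets or exceeds target."""
    for gen, p in enumerate(freqs):
        if p >= target:
            return gen
    return None


# Hypothetical values: one copy among 20,000 chromosomes, a 1% advantage.
traj = allele_trajectory(p0=1 / 20000, s=0.01, generations=2000)
for target in (0.01, 0.10, 0.50, 0.90):
    print(target, generations_to_reach(traj, target))
```

    With these made-up numbers the allele spends more generations getting from a single copy to 1 percent than it then needs to go from 1 percent to 50 percent.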

    Some adaptive introgressions may already have been fixed, particularly in Europe (from Neandertals). Others, like microcephalin, are still growing in frequency. The key is to remember Mendel -- this is not blending inheritance of Neandertal traits, it is the extinction of many alleles and the proliferation of some others.

    The reduction in frequency of Neandertal-like morphological traits over time is entirely consistent with this scenario. In fact, it shows the widespread importance of Neandertal-modern matings, which led to the emergence of a modern population with many Neandertal traits. The widespread genetic contact is documented by the distribution of the traits -- with different Neandertal-like traits in different specimens. That kind of contact is most likely to enable adaptive introgression to proceed.

    UPDATE (11/8/2006): Fixed some citations.

    References:

    Evans PD, Mekel-Bobrov N, Vallender EJ, Hudson RR, Lahn BT. 2006. Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage. Proc Nat Acad Sci (early edition) DOI link

    Garrigan, D., Mobasher, Z., Kingan, S. B., Wilder, J. A., Hammer, M. F. 2005a. Deep haplotype divergence and long-range linkage disequilibrium at Xp21.1 provides evidence that humans descend from a structured ancestral population. Genetics 170:1849-1856.

    Garrigan, D., Mobasher, Z., Severson, T., Wilder, J. A., Hammer, M. F. 2005b. Evidence for archaic Asian ancestry on the human X chromosome. Mol. Biol. Evol. 22:189-192. DOI link.

    Hardy, J., Pittman, A., Myers, A., Gwinn-Hardy, K., Fung, H. C., de Silva, R., Hutton, M. and Duckworth, J. 2005. Evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens. Biochemical Society Transactions 33:582-585.

    Hawks, J., Wolpoff, M. H. 2001. The accretion model of Neandertal evolution. Evolution 55:1474-1485.

    Plagnol, V., Wall, J. D. 2006. Possible ancestral structure in human populations. PLoS Genet. 2:e105. DOI link.

    Templeton AR. 2005. Haplotype trees and modern human origins. Yrbk Phys Anthropol 48:33-59. DOI link

    Zietkiewicz, E., Yotova, V., Gehl, D., Wambach, T., Arrieta, I., Batzer, M., Cole, D. E., Hechtman, P., Kaplan, F., Modiano, D., Moisan, J. P., Michalski, R., Labuda, D. 2003. Haplotypes in the dystrophin DNA segment point to a mosaic origin of modern human diversity. Am. J. hum. Genet. 73:994-1015.

  • Neandertal introgression, genetic-style

    Wed, 2006-11-08 00:30 -- John Hawks

    The paper by Patrick Evans and colleagues, from Bruce Lahn's lab, is now live (and free) at PNAS. There is a short news report by Michael Balter at ScienceNOW, and the Howard Hughes press release is admirably clear.

    If you've been hearing a lot about the word "introgression" lately, this is why. At least, the first of the reasons why.

    Here's the abstract:

    At the center of the debate on the emergence of modern humans and their spread throughout the globe is the question of whether archaic Homo lineages contributed to the modern human gene pool, and more importantly, whether such contributions impacted the evolutionary adaptation of our species. A major obstacle to answering this question is that low levels of admixture with archaic lineages are not expected to leave extensive traces in the modern human gene pool because of genetic drift. Loci that have undergone strong positive selection, however, offer a unique opportunity to identify low-level admixture with archaic lineages, provided that the introgressed archaic allele has risen to high frequency under positive selection. The gene microcephalin (MCPH1) regulates brain size during development and has experienced positive selection in the lineage leading to Homo sapiens. Within modern humans, a group of closely related haplotypes at this locus, known as haplogroup D, rose from a single copy 37,000 years ago and swept to exceptionally high frequency (ca. 70% worldwide today) because of positive selection. Here, we examine the origin of haplogroup D. By using the interhaplogroup divergence test, we show that haplogroup D likely originated from a lineage separated from modern humans 1.1 million years ago and introgressed into humans by ca. 37,000 years ago. This finding supports the possibility of admixture between modern humans and archaic Homo populations (Neanderthals being one possibility). Furthermore, it buttresses the important notion that, through such admixture, our species has benefited evolutionarily by gaining new advantageous alleles. The interhaplogroup divergence test developed here may be broadly applicable to the detection of introgression at other loci in the human genome or in genomes of other species.
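
    As a back-of-envelope check on what that sweep implies, the two figures in the abstract (a single copy 37,000 years ago, about 70 percent today) can be turned into a rough selection coefficient. This is a sketch, not the authors' method: it assumes deterministic genic selection, a 25-year generation, and a starting population of 20,000 chromosomes, all of which are assumptions made here for illustration rather than values from the paper.

```python
from math import exp, log


def implied_selection(p_start, p_end, generations):
    """Selection coefficient s that carries an allele from p_start to p_end.

    Under deterministic genic selection the odds p / (1 - p) are multiplied
    by (1 + s) every generation, so s follows from the total change in odds.
    """
    odds_start = p_start / (1 - p_start)
    odds_end = p_end / (1 - p_end)
    return exp(log(odds_end / odds_start) / generations) - 1


YEARS = 37_000          # from the abstract
END_FREQ = 0.70         # from the abstract
GENERATION_YEARS = 25   # assumption, not from the paper
CHROMOSOMES = 20_000    # assumption: "a single copy" among this many

s = implied_selection(1 / CHROMOSOMES, END_FREQ, YEARS / GENERATION_YEARS)
print(f"implied s = {s:.4f}")
```

    Under these assumptions the answer comes out a bit under one percent per generation. A different generation length or starting population size shifts the number, but not by orders of magnitude.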

    I'm starting a second post with Q and A regarding the paper.

    Here, I want to note some news:

    1. I have my own paper on introgression (with Greg Cochran) that will be coming in PaleoAnthropology, so it is a topic to which we've devoted a lot of consideration.
    2. There will be more Neandertal news in the next week. These are busy Neandertal times!
    3. Because we've been working on this topic, I've been avoiding it. But now that some of this stuff has come out, I will point out that there is now a substantial literature on genetic introgression from archaic humans. More in the Q and A.

    UPDATE (11/8/2006): My colleague, Greg Cochran, has a post at GNXP discussing introgression and microcephalin further:

    If this pans out the way we think it will, introgression from Neanderthals (and maybe with other archaics) may have been one of the two fundamental patterns underlying recent human evolution.

    One of the two.

    References:

    Evans PD, Mekel-Bobrov N, Vallender EJ, Hudson RR, Lahn BT. 2006. Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage. Proc Nat Acad Sci (early edition) DOI link

  • Anagrams, part 1

    Mon, 2006-10-30 15:16 -- John Hawks

    I was putting together some anagrams on human evolution-related phrases, which turns out to be a bit of a challenge. I have no idea how hard they are to unscramble, but in many cases I've been able to make the anagram somewhat topically related to the real phrase, so you have a bit of a clue. Numbers 2, 4, and 5 are not particularly topical; the rest are to a greater or lesser degree.

    In the topical anagram category, my favorite is number 6 (for which you will want to retain the question mark at the end!).

    And that number 3! Who knew?

    Anyway, here they are. I think they would make fun extra credit items on exams, but more so for the topical clues. As is the usual rule, punctuation marks and capitalization patterns are not part of the original.

    1. He has meat dinners, loon.

    UPDATE (10/30/2006): OK, Gretchen was doing these, and she has chewed me out because I left letters out of this one. If you wrote this down and were working on it before, there should be another "h" and another "s" besides the ones in "He's". Yes, I knew the "h" was in the original, I just didn't check the anagram carefully enough. I've changed the original now to the right number of all letters.

    2. Unhappiest torch.

    3. Learn the DNA.

    4. Hopeful Nassau caricaturist.

    5. Introgression rune.

    6. Hail, oh handier assumptions?

    7. Transatlantic hoodwinker funeral.

  • The white buffalo

    Thu, 2006-09-14 11:40 -- John Hawks

    OK, this was local news here, and now it's national news:

    MILWAUKEE - A farm in Wisconsin is quickly becoming hallowed ground again for American Indians with the birth of its third white buffalo, an animal considered sacred by many tribes for its potential to bring good fortune and peace.

    This is much, much better than panda news. No question.

    Still, I have to think they are missing an opportunity here to make this a bit, well, educational:

    Dave Heider said he was inspecting damage on his farm after a late August storm when he saw the newly born buffalo, a male. His last white buffalo, a female named Miracle, died in 2004 at the age of 10. Thousands of people came to see the animal, whose coat became darker as it aged.

    ...

    [The new calf] is no relation to Miracle, he said.

    "We never even thought about having another white one until we got this one," he said. "There's got to be a reason that we're getting these white calves."

    Yes. The reason is called inbreeding. Remember that a hundred years ago, there were only around 500 bison in North America? Considering that the "white" pelage here is not really pigmentless -- and becomes darker through ontogeny -- this may well not be a simple Mendelian trait. But even if it were, very nearly all the bison in this herd must be close relatives!

    And there's another reason for white coloration -- the majority of today's bison have cattle genes introduced during the last century. Some of these genes influence coat color. Hence, color variation in today's bison includes a range that historical bison never would have had, because of genetic introgression from cattle.

    No, the story doesn't go into this sort of thing. It does quote an expert, though:

    Odds of having a white buffalo are at least 1 in the millions, said Jim Matheson, assistant director of the National Bison Association. For years buffalo in general were rare but their numbers are increasing, with some 250,000 now in the U.S., he said.

    OK, first of all, the entire population of bison in North America is now on the order of 500,000. Considering there have been three white bison at this one farm we know that the odds are not "1 in the millions"!
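
    A quick calculation shows how badly "1 in the millions" fits. This sketch treats each calf as an independent draw, which is exactly the assumption that inbreeding violates, and the number of calves born on the farm is a made-up figure because the story does not give one.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - lower


RATE = 1e-6             # "1 in the millions", taken at one in a million
FARM_CALVES = 1000      # hypothetical: total calves born on the farm
WHITE_CALVES = 3        # from the story
BISON_TOTAL = 500_000   # rough North American population cited above

print("P(3+ white calves on one farm):", prob_at_least(WHITE_CALVES, FARM_CALVES, RATE))
print("Expected white bison continent-wide:", BISON_TOTAL * RATE)
```

    Even with a generous guess at the number of calves, three white ones at a single farm would be fantastically improbable at that rate, and the whole continent would be expected to hold less than one white bison.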

    In fact, white bison are bred at one ranch in Arizona, and records of at least a dozen of them exist from there and other places. Now, I know that the AP can't be bothered to consult Wikipedia for this sort of thing, but this seems like an especially good chance to dispel some myths about genetics -- and this one connects very clearly to the problems of endangered species in small populations!

  • Duck species collapsing in face of mallard onslaught

    Thu, 2006-08-17 23:18 -- John Hawks

    Earlier this year, I discussed a paper about the collapse of stickleback species due to increasing hybridization. In a similar vein, I ran across this 2004 paper by Judith Mank reviewing the apparent breakdown of the distinction between American black ducks and mallards:

    American black ducks (Anas rubripes) and mallards (A. platyrhynchos) are morphologically and behaviorally similar species that were primarily allopatric prior to European colonization of North America. Subsequent sympatry has resulted in hybridization, and recent molecular analyses of mallards and black ducks failed to identify two distinct taxa, either due to horizontal gene flow, homoplasy, or shared ancestry. We analyzed microsatellite markers in modern and museum specimens to determine if the inter-relatedness of mallards and black ducks was an ancestral or recent character. Gst, a measure of genetic differentiation, decreased from 0.146 for mallards and black ducks living before 1940, to 0.008 for birds taken in 1998. This is a significant reduction in genetic differentiation, and represents a breakdown in species integrity most likely due to hybridization. Using modern specimens, we observed that despite a lower incidence of sympatry, northern black ducks are now no more distinct from mallards than their southern conspecifics.
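
    For readers who have not met Gst: it is the share of total gene diversity that lies between populations rather than within them. Here is a minimal single-locus sketch using Nei's definition, Gst = (Ht - Hs) / Ht, with equal weighting of populations. The allele frequencies are invented for illustration, and the paper's actual estimate across microsatellite loci may use sample-size corrections not shown here.

```python
def heterozygosity(freqs):
    """Expected heterozygosity (gene diversity) from allele frequencies."""
    return 1.0 - sum(f * f for f in freqs)


def gst(pop_freqs):
    """Nei's Gst for one locus, weighting every population equally.

    pop_freqs holds one list of allele frequencies per population,
    with alleles in the same order in every list.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # Hs: average diversity within populations.
    h_s = sum(heterozygosity(p) for p in pop_freqs) / n_pops
    # Ht: diversity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(p[a] for p in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t


# Invented two-allele frequencies for two populations.
well_separated = [[0.8, 0.2], [0.3, 0.7]]
nearly_merged = [[0.55, 0.45], [0.50, 0.50]]
print(round(gst(well_separated), 3), round(gst(nearly_merged), 3))
```

    As two populations' allele frequencies converge, Hs approaches Ht and Gst falls toward zero, which is the direction of the change reported in the abstract.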

    Turns out that this outcome was predicted a long time ago, as reflected in a 1967 paper by Paul Johnsgard:

    Owing to its much smaller gene pool, the Black Duck is vulnerable to eventual swamping through hybridization and introgression, although the present hybridization rate is sufficiently low as to make this unlikely in the foreseeable future (Johnsgard 1967:51).

    Hybridization of introduced mallards with endemic species is a problem all over the world. For instance, New Zealand grey ducks:

    Small numbers of Mallard (Anas platyrhynchos) were introduced into New Zealand from Great Britain and North America over 100 years ago. Both sexes have undergone differentiation in size and plumage characters as a consequence of hybridization with the indigenous Grey Duck (A. superciliosa). Pure forms of both species, as documented by early descriptions, appear to be disappearing, particularly the Grey Duck. As a consequence of hybridization, two morphologically distinct hybrid populations have been produced: one resembles the Grey Duck and the other the Mallard. By 1981-1982 levels of hybridization, based on plumage analysis, had reached 51%, and the proportion of pure Grey Ducks had dropped to 4.5%, which is below the level suggested for the maintenance of a species. In the absence of reproductive isolation or antihybridization mechanisms between these two species, the Mallard and hybrid populations represent a potential threat to the conservation of the New Zealand Grey Duck (Gillespie 1985:459).

    And in a review entitled "Extinction by hybridization and introgression" by Rhymer and Simberloff, there is this passage:

    Hybridization with introduced mallards has contributed to the decline of the endangered, endemic Hawaiian duck (A. wyvilliana) and has hampered attempts to reintroduce this species to Oahu and Hawaii. Domesticated nonmigratory mallards that escaped or were released for hunting breed with the endemic Florida mottled duck (A. fulvigula fulvigula), and the resultant introgression threatens the existence of the latter subspecies. Introgression also occurs between domesticated introduced mallards and the native Australian (Pacific) black duck, A. superciliosa rogersi (Rhymer and Simberloff 1996:86)

    Mallards are not alone. Apparently ruddy ducks of North American origin now threaten the endangered population of European white-headed ducks (Oxyura leucocephala).

    References:

    Gillespie GD. 1985. Hybridization, introgression, and morphometric differentiation between mallard (Anas platyrhynchos ) and grey duck (Anas superciliosa ) in Otago, New Zealand. Auk 102:459-469.

    Johnsgard PA. 1967. Sympatry changes and hybridization incidence in mallards and black ducks. American Midland Naturalist 77:1:51-65. DOI link

    Mank JE, Carlson JE, Brittingham MC. 2004. A century of hybridization: decreasing genetic distance between American black ducks and mallards. Conservation Genetics 5:395-403. DOI link

    Rhymer JM, Simberloff D. 1996. Extinction by hybridization and introgression. Ann Rev Ecol Syst 27:83-109.

  • Has the dam broken on mtDNA selection?

    Fri, 2006-04-28 14:49 -- John Hawks

    The current Science has a paper by Eric Bazin and colleagues comparing mtDNA diversity with population size, history and ecology of 3000 animal species.

    Here's the conclusion:

    This study reveals that the mitochondrial diversity of a given animal species does not reflect its population size: No correlation between mtDNA polymorphism and species abundance could be detected, despite the large body of data analyzed. Nuclear data, in contrast, are fairly consistent with intuitive expectations. We conclude that natural selection acting on mtDNA contributes to homogenization of the average diversity among groups, in agreement with the genetic draft theory. mtDNA appears to be anything but a neutral marker and probably undergoes frequent adaptive evolution, e.g., direct selection on the respiratory machinery, nucleo-cytoplasmic coadaptation, two-level selection, or adaptive introgression, perhaps hitchhiking with a maternally transmitted parasite. mtDNA diversity is essentially unpredictable and will, in many instances, reflect the time since the last event of selective sweep, rather than population history and demography. Low-diversity mitochondrial lineages, typically disregarded as important from a conservation standpoint, might sometimes correspond to recently selected, well-adapted haplotypes to be preserved (Bazin et al. 2006:571-572, emphasis added).

    This is a nice empirical comparison, and a very impressive exercise in data mining. To accumulate the dataset, they had to troll large data depositories for cases in which the same DNA segments had been sequenced in multiple individuals of single species, and then had to match those cases with ecological information about the species, as the accompanying perspective by Adam Eyre-Walker describes.

    But, aside from the very persuasive presentation here, the fact has been obvious for years. I blogged about mtDNA selection last year. Finding such widespread mtDNA selection across taxa -- even into invertebrates -- is certainly strong support for the idea that it evolved adaptively in humans. And finding that the chance of adaptive evolution in mtDNA is proportional to population size enhances the likelihood of recent mtDNA selection in humans even more.

    Eyre-Walker draws exactly the opposite conclusion from mine:

    Interestingly, humans are an exception to the pattern seen by Bazin et al. If the authors are correct, then the effective population size estimated from mitochondrial DNA should be lower than that estimated from autosomal DNA. This is not what we see in humans; the effective population sizes estimated from autosomal DNA, Y-chromosome DNA, and mitochondrial DNA are all approximately 10,000. Does this mean that Bazin et al. are incorrect? Probably not. It may be that humans have such small effective population sizes that adaptive evolution in the mitochondrial genome is very rare; the neutrality index in human mitochondrial DNA, and perhaps nuclear DNA, certainly gives no indication of adaptive evolution. (Eyre-Walker 2006:538).

    But of course, this is quite backward -- low mtDNA diversity cannot be evidence for neutrality; at best it can fail to refute a hypothesis of selection. With our long generation lengths, autosomal DNA would have to have coalescence dates in the Pliocene to make the low mtDNA diversity stand out statistically. It is not a question of them all being neutral, it is a question of packing most of human evolution into a space of 2 million years.

    Supporting that is the observation that Eyre-Walker points out next:

    Although nuclear diversity follows the expected pattern, with more diversity in organisms that are expected to have bigger population sizes, the differences are remarkably small; synonymous diversity varies by less than a factor of 10, and allozyme diversity by less than a factor of 4. This is striking given that the population sizes of marsupials and mussels, for example, must differ by many orders of magnitude, and one would expect diversity to be linearly related to population size. This observation is not new for allozyme data (4), but it is the first time this pattern has been so clearly illustrated for synonymous diversity in nuclear genes. The lack of a strong correlation between diversity and population size in nuclear DNA may also reflect the effects of genetic hitchhiking (ibid.).

    In other words, selection has restricted mtDNA diversity, and it has also restricted nuclear DNA diversity -- just not as much. The "not as much" here is a function of recombination, which makes the nuclear genes true subjects of genetic draft.

    This isn't news either. We've known about the restricted allozyme diversity since 1984. A few voices crying in the wilderness have been reminding us from time to time, like Gillespie.

    I would note that some of their ecological substitutes for population sizes may themselves induce selective effects. For example, Bazin and colleagues note that marine molluscs have more allozyme variation than terrestrial molluscs, which they view as consistent with the greater dispersal of marine species. But greater dispersal might also involve the necessity to maintain diversity for dispersing into different local environments, which would tend to drive frequency-dependent or balancing selection for traits responding to these local forces.

    So there is more to be done on the nuclear DNA side of this equation, probably much more. But the mtDNA comparison is very important, and hopefully will drive some reevaluation of the use of mtDNA diversity as a proxy for genetic diversity in conservation and ecological studies.

    References:

    Bazin E, Glémin S, Galtier N. 2006. Population size does not influence mitochondrial genetic diversity in animals. Science 312:570-572. DOI link

    Eyre-Walker A. 2006. Size does not matter for mitochondrial DNA. Science 312:537-538. DOI link

