john hawks weblog

paleoanthropology, genetics and evolution

recent selection

  • Could genetic drift really break your heart?

    Mon, 2009-01-19 00:40 -- John Hawks

    Are these people crazy?

    The combination of such a large risk with such a high frequency is, fortunately, unique. "How can such a harmful mutation be so common?" asks Chris Tyler-Smith from The Wellcome Trust Sanger Institute, Hinxton, UK. "We might expect such a deleterious change to have 'died out'.

    "We think that the mutation arose around 30,000 years ago in India, and has been able to spread because its effects usually develop only after people have had their children. A case of chance genetic drift: simply terribly bad luck for the carriers."

    This is a 25-bp deletion in a muscle protein gene, MYBPC3. The current allele frequency in India is estimated at 4 percent, and the deletion is estimated to be carried by 60 million people. The paper suggests that it originated 30,000 years ago. Carriers of the deletion have a greatly increased risk of cardiomyopathy.

    Here's the relevant passage from the paper:

    The presence of a disease-associated variant at substantial frequency raises an evolutionary question: if it is disadvantageous, how did it become so common? In principle, it could be evolutionarily neutral, manifesting its disadvantages only late in life; alternatively, its disadvantages could be outweighed by advantages early in life, or in a different environment, so that it could have been positively selected. To address this question, we examined the haplotype structure surrounding the deletion. Using five short tandem repeat (STR) markers, spanning ca. 3.4 Mb surrounding the deletion in 287 heterozygous individuals, we found similar high degrees of variation in the inferred haplotypes from chromosomes with and without the deletion (Supplementary Fig. 7 and Supplementary Table 6 online). We then used allele-specific amplification to resequence ca. 10-kb haplotypes centered on the 25-bp deletion from nine heterozygous individuals (Supplementary Tables 7 and 8 online). The chromosomes carrying the 25-bp deletion showed five closely related haplotypes (Supplementary Fig. 8 online). After excluding variants likely to have arisen by recombination, we estimated a time to most recent common ancestry (TMRCA) of ca. 33 ± 23 thousand years for the deletion haplotypes (Supplementary Methods). This time slightly postdates the initial peopling of the subcontinent 30,000–50,000 years ago and together with its restricted geographical distribution suggests that the deletion did not arrive with the first modern human settlers from Africa [more than] 50,000 years ago, but arose subsequently within the subcontinent. Its occurrence in two populations from Southeast Asia can be explained by recent gene flow from India (Supplementary Note online). Collectively, these observations provide no evidence for rapid spread of a recent founder haplotype or any departure from neutral evolution (Dhandapany et al. 2009:4).

    The issue is not really whether a gene could go from 1 copy to 4 percent in 1200 generations by chance. That wouldn't be so terribly unlikely in Pleistocene humans -- in fact, for a mutation that is not lost, the mean time to go from 1 copy to 4 percent by drift in a population with an effective size of 10,000 individuals is not 30,000 years, but only around 20,000 years. On the other hand, mtDNA variation today suggests that South Asia experienced early and rapid population growth -- so we're not likely talking about a population of 10,000, but more like a minimum of 100,000 effective individuals through the past 30,000 years at least. Because the time scale of drift is proportional to effective population size, it would take genetic drift at least 10 times longer to accomplish the requisite frequency change given that demographic history. Still, a single allele at a single gene locus might be exceptional.
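    One way to reproduce the 20,000-year figure is a diffusion approximation to Wright-Fisher drift; this is a sketch, not necessarily the calculation behind the post. It assumes a neutral allele starting from a single copy, conditions on the allele reaching frequency x before it is lost, and assumes 25-year generations (30,000 years divided by 1,200 generations). Under those assumptions the mean time is (4Ne/x)[x + (1 - x) ln(1 - x)] generations, roughly 2Ne·x when x is small. The function name is hypothetical.

```python
import math


def conditional_drift_time(ne: float, target: float) -> float:
    """Mean generations for a neutral allele starting as a single copy
    (initial frequency -> 0) to first reach `target`, conditional on
    reaching it before being lost. Diffusion approximation with
    per-generation variance p(1 - p) / (2 * ne), ne = diploid individuals."""
    x = target
    return 4 * ne * (x + (1 - x) * math.log1p(-x)) / x


GENERATION_YEARS = 25  # assumption: 30,000 years / 1,200 generations

for ne in (10_000, 100_000):
    gens = conditional_drift_time(ne, 0.04)
    print(f"Ne={ne:,}: {gens:,.0f} generations, ~{gens * GENERATION_YEARS:,.0f} years")
```

    For Ne = 10,000 this gives about 810 generations, or roughly 20,000 years. Because the conditional time scales linearly with Ne, Ne = 100,000 gives ten times as long. As a check, letting x approach 1 recovers the familiar 4Ne generations for a neutral allele conditioned on fixation.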

    But that scenario, however unlikely, is simply not the situation we have here. Here we have a deletion that must have some disadvantage, because it raises its carriers' risk of a fatal disease. This disadvantage is apparently dominant in effect, based on the case-control study. Yet the deletion has managed to persist within the large South Asian populations of the last 10,000 years so that today it is still around 4 percent.

    People mainly die of cardiac problems after age 40. But human reproductive lives aren't over until they're done investing in their children. Further, a weakened heart may reduce work potential or health even if it kills slowly. The fitness cost of this deletion is smaller than if it gave people a chance at a fatal disease when they are 17, but a smaller fitness cost is still a fitness cost. In a large population, that small fitness cost is going to whittle away the frequency of the allele over time.

    A thousand generations is a lot of potential whittling. Using some quick calculations, it looks like selection against the deletion as low as 0.001 to 0.0015 in heterozygotes should have been enough to cut the frequency down to around 1 percent, from an initial value of 4 percent. So even if drift increased the deletion early after its origin, it ought to be much rarer today. Meanwhile, drift looks even more unlikely, since the chances of a mutation growing from 1 copy to 4 percent against such selection are nil.
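    The "quick calculations" are not spelled out here. A minimal sketch that reproduces the range iterates the standard one-locus selection recursion, assuming the deletion is fully dominant (carriers have fitness 1 - s, non-carriers 1), random mating, and no drift, until the deletion falls from 4 percent to 1 percent. The function name is hypothetical.

```python
def generations_to_decline(p0: float, p_end: float, s: float) -> int:
    """Generations of deterministic selection for a dominant deleterious
    allele to fall from frequency p0 to p_end. Fitness is 1 - s for
    heterozygous and homozygous carriers, 1 for non-carriers; drift ignored."""
    p, gens = p0, 0
    while p > p_end:
        q = 1 - p
        mean_fitness = q * q + (1 - q * q) * (1 - s)
        # Every copy of the deletion sits in a carrier, so each is weighted by 1 - s.
        p = p * (1 - s) / mean_fitness
        gens += 1
    return gens


for s in (0.001, 0.0012, 0.0015):
    print(f"s = {s}: {generations_to_decline(0.04, 0.01, s)} generations from 4% to 1%")
```

    Under these assumptions, s = 0.0015 takes a little under 1,000 generations and s = 0.001 a little under 1,500, bracketing the roughly 1,000 to 1,200 generations discussed here.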

    Did this deletion have a fitness cost as high as one in a thousand? It increases the risk of cardiomyopathy 5-fold or more compared to the wild type, so that seems very plausible. But really, we don't have any good estimates of the fitness costs of chronic diseases in pre-industrial populations.

    If the deletion was favored by some selection, that selection would probably be antagonistic, that is, an advantage earlier in life acting against the fitness cost of the deletion late in life. The authors briefly investigated this hypothesis, as described above. They found no evidence for a recent expansion of a single haplotype around the deletion. That means that if there was strong selection favoring this deletion, it must have happened early after its origin and then petered out. If the expansion had been late in South Asian history, it would show more LD around it, and most of the deletion-carrying chromosomes would share a single long-range haplotype. So this deletion has not been increasing rapidly in the past few thousand years.

    I would hypothesize that the disadvantages of the deletion have actually increased over time. The average lifespan increased into the Upper Paleolithic and probably later as well. Meanwhile, as the population grew, larger completed family sizes became more important to fitness. As people became more sedentary, the accumulation and inheritance of possessions and land became an important means of investing in children. The increasing importance of later survival and investment in children should have raised the fitness cost of chronic disease. That would explain a pattern of evolution in which this deletion increased in frequency early in its history, but later remained static or declined.

    So, I don't suppose I can say people are crazy for thinking genetic drift could explain this deletion's current high frequency. But considering the powerful effect of weak selection over the many generations involved here, and the very large size of the South Asian population during most of that time, genetic drift seems pretty unlikely.

    References:

    Dhandapany PS and 23 others. 2009. A common MYBPC3 (cardiac myosin binding protein C) variant associated with cardiomyopathies in South Asia. Nat Genet (online early) doi:10.1038/ng.309

  • Finding selection complicated by gene conversion

    Wed, 2009-01-14 00:39 -- John Hawks

    A short paper late last year by Danielle Jones and John Wakeley examines the influence of gene conversion on the linkage disequilibrium that results from positive selection. It ends:

    Then, among 100 chromosomes that all possess the selected allele, we would expect to see about four of these aberrant haplotypes, and the chance that all 100 chromosomes would show the classic, recombination-only sweep pattern would be 0.9625^100 ≈ 0.024. Thus, it is possible that many selected loci have been missed in the recent genomic scans for selection.

    Just thought I'd point out this example, although its importance is probably not too great. Low-frequency selected alleles are probably a much more important reason why current scans lack the statistical power to find recent selection. There should be many more selected loci than have been identified in any scan so far, since the samples number only in the hundreds.

    References:

    Jones DA, Wakeley J. 2008. The influence of gene conversion on linkage disequilibrium around a selective sweep. Genetics 180:1251-1259. doi:10.1534/genetics.108.092270

  • Early iron in Africa

    Mon, 2009-01-12 14:07 -- John Hawks

    The dawn of ironworking in Africa is a hot anthropological topic. My own interests in demographic growth and dispersals depend very closely on the chronology of ironworking in Africa, because the advent of iron may have enabled faster conversion of land to agriculture.

    Many anthropologists believe that the dispersal of the Bantu languages may be traced to an agricultural explosion driven by iron technology. Others dispute this connection, raising doubts about whether the ironworking chronology can match the timing of this dispersal. Both chronologies have some wiggle room in their dating, as do the times of introduction or domestication of various crop species.

    For the purposes of our paper last year, it was sufficient to know that populations grew in Africa after roughly 2000 BC. But to test hypotheses about gene dispersal and selection among African populations -- data that are now available -- we have to be a bit more precise.

    Last week's Science includes a summary article by Heather Pringle, which discusses the controversy over the chronology of African ironworking.

    Now controversial findings from a French team working at the site of Ôboui in the Central African Republic challenge the diffusion model. Artifacts there suggest that sub-Saharan Africans were making iron by at least 2000 B.C.E. and possibly much earlier--well before Middle Easterners, says team member Philippe Fluzin, an archaeometallurgist at the University of Technology of Belfort-Montbéliard in Belfort, France. The team unearthed a blacksmith's forge and copious iron artifacts, including pieces of iron bloom and two needles, as they describe in a recent monograph, Les Ateliers d'Ôboui, published in Paris. "Effectively, the oldest known sites for iron metallurgy are in Africa," Fluzin says.

    Some researchers are impressed, particularly by a cluster of consistent radiocarbon dates.

    And, as you might expect:

    Others, however, raise serious questions about the new claims.

    The article casts the debate as an opposition between a diffusionist hypothesis (metallurgy entered Africa from the Near East) and local development. That's appropriate, since this pattern of opposition is one of the oldest stories in archaeology. But I'm more interested in the dates and resulting population dynamics. How did technology relate to demographic growth, and how were genes affected by these processes?

    The article makes the early development of ironworking in Africa seem very credible, particularly if the only other option is a late introduction via Carthage or the Nile corridor. It is not obvious how much of the apparent controversy is about the early dates from this one site in particular, and how much is about the presence of pre-first-millennium BCE ironworking generally. Critics raise various scenarios for the contamination of radiocarbon dates by old carbon. This always reminds me of how much error may lie within Paleolithic dates if we have to worry about contamination in Iron Age sites!

    Well, more on this issue later.

    References:

    Pringle H. 2009. Seeking Africa's first Iron Men. Science 323:200-202. doi:10.1126/science.323.5911.200

  • Surfing and recent selection

    Sat, 2009-01-10 19:45 -- John Hawks

    Genetic Future and Gene Expression have commented today on the relative roles of selection and demography in shaping the genetic differences between populations. They are reacting to a paper by Hofer and colleagues (2009) that examined the differences in frequency among human populations for a number of genetic markers, including STR (microsatellite), SNP and insertion-deletion mutations.

    That paper's abstract:

    Several studies have found strikingly different allele frequencies between continents. This has been mainly interpreted as being due to local adaptation. However, demographic factors can generate similar patterns. Namely, allelic surfing during a population range expansion may increase the frequency of alleles in newly colonised areas. In this study, we examined 772 STRs, 210 diallelic indels, and 2834 SNPs typed in 53 human populations worldwide under the HGDP-CEPH Diversity Panel to determine to which extent allele frequency differs among four regions (Africa, Eurasia, East Asia, and America). We find that large allele frequency differences between continents are surprisingly common, and that Africa and America show the largest number of loci with extreme frequency differences. Moreover, more STR alleles have increased rather than decreased in frequency outside Africa, as expected under allelic surfing. Finally, there is no relationship between the extent of allele frequency differences and proximity to genes, as would be expected under selection. We therefore conclude that most of the observed large allele frequency differences between continents result from demography rather than from positive selection.

    OK, so that abstract concludes that demography (including population bottlenecks and geographic dispersals) is a better explanation for the genome-wide pattern of interpopulation frequency differences than selection.

    I agree completely.

    When I teach Anthropology 105, our introduction to biological anthropology, I always force my students to learn how to calculate Wright's FST. They really don't like it. They think it's cruel and unusual punishment to have to do math in an anthropology course.

    Well, if they're going to take my courses, they'll have to get used to it. Because with me, it's all about the math.

    So, let's consider FST. The statistic represents the reduction in heterozygosity in subpopulations due to isolation, compared to the expectation under panmixia. The expression is:

    $$F_{ST} = \frac{H_T - H_S}{H_T}$$

    Where HS is the average heterozygosity of subpopulations, and HT is the expected heterozygosity of the total population, given the allele frequencies.

    I always use a two-allele locus as an example in class, and I always choose a case in which the frequency of an allele in one subpopulation is 70 percent, and the frequency of the same allele in the other subpopulation is 30 percent. Big difference in frequencies -- the frequency is 40 percentage points higher in one population than in the other. In fact, that frequency difference is well within the range considered "extreme" in the current paper by Hofer and colleagues.

    Well, if the subpopulations are the same size, the average allele frequency is 50 percent. So the expected heterozygosity of the total population is 0.5 (that's 2pq, where p and q are the frequencies of the two alleles). And the average heterozygosity of the two subpopulations is 0.42, since 2 × 0.7 × 0.3 = 0.42 in each. So applying the formula above, we come to an FST of (0.5 - 0.42) / 0.5 = 0.16.
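    The classroom example is easy to check in code. This sketch assumes equally sized subpopulations, as in the text, and computes HS and HT from 2pq for a two-allele locus; the function name is just for illustration.

```python
def fst_two_allele(freqs: list[float]) -> float:
    """Wright's FST = (HT - HS) / HT for a two-allele locus with equally
    sized subpopulations. `freqs` holds one allele's frequency in each."""
    # HS: average heterozygosity within subpopulations
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # HT: expected heterozygosity from the pooled allele frequency
    p_bar = sum(freqs) / len(freqs)
    ht = 2 * p_bar * (1 - p_bar)
    return (ht - hs) / ht


print(round(fst_two_allele([0.7, 0.3]), 2))  # 0.16
```

    With unequal subpopulation sizes, both the pooled frequency and the average HS would need to be weighted by size.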

    Now, the average FST among human continental populations is between 0.1 and 0.15. A value of 0.16 for a single gene should not be in the least bit unusual. Under neutrality, there ought to be lots and lots of gene loci that show allele frequency differences this great or greater. And indeed, Hofer and colleagues find a large set of such loci -- something like one out of 10, which actually seems a bit low to me.

    Other surveys that have tried to test the neutral hypothesis have considered a much narrower range of frequencies -- essentially, genes in which an allele is 80 percent or higher in one population and rare or absent in others. This study included much smaller allele frequency differences in its "extreme" category and thereby found that a very high fraction of sites had such differences.

    For the broader meaning of "extreme" used in this paper, which under neutrality would include one out of every 10 loci, it is no surprise that most would look, well, neutral. There are so many neutral loci fitting these characteristics that they completely swamp out any statistical expectation of selection. There might be a handful of selected sites among the high-FST loci in the paper (and the authors identify a few candidates from other studies), but most must be neutral. The study tests the adequacy of the neutral hypothesis to explain low-FST genes, and finds that population differences at that level have not been driven primarily by selection.

    I'm not sure why the authors didn't include the prosaic mathematical prediction of neutrality in their paper. It seems to me that the results were foreordained by theory.

    Still, several of the observations in the paper are interesting. In particular, the excess of STR alleles outside of Africa that have increased in frequency is a sign of a long-term demographic bias toward population growth outside of Africa. I have heard that observation from other research groups in other contexts, but this is the first paper I can think of that reported it clearly. The "allele surfing" explanation is very credible for that observation -- essentially, a geographically dispersed founder effect.

    The end of the discussion includes a statement about positive selection:

    While we find that positive selection is unlikely to have shaped the allele frequency spectrum at most loci, it may certainly have acted on fewer genes than previously believed, and our current results do not allow us to discriminate between the effects of demography and selection for an individual locus. Loci which are candidates for being under positive selection should therefore be more carefully scrutinized to find links between potentially selected alleles and a phenotypic effect (see e.g. Sabeti et al. 2007).

    I find nothing to disagree with here. Any individual instance of positive selection should be tested with reference to phenotypic effects, and collectively, most of the genome's diversity was not shaped by positive selection. Our own research on positive selection (discussed in this post from last year) addresses a relatively small subset of haplotypes across the genome. Even though the number of affected genes is quite large (on the order of several thousand), it did not strongly influence the genome-wide diversity parameters assessed by Hofer and colleagues.

    The limited genome-wide effect of selection, in the face of a large apparent number of selected alleles, is one of the strongest arguments that the rate of positive selection has recently accelerated. If the rate had been high throughout human evolution, we would find a much stronger effect on the genome-wide variation than we in fact observe. The demographic changes proposed by Hofer and colleagues in fact bolster the case for a recent acceleration -- the very demographic changes that might create "allelic surfing" would also tend to generate more positively selected mutations.

    References:

    Hofer T, Ray N, Wegmann D, Excoffier L. 2009. Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Ann Hum Genet 73:95-108. doi:10.1111/j.1469-1809.2008.00489.x

  • Recent evolution in Newsweek

    Thu, 2009-01-08 11:34 -- John Hawks

    I very much appreciate that Newsweek has started including a regular opinion column on science, written by Sharon Begley. I don't always like it, but it places science properly as a regular feature. And it certainly beats Jonathan Alter.

    In the most recent issue, Begley reviews some of the pieces in last year's annual Brockman volume, What Have You Changed Your Mind About?: Today's Leading Minds Rethink Everything, now out in paperback. The theme of Begley's article is that scientists need to be willing to change their minds. Even in this volume, she finds few true reversals; shifts of opinion are more common:

    Many of the changes of mind are just changes of opinion or an evolution of values. One contributor, a past supporter of manned spaceflight, now thinks it's pointless, while another no longer has moral objections to cognitive enhancement through drugs. An anthropologist is now uncomfortable with cultural relativism (as in, study the Inca practice of human sacrifice non-judgmentally). Other changes of mind have to do with busted predictions, such as that computer intelligence would soon rival humans'.

    Well, it's not that interesting to read an essay that begins like, "I used to think that we would never sequence the Neandertal genome, but facts have compelled me to change my mind." OK, there's a certain entertainment value there. But changing your mind in the face of mere facts just doesn't have the "man versus self" quality of great literature.

    Unless, of course, the conflict is applied to man's understanding of self. Begley finds that the most interesting reversals have resulted from our work on recent human evolution:

    The most fascinating backpedaling is by scientists who have long pushed evolutionary psychology. This field holds that we all carry genes that led to reproductive success in the Stone Age, and that as a result men are genetically driven to be promiscuous and women to be coy, that men have a biological disposition to rape and to kill mates who cheat on them, and that every human behavior is "adaptive"—that is, helpful to reproduction. But as Harvard biologist Marc Hauser now concedes, evidence is "sorely missing" that language, morals and many other human behaviors exist because they help us mate and reproduce. And Steven Pinker, one of evo-psych's most prominent popularizers, now admits that many human genes are changing more quickly than anyone imagined. If genes that affect brain function and therefore behavior are also evolving quickly, then we do not have the Stone Age brains that evo-psych supposes, and the field "may have to reconsider the simplifying assumption that biological evolution was pretty much over" 50,000 years ago, Pinker says.

    Well, the assumption that humans stopped changing in the Pleistocene was always obviously false. You won't find many people who will admit to making that assumption, but there it is anyway, strewn through their works. It was a useful assumption for some people, in that they could examine so-called universals instead of more messy variations. But those variations are proving to be the most interesting frontier of behavioral science. Some of them have been under strong selection, perhaps showing the adaptive reactions of minds to new social and cultural systems of the Holocene.

    I tend to think that the "Stone Age Mind" metaphor exists for two purposes. First, it jibes with the Darwinian idea that evolution leads to imperfect results. Rather than having the minds of angels, humans have minds that are saddled with various equivalents of the vermiform appendix -- useful once, but not yet fully discarded.

    The second purpose was to insulate "evolutionary psychology" from the Gouldian criticism drawn by its progenitor, sociobiology. If behavioral evolution occurred long ago, in the dim Pleistocene, then surely humans today are all fully identical in their behavioral capacities.

    Why that would be true for the mind, when it is false for more mundane functions like oxygen transport, is not obvious. But it clearly was a useful fiction for some -- not Pinker, who always emphasized the possible importance of human genetic diversity. So maybe he didn't really change his mind, either.

  • The Amish heart-protecting triglyceride-busting null mutation

    Sun, 2008-12-14 18:51 -- John Hawks

    Toni Pollin and colleagues (2008) report one of the simplest medical research studies you'll ever see:

    Apolipoprotein C-III (apoC-III) inhibits triglyceride hydrolysis and has been implicated in coronary artery disease. Through a genome-wide association study, we have found that about 5% of the Lancaster Amish are heterozygous carriers of a null mutation (R19X) in the gene encoding apoC-III (APOC3) and, as a result, express half the amount of apoC-III present in noncarriers. Mutation carriers compared with noncarriers had lower fasting and postprandial serum triglycerides, higher levels of HDL-cholesterol and lower levels of LDL-cholesterol. Subclinical atherosclerosis, as measured by coronary artery calcification, was less common in carriers than noncarriers, which suggests that lifelong deficiency of apoC-III has a cardioprotective effect.

    Gina Kolata covers the story in the NY Times:

    For the sake of heart disease research, 809 members of the Old Order Amish community agreed to go to a clinic in Lancaster, Pa., near their homes, and drink a rich milkshake that was made mostly of heavy cream. Over the next six hours, a group of investigators took samples of their blood, determining how much fat was churning through their bloodstreams.

    Most of the study participants responded as expected — their levels of triglycerides, a common form of fat in the blood, rose steadily for three to four hours and then declined. But about 5 percent had an extraordinary reaction: their triglyceride levels started out low and hardly budged.

    I'm generally interested in novel protective mutations, and this is clearly one -- and far from the only one. Its current carrier frequency is 5 percent in the Old Order Amish. Neither the article nor the paper reports its frequency in the general population, although there is the intimation that it is rare. The Amish individuals carrying the mutation all share a common haplotype, apparently (based on pedigree and LD) from a single 18th-century founder.

    It remains an open question whether homozygotes for the null allele are better or worse off than normal APOC3 homozygotes. With about 5% of people carrying one copy, the allele frequency is only around 2.5%, so homozygotes should be rare, on the order of one in 1,600 people under random mating. They were not included in the present study. I can't find any indication that homozygote nulls for APOC3 are a known Mendelian disorder.
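    The 5 percent figure in the Pollin abstract counts heterozygous carriers, not copies of the allele. A short Hardy-Weinberg check converts carrier frequency into allele and homozygote frequencies. It assumes random mating, which a founder population like the Amish only approximates (inbreeding would push the homozygote frequency somewhat higher), and the function name is hypothetical.

```python
import math


def homozygote_frequency(carrier_freq: float) -> float:
    """Expected homozygote frequency under Hardy-Weinberg proportions,
    given the fraction of people carrying at least one copy."""
    # Non-carriers are (1 - q)^2, so q = 1 - sqrt(1 - carrier_freq).
    q = 1 - math.sqrt(1 - carrier_freq)
    return q * q


h = homozygote_frequency(0.05)
print(f"allele frequency ~{1 - math.sqrt(0.95):.4f}; homozygotes ~1 in {1 / h:,.0f}")
```

    Treating 5 percent as the allele frequency instead would give 0.05², or one in 400.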

    I wonder to what extent the allele frequency in the Amish is due to selection.

    The Amish have high frequencies of certain otherwise rare mutations. This is one of the textbook examples of founder effects -- extreme genetic drift due to sampling a small number of founders from a much larger population. Today's Old Order Amish in the United States trace most of their ancestry to an initial population of approximately 200 people in the eighteenth century. That means that any of the alleles carried by those 200 people, even if it was vanishingly rare in the European population, has a good chance of being half a percent or higher in today's Amish.

    But founder effect is only part of the story -- there is also subsequent population growth. Those initial 200 people have more than 200,000 descendants today within the Old Order Amish. This number doesn't count descendants who may belong to other sects that splintered during the nineteenth century (like the Mennonites [see update below]), or descendants of people who left the church. These values suggest that the Amish population has increased by some 2.3% annually during the last 300 years; its current rate of growth is estimated at 4%.

    This is very rapid population growth on an evolutionary time scale, equaling roughly 46% per generation. With this kind of population growth, strongly deleterious alleles may come to occur in a large number of individuals, even as they decline in frequency in the population. The susceptible population grows faster than selection can remove alleles. Hence, we find a number of rare genetic disorders within the Old Order Amish as a consequence not only of founder effect but also subsequent population growth.

    The APOC3 mutation in this study was evidently not deleterious. Its current carrier frequency of 5% suggests it may have been advantageous.

    It's not too hard to hypothesize why a mutation that decreases the risk of heart disease might have conferred a benefit in an agrarian religious sect over the last 300 years. To the extent that heart disease affects men in their 30s and older, these are still active reproductive years for men who may have family sizes of eight children or more. Further, this is a time when men may come into property from their aging parents, may become leaders of new settlements, or may begin to affect the marriages of their children -- a time when young people formally join the church. Being alive would seem like a significant fitness advantage for men in this society. Or perhaps other effects of the gene determined its success.

    The question is just how strong such an effect might be. If the mutation began with a single copy in a population of 200 founders, its initial carrier frequency would be 0.5 percent, or 0.005. Its present carrier frequency in the Amish is ten times that, or 0.05. If we assume that 15 generations have passed, that growth would be consistent with a fitness advantage of around 15 percent for carriers of the null mutation. In other words, the Amish population grew around 46% per generation over the last 300 years; the number of carriers of this mutation grew around 60% per generation.
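    The per-generation figures in this post (2.3 percent per year, 46 percent per generation, about 15 percent advantage, about 60 percent carrier growth) line up when they are read as continuous, logarithmic growth rates. A sketch of that arithmetic, assuming 15 generations of about 20 years and treating frequencies as carrier frequencies (one carrier among 200 founders):

```python
import math

# Figures from the post: ~200 founders, ~200,000 Old Order Amish today,
# about 300 years, assumed to be about 15 generations (~20 years each).
FOUNDERS, TODAY = 200, 200_000
YEARS, GENERATIONS = 300, 15

# Carrier frequency: one carrier among the founders, 5 percent today.
START_FREQ, END_FREQ = 1 / FOUNDERS, 0.05

# Continuous (logarithmic) growth rates.
annual_rate = math.log(TODAY / FOUNDERS) / YEARS       # population, per year
pop_rate = math.log(TODAY / FOUNDERS) / GENERATIONS    # population, per generation
s = math.log(END_FREQ / START_FREQ) / GENERATIONS      # carriers' relative advantage
carrier_rate = pop_rate + s                            # number of carriers, per generation

print(f"population: {annual_rate:.1%} per year, {pop_rate:.0%} per generation")
print(f"advantage s ~ {s:.2f}; carriers grow {carrier_rate:.0%} per generation")
```

    If the advantage is instead applied as a discrete per-generation multiplier, e^0.15 ≈ 1.17, or about 17 percent; either way the order of magnitude matches the text.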

    That kind of differential increase is unlikely to have been driven by genetic drift. Considering the rarity of the mutation in the non-Amish population today, it is unlikely to have been carried by more than a single founder, although we can't exclude the hypothesis that some number of founders were relatives who carried it. That hypothesis is the most likely way for an otherwise rare mutation to hit 5% by founder effect alone. Later, after the Amish population numbered more than a thousand or so, strong differential growth of a rare mutation by chance alone would be impossible. Still, we might imagine that in the initial few generations, one or two founders might have had a predominant effect on the subsequent Amish gene pool. We would need to suppose that the genes of such fecund founders now account for more than 10% of the present Amish gene pool. That's a testable hypothesis. Selection is simpler -- mainly because its effect can be spread across many more generations.

    The interesting thing about selection in the Amish is that their population growth greatly affects the fixation rate of new advantageous mutations. In a constant-sized population, the fixation probability of a new advantageous mutation is roughly twice its heterozygote fitness advantage s, or 2s. But in a growing population with per-generation growth rate r, the fixation probability is approximately 2(s + r) -- when s and r are both small. If we assume a growth rate of 46% and a heterozygote fitness advantage of 15% for this null allele, 2(s + r) exceeds 1, so we've clearly entered the territory where our small-value approximation no longer holds. New adaptive mutations are unlikely to exit the Amish population by genetic drift.
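    What happens beyond the small-value range can be illustrated with a simple branching process, offered only as a sketch and not as the model behind the 2(s + r) result. It assumes each copy of the allele leaves a Poisson-distributed number of copies in the next generation with mean m = (1 + s)(1 + r); the probability that a new copy's lineage escapes early loss is then the nonzero root of π = 1 - e^(-mπ), found here by bisection. The function name is hypothetical.

```python
import math


def survival_probability(m: float) -> float:
    """Probability that a single-copy lineage never goes extinct in a
    Galton-Watson branching process with Poisson(m) offspring.
    Solves p = 1 - exp(-m * p) by bisection; returns 0 when m <= 1."""
    if m <= 1:
        return 0.0

    def f(p: float) -> float:
        return -math.expm1(-m * p) - p  # 1 - exp(-m p) - p, computed accurately

    lo, hi = 1e-9, 1.0  # for m > 1, f(lo) > 0 and f(hi) < 0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for s, r in [(0.01, 0.0), (0.005, 0.005), (0.15, 0.46)]:
    m = (1 + s) * (1 + r)
    print(f"s={s}, r={r}: 2(s + r) = {2 * (s + r):.2f}, branching process = {survival_probability(m):.3f}")
```

    For small s and r the result stays close to 2(s + r). At s = 0.15 and r = 0.46 the approximation gives 1.22, which cannot be a probability, while this branching process gives roughly 0.68: most new copies of such a mutation would escape loss by drift.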

    The subject of positive selection in founder populations is under-explored, from a theoretical perspective. Especially considering the very rapid growth of some human founder populations -- measured against the generational time scale -- there is a good chance that we'll find many new adaptive mutations in such populations.

    UPDATE (2008-12-15): A reader writes:

    It is a common mistake to think that the Mennonites as a group broke off from the Amish. It is actually the other way around with the split occurring in Europe before both groups came to the Americas.

    He kindly provided links to a couple of sites with more information. I appreciate the correction!

    References:

    Pollin TI and 13 others. 2008. A null mutation in human APOC3 confers a favorable plasma lipid profile and apparent cardioprotection. Science 322:1702-1705. doi:10.1126/science.1161524

  • An MAPT review

    Thu, 2008-11-06 23:26 -- John Hawks

    This week in Science, Elizabeth Pennisi has a news focus article about the genome region labeled 17q21.31. I'm probably one of the few people who would recognize that address right away: A recently selected inversion in this region is one of the best candidates for introgression of Neandertal genes into recent Europeans.

    Eichler suspects that when H1 appeared, it somehow provided a strong fitness bonus and became much more common over time at the expense of H2. In Africans, H2 almost disappeared, except in the relatively few people who migrated to Europe 50,000 to 100,000 years ago. Then, for as-yet-unknown reasons, H2 provided its own advantage in the European population--as Stefánsson's data show--and the pendulum has begun to swing in the other direction.

    Hardy and, to a lesser extent, Stefánsson give credence to a more extreme explanation for the distribution of H2. Hardy thinks that H2 had disappeared from the modern humans moving out of Africa to populate the Northern Hemisphere but not from Neandertals, who reintroduced the inversion into the European gene pool through interbreeding with Homo sapiens 28,000 to 40,000 years ago. This view is not supported by the genetic evidence emerging from sequencing Neandertal DNA, and "I realize it's an off-the-wall idea," says Hardy. But he nonetheless thinks it's plausible.

    We covered the locus in our Trends in Genetics review earlier this year. Unless there are new data I don't know about, there is not yet any confirmation or test possible from the Neandertal genome. I'm not at all confident that there will be one, since detecting structural variants like inversions in the fragmented ancient DNA will not be trivial.

    This is very interesting also:

    The sequence comparisons also reveal that independently in humans, chimps, and orangutans, this 900,000-base region has reoriented itself into the H1 orientation, which explains why Eichler found both orientations in these primates. "This bit of DNA has been flip-flopping up and down. There must be an evolutionary reason for that, but we don't know what it is," says Hardy.

    Inversions shouldn't be trivial; on average, they should be deleterious. So finding the same reorientation arising in parallel is curious. I wonder if there is some strategy variant here that might explain the flipping -- sort of like MHC alleles that can emerge in parallel?

    Pennisi spends much of the article describing the current medical research linking rare deletions in 17q21.31 with mental retardation:

    Now they have joined forces to describe 22 patients in molecular and clinical detail in a paper published online 15 July by the Journal of Medical Genetics. They calculate the prevalence of this new genomic disorder to be 1 in 16,000 newborns, and it may account for up to 0.64% of unexplained mental retardation in Europeans. "This is the first novel microdeletion syndrome identified and one of the most frequent ones," says collaborator Joris Veltman, a molecular geneticist at RUNMC.

    There are also links between the inversion polymorphism and schizophrenia and Alzheimer's, although these remain "enigmatic" because neither a biochemical nor a clear mutational explanation for the correlations has been found.

    Anyway, it's a good story and well worth reading as an example of how a single genetic region can fall subject to population genomics, medical genetics, contrasts of rare and common variants, structural variability, primate comparisons, and all the rest.

    It also shows how scientific attention can dogpile onto single genomic locations. That doesn't necessarily mean these are the best or only relevant candidates. Maybe more than anything, it means that grant reviewers recognize loci that have been interesting in prior studies, and reward further attention with more research dollars.

    UPDATE (2008-11-08): According to a reader, there's a rumor that the 17q21.31 region has been found in the Neandertal genome, and that the inversion (the putative selected version) is not present. That would weigh against the selected allele having been ubiquitous in Neandertals, although it cannot exclude that the inversion was present in some of them. In any event, the practical effect would be to remove this as a likely case of introgression.

    The recurrence of inversions in this region in other hominoids remains very interesting...

    References:

    Pennisi E. 2008. 17q21.31: Not your average genomic address. Science 322:842-845. doi:10.1126/science.322.5903.842

  • Poor Ötzi's doomed mitochondria

    Thu, 2008-10-30 23:24 -- John Hawks

    I must have seen a dozen stories today that started this way:

    Reuters:

    "Otzi," Italy's prehistoric iceman, probably does not have any modern day descendants, according to a study published Thursday.

    Washington Post:

    Sparking a new mystery about early man, Italian scientists have unraveled the DNA of the 5,300-year-old "Iceman" mummy, only to discover that he doesn't appear to have modern descendants anywhere near where he was found in Europe.

    I didn't really think about how funny that line is, until I was talking to someone about it this evening -- there's no way that the mtDNA can be informative about Ötzi's descendants, because, well, it's maternally inherited. D'oh!

    Meanwhile, we can ask what it means that a randomly picked Neolithic man would have a now-extinct mtDNA lineage. According to ScienceNews, Antonio Torroni thinks that it is a case of marker loss:

    It’s possible but unlikely that Ötzi belonged to a fourth branch of K1 that is now extinct or rare, Torroni says. He considers it more probable that a random mutation in the Iceman’s mitochondrial DNA erased the only genetic marker currently used to identify members of the most common K1 branch.

    Could be.

    Whether the sequence was a unique branch of K or a slight variation on a well-represented subtype, there's a natural hypothesis for why it no longer exists, one that somehow goes unmentioned in all of the reports. The Iceman is hardly singular: Remember that the mtDNA pool of Central Europe in the Neolithic was dominated by lineages that are now rare. And Medieval Danes had several mtDNA sequences that are now rare or absent in Scandinavia. And the Cambridge sequence has been increasing in frequency in Britain since medieval times. And so on.

    There's no mystery here. These are large populations, and mtDNA haplogroups have been changing in frequencies between ancient DNA samples and the present. MtDNA has functions that plausibly were subject to changing environments after the Neolithic. This seems like a good candidate for recent selection.

    UPDATE (2008-10-31): A reader writes:

    Hi John,

    You raised the hypothesis that recent selection might explain the apparent dearth of modern examples of his haplotype, with private mutations at 3513T and 8137T. Those mutations are synonymous, which would seem to rule that out, leaving drift as the better alternative.

    That's a good point, and one often raised as a criticism of the selection hypothesis. If a sequence differs from some extant (still-existing) variant by only synonymous mutations, then selection can't explain why one is gone and one is still here. Only genetic drift can explain the extinction of one and the survival of the other.

    But this is not the entire story. In this instance, we have synonymy with one major haplogroup (K) which has coding differences from other haplogroups. Those haplogroups have been changing in relative frequencies in Europe over the last few thousand years. A decrease in the frequency of K would naturally cause rare K variants to become rarer or extinct, even though they are neutral with respect to each other. In this instance, lineage extinction would be the result of selection, even though the extinct lineages have no disadvantage relative to some that still survive.
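
    A toy simulation can illustrate the argument. This sketch assumes a haploid Wright-Fisher population of 500 mtDNA lineages in which haplogroup K as a whole (a rare sub-lineage plus the rest of K, differing only by synonymous changes) has fitness 1 - s relative to non-K lineages. The population size, starting counts, 50-generation horizon, s = 0.05, and the function names are all arbitrary choices for illustration, not estimates for Ötzi's haplotype.

```python
import random
from collections import Counter

# Hypothetical haploid Wright-Fisher model of mtDNA lineages (illustration only).
# Type 0: rare K sub-lineage; type 1: the rest of haplogroup K; type 2: non-K.
# Types 0 and 1 differ only by synonymous changes, so they share one fitness.


def rare_lineage_lost(s, rng, n=500, generations=50, rare=20, other_k=180):
    """Return True if the rare K sub-lineage is lost within `generations`."""
    counts = [rare, other_k, n - rare - other_k]
    fitness = [1.0 - s, 1.0 - s, 1.0]  # selection acts on K as a whole
    for _ in range(generations):
        if counts[0] == 0:
            return True
        weights = [c * w for c, w in zip(counts, fitness)]
        # Resample n lineages in proportion to count times fitness.
        tally = Counter(rng.choices(range(3), weights=weights, k=n))
        counts = [tally[i] for i in range(3)]
    return counts[0] == 0


def loss_fraction(s, replicates=200, seed=1):
    """Fraction of replicate populations in which the rare sub-lineage is lost."""
    rng = random.Random(seed)
    lost = sum(rare_lineage_lost(s, rng) for _ in range(replicates))
    return lost / replicates


print("neutral (s = 0):     ", loss_fraction(0.0))
print("K declining (s=0.05):", loss_fraction(0.05))
```

    Under these assumptions the rare sub-lineage is lost noticeably more often when K is declining, even though it has no disadvantage relative to the rest of K.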

    Now, is that the case with Ötzi's haplotype? I would say it's a good hypothesis, but not yet testable. We really would like to know the frequency of the haplogroup in the Neolithic. For that matter, further ancient DNA sampling will test many hypotheses of genetic drift and selection, because direct observation of ancient frequencies gives us a source of information that does not depend on sampling models from living populations.

  • Information theory and mutual information between genetic loci

    Fri, 2008-10-10 22:52 -- John Hawks

    This is the second in a series on information theory and tests for recent selection. The first entry, "Information theory: a short introduction," reviewed the basic concepts of information measures and their background.

    The International HapMap is a massive project to determine the genotypes for up to 3 million single nucleotide polymorphisms (SNPs) in people from 11 population samples around the world. The current data release (Phase 3) includes genotypes for a subset of over 1.5 million SNPs in 1,115 people. The 11 population samples include people of African ancestry from the US Southwest, Utah residents of Northern and Western European ancestry, Han Chinese from Beijing, people of Chinese ancestry from Denver, people in the Houston Gujarati Indian community, Japanese people from Tokyo, Luhya and Maasai people from Kenya, people of Mexican ancestry from Los Angeles, Italians in Tuscany, and Yoruba from Ibadan, Nigeria.

    As impressive as this effort is, we may wonder why exactly SNP genotyping of so many people is a valuable enterprise in itself. The project’s homepage includes this short statement:

    The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. By making this information freely available, the Project will help biomedical researchers find genes involved in disease and responses to therapeutic drugs.

    There are theoretical and practical objections to this simple explanation (as I discussed here last month). However, what no one involved with the project seems to have expected is the extent to which the data would demonstrate the importance of recent adaptive evolution in human populations.

    Here, I am describing some of the ways that we can test hypotheses about natural selection by using the SNP genotypes from the HapMap. This is a theory-centric description, with some digression into practical aspects of handling the genotype data. First, I consider how we might use information theoretic concepts to test the hypothesis of independence between two genetic loci.


    Entropy and genotypes

    The data from the HapMap consist of an array of biallelic genotypes for each population. The size of this array is different for each population; we may consider it to have m rows, each row corresponding to a single SNP locus, and n columns, each corresponding to a person. The entries are genotypes: AA, AT, CG, and so on. Each SNP is biallelic, so it hardly matters what we call the two alleles—we can arbitrarily label them a and b. Thus, the three possible genotypes may be labeled aa, ab and bb.

    Of course, from a sample of genotypes, we can readily estimate the frequencies of the two alleles in the population. The sample allele frequency is $\hat{p}(a) = \hat{p}(aa) + \tfrac{1}{2}\hat{p}(ab)$, where the estimate $\hat{p}(aa)$ is the number of aa individuals in the sample divided by the sample size n. It is worth expressing some statistical caution about these estimates. Although the HapMap SNPs are biased toward common alleles, some of them are rare indeed. And as we stretch down the chromosome to consider multilocus haplotypes, many may be vanishingly rare or even singular in the population, even though they happen to occur in the sample.
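
    As a minimal sketch of that estimate, the code below codes each two-letter genotype as the number of copies of an arbitrarily chosen allele b (0 = aa, 1 = ab, 2 = bb, the same coding used in the A and B example later on) and then applies the formula. The helper names and the 100-person sample are hypothetical.

```python
from collections import Counter


def code_genotype(genotype, b_allele):
    """Code a two-letter genotype (e.g. "AT") as the number of copies of the
    arbitrarily chosen allele b: 0 = aa, 1 = ab, 2 = bb."""
    return sum(1 for allele in genotype if allele == b_allele)


def allele_frequency(coded):
    """Estimate p(a) = p(aa) + p(ab) / 2 from genotypes coded 0, 1, 2."""
    n = len(coded)
    counts = Counter(coded)
    return counts[0] / n + 0.5 * counts[1] / n


sample = ["AA"] * 16 + ["AT"] * 48 + ["TT"] * 36  # hypothetical sample, n = 100
coded = [code_genotype(g, b_allele="T") for g in sample]
print(allele_frequency(coded))
```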

    Now, how shall we estimate the entropy of a single locus? It seems there are several ways we might look at the question.

    1. Allelic entropy. We might use the entire sample of genotypes at the locus to estimate the allele frequencies. In that case, the entropy of the SNP locus would be estimated by
      $$\hat{H}(\mathrm{SNP}) = -\left[\hat{p}(a)\log\hat{p}(a) + \hat{p}(b)\log\hat{p}(b)\right] \qquad (1)$$

      For a SNP locus with empirical genotype frequencies p(aa) = 0.16, p(ab) = 0.48 and p(bb) = 0.36, this estimate of allelic entropy would be 0.97 bits.

    2. Genotypic entropy. Or, we might consider the genotype frequencies as the essential elements of the system. In this case, the entropy would be estimated by
      $$\hat{H}(\mathrm{SNP}) = -\left[\hat{p}(aa)\log\hat{p}(aa) + \hat{p}(ab)\log\hat{p}(ab) + \hat{p}(bb)\log\hat{p}(bb)\right] \qquad (2)$$

      For the same genotype frequencies listed above, this genotypic entropy would be estimated as 1.46 bits.

    3. Sample entropy. Or, we might consider the sample of genotypes as a series of n repeated trials. That would make the sample entropy in the example 1.46n bits. We might also calculate a sample entropy considering the gametes instead of the genotypes, but this would work out to be the same, as long as we don't know which gametes came from which parents of heterozygotes.

    To understand this last point, imagine that the sample is a deck of shuffled cards. Each card may be red with probability p(a) or black with probability p(b). If we drew cards one at a time, we could record a sequence (red, red, black,…) of 2n cards. Each card drawn thereby has an exact position in the sequence. If p(a) = 0.4 and p(b) = 0.6 as above, then this entropy will be 1.94n bits. Now, imagine instead that we draw the cards two at a time, and record only these pairs (homozygote red, homozygote black, heterozygote,…). We will have a sequence half as long, and this sequence does not include the exact position of the red and black cards, only their position as part of a pair. The entropy of this sequence of n genotypes is only 1.46n bits. This gives some insight into the nature of the sample entropy as defined above: It is the entropy of a sequence of genotypes.
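
    The numbers in this example can be checked with a few lines of code. This sketch computes the allelic and genotypic entropies for p(aa) = 0.16, p(ab) = 0.48, p(bb) = 0.36, and makes the card-drawing argument explicit: two cards drawn in order carry 2 × 0.97 = 1.94 bits per person, and recording only the unordered pair discards exactly one bit per heterozygote, 0.48 bits per person on average, leaving 1.46 bits. That exact bookkeeping relies on the genotype frequencies being in Hardy-Weinberg proportions, which these are (0.16 = 0.4², 0.48 = 2 × 0.4 × 0.6, 0.36 = 0.6²). The variable names are illustrative.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


p_aa, p_ab, p_bb = 0.16, 0.48, 0.36
p_a = p_aa + 0.5 * p_ab
p_b = 1.0 - p_a

allelic = entropy_bits([p_a, p_b])            # equation (1)
genotypic = entropy_bits([p_aa, p_ab, p_bb])  # equation (2)
ordered_pairs = 2 * allelic                   # two cards per person, order kept
lost_to_pairing = p_ab * 1.0                  # one bit of order lost per heterozygote

print(f"allelic entropy:   {allelic:.2f} bits")
print(f"genotypic entropy: {genotypic:.2f} bits")
print(f"ordered cards:     {ordered_pairs:.2f} bits per person")
print(f"minus order lost:  {ordered_pairs - lost_to_pairing:.2f} bits per person")
```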

    By contrast, the allelic entropy is the uncertainty associated with drawing a single copy of the SNP at random from the sample. The genotypic entropy is the uncertainty associated with drawing a genotype (two SNP copies) from the sample. Again, these are empirical estimates derived from the sample; they represent the underlying population just to the extent the sample does.

    Which of these estimates will be useful to us? The sample entropy tells us how many bits of hard drive space we need to store the HapMap data under an efficient coding scheme. That’s interesting, but not necessarily what we have in mind. The other two are sample estimates of an underlying population characteristic—either allele or genotype frequencies. In what follows, we will be interested in quantifying how much our uncertainty about one individual’s SNP genotype is reduced by knowing that same individual’s genotype for some other SNP. It would seem that the appropriate measure for this problem will be the genotypic entropy.

    Mutual information between SNPs

    The joint entropy represented by two SNP loci may be less than the sum of their individual entropies. That is to say, there may be mutual information between the two loci. Mutual information can arise between SNPs for several reasons. Most obviously, if the sample of individuals actually consists of members of two distinct populations with different allele frequencies, then two distinct SNPs may have mutual information because of this hidden population structure. This source of mutual information is exploited by admixture mapping, which takes people with recent ancestry from two or more populations and finds genomic regions that are linked because little recombination has yet had time to reshuffle haplotypes that may be locally common but globally rare. Or two loci may share mutual information because they are physically linked.

    As an example of mutual information estimation, consider two sets of genotypes (the rows of the following table):

    A = 0 1 0 2 0 1 1 0 1 0 1 0 2 1 0 0 1 0 1 0
    B = 2 2 2 0 1 1 2 2 0 2 1 1 0 1 2 1 2 2 1 2

    Each SNP locus has three genotypes, and twenty people are represented in the sample; each column represents the genotypes of a single individual. The empirical estimate of genotypic entropy for the first SNP locus (Ĥ(A)), based on 10 zeros, 8 ones and 2 twos, is 1.36 bits per individual. The same estimate for the second SNP locus (Ĥ(B)), based on 3 zeros, 7 ones, and 10 twos, is 1.44 bits per individual. The sum of the individual entropies for the two SNP loci is 2.8 bits. But the joint entropy of the two loci (Ĥ(A,B)), based on three (0,1), seven (0,2), one (1,0), four (1,1), three (1,2), and two (2,0) joint genotypes, is only 2.36 bits. Hence, the two SNP loci share an estimated 0.44 bits of mutual information (i.e., Î(A;B) = 0.44 bits).
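
    The arithmetic above is easy to reproduce. This minimal sketch uses plug-in (empirical) frequencies and base-2 logarithms, computes each entropy from the genotype lists, and takes I(A;B) = H(A) + H(B) − H(A,B). The function names are illustrative. Printed to two decimals, it should reproduce 1.36, 1.44, 2.36 and 0.44 bits.

```python
import math
from collections import Counter

# Genotypes for the two SNPs in the text, one entry per person (n = 20).
A = [0, 1, 0, 2, 0, 1, 1, 0, 1, 0, 1, 0, 2, 1, 0, 0, 1, 0, 1, 0]
B = [2, 2, 2, 0, 1, 1, 2, 2, 0, 2, 1, 1, 0, 1, 2, 1, 2, 2, 1, 2]


def entropy_bits(values):
    """Plug-in (empirical) Shannon entropy of a list of symbols, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information_bits(x, y):
    """Empirical mutual information, I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = list(zip(x, y))  # each person's joint genotype, e.g. (0, 2)
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(joint)


print(f"H(A)   = {entropy_bits(A):.2f} bits")
print(f"H(B)   = {entropy_bits(B):.2f} bits")
print(f"H(A,B) = {entropy_bits(list(zip(A, B))):.2f} bits")
print(f"I(A;B) = {mutual_information_bits(A, B):.2f} bits")
```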

    What does this mean? Consider the following contingency table, based on the genotypes above:

                B = 0   B = 1   B = 2
      A = 0       0       3       7
      A = 1       1       4       3
      A = 2       2       0       0

    None of the sampled individuals have the joint genotype (0,0), and none have (2,1) or (2,2). Even more important, a full seven have (0,2), even though only 5 (10 × 10 / 20) would be expected if the two SNPs were independent.

    The contingency table is a clue (for the statistically minded). The mutual information is telling us something like Pearson’s χ² test of association. If we know the genotype for SNP A, we can do better than chance predicting the genotype for SNP B.
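
    To make the comparison with Pearson's χ² concrete, this sketch builds the contingency table from the genotype lists, computes the counts expected under independence (row total × column total / n), and computes Pearson's X² statistic. The function names are illustrative. For this table the expected count for (0,2) is 5, and Pearson's X² comes out at about 14.5.

```python
A = [0, 1, 0, 2, 0, 1, 1, 0, 1, 0, 1, 0, 2, 1, 0, 0, 1, 0, 1, 0]
B = [2, 2, 2, 0, 1, 1, 2, 2, 0, 2, 1, 1, 0, 1, 2, 1, 2, 2, 1, 2]


def contingency_table(x, y, k=3):
    """Count joint genotypes; rows index SNP A (0-2), columns SNP B (0-2)."""
    table = [[0] * k for _ in range(k)]
    for gx, gy in zip(x, y):
        table[gx][gy] += 1
    return table


def expected_counts(table):
    """Counts expected under independence: row total * column total / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(table):
    """Pearson's X^2: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0
    )


table = contingency_table(A, B)
expected = expected_counts(table)
x2 = pearson_x2(table)
print("observed:", table)
print("expected:", expected)
print(f"Pearson X^2 = {x2:.1f}")
```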

    Significance testing for mutual information

    The comparison with Pearson’s χ² test raises another obvious question: How do we test whether a given amount of mutual information is statistically significant? In the example above, we have estimated 0.44 bits of mutual information between SNP A and SNP B. Is this a lot? How much mutual information should we expect between two random samples of data?

    Let’s take an even simpler contingency table — the relation between two coin flips. Each flip may have a 50-50 chance of producing “heads” or “tails”. On average, we expect to see one fourth (H, H), one fourth (T, T), one fourth (H, T) and one fourth (T, H) in our results. But if we perform this experiment (2 coin flips) a finite number of times, we will almost always see some slight divergence from these proportions. Ten sets of coin flips absolutely can’t give us one fourth of each result, because ten doesn’t divide by four evenly. On top of that problem, we might get six or seven (H, H) by chance, even if we only expect 2.5. Any of these cases will give us a positive non-zero estimate of mutual information, even if there is no causal connection between the coin flips.

    Another way of stating this observation is that the estimate of mutual information, Î(A;B) is biased. We should expect the estimate from a small sample to be larger than the true mutual information in the population at large.
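
    The upward bias is easy to see by simulation. This sketch, with an arbitrary seed and illustrative helper names, repeatedly draws ten pairs of independent fair coin flips and averages the plug-in mutual information estimate. Because the flips are independent, the true mutual information is zero, yet no estimate is negative and the average comes out positive.

```python
import math
import random
from collections import Counter


def entropy_bits(values):
    """Plug-in (empirical) Shannon entropy of a list of symbols, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information_bits(x, y):
    """Empirical mutual information, H(X) + H(Y) - H(X,Y), in bits."""
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(list(zip(x, y)))


rng = random.Random(2008)  # arbitrary seed, for reproducibility
n_flips, n_trials = 10, 10_000
estimates = []
for _ in range(n_trials):
    first = [rng.choice("HT") for _ in range(n_flips)]   # coin 1, ten flips
    second = [rng.choice("HT") for _ in range(n_flips)]  # coin 2, independent
    estimates.append(mutual_information_bits(first, second))

mean_mi = sum(estimates) / n_trials
share_positive = sum(e > 1e-12 for e in estimates) / n_trials
print(f"true I = 0 bits; mean estimate = {mean_mi:.3f} bits")
print(f"fraction of samples with a positive estimate: {share_positive:.2f}")
```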

    The analogy between mutual information and the χ² test is more apt than it might appear. In fact, over many trials, the sample estimate of mutual information (in bits), multiplied by a factor 2n ln 2, where n is the number of cases in the sample, will approximately follow a χ² distribution with (r − 1)(c − 1) degrees of freedom for loci with r and c genotype classes. This product is the familiar G statistic of the likelihood-ratio test. Our sample of genotypes of A and B above has 20 individuals and 3 possible genotypes at each locus, so 40(ln 2)Î(A;B) should be distributed as a χ² with 4 degrees of freedom.
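
    Here is a sketch of that approximation, assuming Î is measured in bits: multiply by 2n ln 2 and compare the result with a χ² distribution. To stay within the standard library, the tail probability uses the closed form that holds only for an even number of degrees of freedom (which covers the 4 degrees of freedom here, but not the 81 in the larger example below); for general df, SciPy's `scipy.stats.chi2.sf(x, df)` would be the usual choice. The function names are illustrative. Using the rounded value of 0.44 bits from the text, it gives a statistic of about 12.2 and p ≈ 0.016.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of degrees of
    freedom, via the closed form exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut handles only positive, even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def mi_chi2_test(mi_bits, n, rows, cols):
    """Approximate test of independence from a mutual information estimate.

    G = 2 * n * ln(2) * I(bits) is compared with chi-square on (rows-1)(cols-1) df.
    """
    g = 2.0 * n * math.log(2.0) * mi_bits
    df = (rows - 1) * (cols - 1)
    return g, df, chi2_sf_even_df(g, df)


g, df, p = mi_chi2_test(0.44, n=20, rows=3, cols=3)  # values from the text
print(f"G = {g:.1f} on {df} df, p = {p:.3f}")
```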

    Reflecting on the three ways we might estimate the entropy of a locus, above, this is twice the mutual information calculated from the sample entropy, measured in nats instead of bits. Even though we are interested in estimating the population characteristic, the genotypic entropy, the significance of our estimate can only be evaluated by considering the entropy of the sample.

    And indeed, we can perform a randomization of the two sets of genotypes and show the correspondence. Here, I’ve done ten thousand permutations of the genotypes in A with respect to those in B:

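    [Figure: histogram of 40(ln 2)Î(A;B) for 10,000 permutations of A against B, with the χ² density for 4 degrees of freedom overlaid]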

    The bars are the histogram representing the 10,000 permuted samples; the curve is the χ² density function with 4 degrees of freedom. You can see that the permutations show significant clumpiness. With only 20 sampled individuals, some fractional combinations come up quite often while others are impossible. Also, the permutations are more biased than the χ² would predict: there are too few small values for Î(A;B). This is another small-sample effect. Generally, the χ² approximation fails when there are fewer than 5 expected observations in a cell, and shouldn't really be trusted with fewer than 10. Here, we have nine cells and only 20 total observations. But our value of 0.44 bits, equal to 12.2 on the scale of the figure (twice the sample mutual information of about 6.1 nats), is significant according to the (uncorrected) χ² approximation with p = 0.016, and according to the permutation test with p = 0.014. If we are very concerned about the deviation from the χ² distribution, we might decide to use Fisher’s Exact Test on the underlying contingency table.

    With more observations, data do tend to converge to a χ² distribution. For example, here I have run 10,000 permutations of a sample of 10,000 individuals, for two loci, each locus with 10 alleles at equal frequencies. The curve is a χ² distribution with 81 degrees of freedom:

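    [Figure: histogram for 10,000 permutations of two 10-allele loci in a sample of 10,000 individuals, with the χ² density for 81 degrees of freedom overlaid]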

    Very nice.

    This comparison suggests a couple of things. First, we can always do a permutation test of the hypothesis of independence. Take a sample of paired genotypes, shuffle up one of them, and see if the observed pairs share higher mutual information than a large fraction (say, 95%) of the permuted sets. (Naturally, if at this point you are already thinking of a genome-wide survey, you will need to consider ways to correct for multiple comparisons….)
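
    The permutation test in the first point can be sketched as follows. It assumes the plug-in estimator used for the A and B example (redefined so the block stands alone), shuffles the SNP B genotypes 10,000 times with an arbitrary seed, and counts how often the shuffled pairs show at least as much mutual information as the observed pairs. The add-one correction in the p-value is a common convention, not something taken from the text, and the function names are illustrative. With this setup the p-value should fall in the same neighborhood as the 0.014 reported above, though the exact value depends on the shuffles.

```python
import math
import random
from collections import Counter

A = [0, 1, 0, 2, 0, 1, 1, 0, 1, 0, 1, 0, 2, 1, 0, 0, 1, 0, 1, 0]
B = [2, 2, 2, 0, 1, 1, 2, 2, 0, 2, 1, 1, 0, 1, 2, 1, 2, 2, 1, 2]


def entropy_bits(values):
    """Plug-in (empirical) Shannon entropy of a list of symbols, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information_bits(x, y):
    """Empirical mutual information, H(X) + H(Y) - H(X,Y), in bits."""
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(list(zip(x, y)))


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Return (observed MI, permutation p-value) for the hypothesis of independence."""
    rng = random.Random(seed)
    observed = mutual_information_bits(x, y)
    y_perm = list(y)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        # Small tolerance so floating-point ties with the observed value count.
        if mutual_information_bits(x, y_perm) >= observed - 1e-12:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value away from exactly zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


observed, p_value = permutation_test(A, B)
print(f"I(A;B) = {observed:.2f} bits, permutation p = {p_value:.3f}")
```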

    Second, we can get approximate results for mutual information using a χ² test. All we do is multiply the mutual information estimate Î(A;B), in bits, by 2n ln 2 and compare the result to the critical value of the χ² distribution with the appropriate number of degrees of freedom. This approximation will be poor for small samples, including the HapMap samples. Again, if we were testing the hypothesis of independence in those cases, we would likely want to use Fisher’s Exact Test instead.

    But in what follows, we will generally not be testing the hypothesis that two loci are independent; we will be testing the hypothesis that they are linked under neutrality. In that context, these statistical tests of independence will be useful for quickly weeding out genetic regions where linkage is negligible. Then, we can employ different tests for regions where linkage appears to be more substantial, tests that make more effective use of the properties of mutual information.

    Next: Genetic drift reduces mutual information

  • David Goldstein profile

    Tue, 2008-09-16 00:38 -- John Hawks

    Nicholas Wade profiles Duke University geneticist David Goldstein in the current NY Times. This article covers several different topics that are worth comment.

    He begins by describing the flawed premise of the HapMap:

    The principal rationale for the $3 billion spent to decode the human genome was that it would enable the discovery of the variant genes that predispose people to common diseases like cancer and Alzheimer’s. A major expectation was that these variants had not been eliminated by natural selection because they harm people only later in life after their reproductive years are over, and hence that they would be common.

    This idea, called the common disease/common variant hypothesis, drove major developments in biology over the last five years. Washington financed the HapMap, a catalog of common genetic variation in the human population. Companies like Affymetrix and Illumina developed powerful gene chips for scanning the human genome. Medical statisticians designed the genomewide association study, a robust methodology for discovering true disease genes and sidestepping the many false positives that have plagued the field.

    Of course, it turned out great for me, and others who wanted to study recent evolution of human genes. But the entire thing was built on an idea that was obviously false. Sure, a variant that causes mortality late in life might be only weakly selected. But it still shouldn't be common! And any knowledgeable reader of the early HapMap publications could tell that the common variant model was built on illusions. To sell the idea, they depended on genetic disorders like sickle cell, cystic fibrosis, and lactose intolerance. Most were selective balances; the few that weren't (like lactase) would later turn out to be cases of very recent selection.

    In other words, the common variant idea needed selection to be common, even ubiquitous -- even as its proponents were arguing that selection was rare or nonexistent.

    Goldstein points this out:

    “After doing comprehensive studies for common diseases, we can explain only a few percent of the genetic component of most of these traits,” he said. “For schizophrenia and bipolar disorder, we get almost nothing; for Type 2 diabetes, 20 variants, but they explain only 2 to 3 percent of familial clustering, and so on.”

    The reason for this disappointing outcome, in his view, is that natural selection has been far more efficient than many researchers expected at screening out disease-causing variants. The common disease/common variant idea is largely wrong. What has happened is that a multitude of rare variants lie at the root of most common diseases, being rigorously pruned away as soon as any starts to become widespread.

    I should add to those comments: Of the variants that have been found in these genome-wide association studies for Alzheimer's, Type 2 diabetes, and schizophrenia, a significant number appear to have been recently selected. So even these few that have been found wouldn't have been predicted under the "common variant" model. But most variants that cause senescence must be rare. That's Medawar's theory. Or they may be balances. That's Williams' theory. This is a case where modern evolutionary theory gives very clear predictions, which have now been confirmed at enormous cost.

    I suppose I shouldn't worry. After all, the physicists certainly spend a lot of money to confirm their theories...

    The article goes into some detail about Goldstein's work on genetics and Jewish history, the subject of his recent book. I don't have much to add, but I'll be linking to another interesting article on that topic later on.

    Toward the end, the article moves into my special area of expertise:

    Another pursuit that interests him, one of high promise for reconstructing human evolutionary history, is that of discovering which genes bear the mark of recent natural selection. When a new version of a gene becomes more common, it leaves a pattern of changes that geneticists can detect with various statistical tests. Many of these selected genes reflect new diets or defenses against disease or adaptations to new climates. But they tend to differ from one race to another because each human population, after the dispersal from Africa some 50,000 years ago, has had to adapt to different circumstances.

    This newish finding has raised fears that other, more significant differences might emerge among races, spurring a resurrection of racist doctrines. “There is a part of the scientific community which is trying to make this work off limits, and that I think is hugely counterproductive,” Dr. Goldstein said.

    This has indeed become a great concern for the people who fund research into genetic variation. NIH is conducting a panel next month on the "ethical concerns" raised by the study of recent selection, complete with advice to journal editors about how to review such research. I think Goldstein's worry -- that some are "trying to make this work off limits" -- is largely justified.

    Goldstein argues that finding recent selection will ultimately be unimportant:

    He says he thinks that no significant genetic differences will be found between races because of his belief in the efficiency of natural selection. Just as selection turns out to have pruned away most disease-causing variants, it has also maximized human cognitive capacities because these are so critical to survival. “My best guess is that human intelligence was always a helpful thing in most places and times and we have all been under strong selection to be as bright as we can be,” he said.

    This is more than just a guess, however. As part of a project on schizophrenia, Dr. Goldstein has done a genomewide association study on 2,000 volunteers of all races who were put through cognitive tests. “We have looked at the effect of common variation on cognition, and there is nothing,” Dr. Goldstein said, meaning that he can find no common genetic variants that affect intelligence. His view is that intelligence was developed early in human evolutionary history and was then standardized.

    I have no opinion about whether Goldstein's argument about genetic causation of IQ is correct. It's clearly heritable within populations, but there has been very little success identifying genes that may explain the genetic variance. So his argument about common variants could well be right.

    Still, it seems to me that he wants to have his cake and eat it too. Some thoughts:

    1. The passage seems contradictory. If we're not going to find anything interesting, why is it such an interesting topic?

    2. Of course, intelligence isn't the only thing that's interesting. My research on language and hearing, diet change, food preferences, disease resistance, aging and longevity -- all those things are pretty interesting too, and vary historically among populations. I can understand why people think intelligence is ominous and threatening, but is it really more so than, say, disease susceptibility?

    3. If Goldstein is right, and IQ is like other traits for which the common variant model is false, that still doesn't lead to his conclusion. After all, Type 2 diabetes varies in risk both among individuals and between populations for genetic reasons, even though we've found few common alleles of significant effect. The logical conclusion of Goldstein's argument is that the brain is complicated, thousands of rare genetic variants may have relatively large effects on IQ in different families, and any differences that exist must have many causes.

    4. If the "intelligence" function of the brain is really affected by thousands of different rare mutations, in hundreds or maybe thousands of different genes, doesn't that mean that IQ should be strongly influenced by pleiotropy? After all, at least some of those hundreds of genes must be doing other things, and if they're anything like the rest of the genome, around one in seven of them has been strongly selected in the last 10,000 years.

    The assumption here that I find the most troubling is that intelligence is somehow the purpose of recent human evolution -- so much so that populations could not be anything but identical. But nothing could refute that assumption more eloquently than the scans for recent selection. Yes, the brain is represented on those lists, but so are the testes. And the blood. And the gut. We know from functional genomics and gene expression that brain, gut, bone, and blood are often influenced by the same genes. Recent human evolution is not progress toward a pinnacle. The human population is a snowdrift where ten thousand trade-offs have blown together, mostly by the luck of mutations.

    I prefer to fall back on Dobzhansky. We should not confuse equality with identity.
