john hawks weblog

paleoanthropology, genetics and evolution

India

  • Neandertal similarity in the HapMap samples

    Mon, 2012-06-25 11:36 -- John Hawks

    In my last installment on Neandertal introgression in present-day human samples, I covered whole genome data from the 1000 Genomes Project ("Which population in the 1000 Genomes Project samples has the most Neandertal similarity?"). For the next few weeks I'll be releasing more of these comparisons, made with the help of my Ph.D. student, Aaron Sams.

    As a reminder of our methods for comparing genomes: we examine every base reported as a single nucleotide polymorphism by the 1000 Genomes Project. If the sequencing data had no errors, then this would be an account of every point mutation in the human genome. However, the data are imperfect in various ways, as I'll note below. Likewise, the Neandertal sequence data are imperfect in various ways.

    Here's one of the 1000 Genomes Project comparisons, showing the histogram for pooled European, African, and Chinese samples. In this chart, the number of shared Neandertal derived SNP alleles is the x-axis, divided into bins of around 500. The y-axis is the number of individual genomes in the sample found in each bin. So on this chart, the largest number of European genomes (nearly 120) share very approximately 645,000 derived SNP alleles with the Vindija 33.16 genome.

    Comparison of shared Neandertal derived variants in African, Chinese and European samples

    I find it necessary to be very explicit about these charts, because after showing them to many people I know how easily they can be misinterpreted. It's natural to assume that they are bar charts, where higher y values mean more Neandertal. But with more than 2000 genomes to compare, a bar chart is really just noise. These histograms are much like bell curves, in which the shape of the distribution on the y-axis indicates the dispersion within the population of Neandertal shared alleles.
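
    To make the construction of these histograms concrete, here is a minimal Python sketch of the binning step. The per-genome counts and the function name `bin_shared_counts` are hypothetical, invented for illustration; only the bin width of roughly 500 comes from the description above.

```python
from collections import Counter


def bin_shared_counts(shared_allele_counts, bin_width=500):
    """Map each bin's lower edge to the number of genomes that fall in it."""
    return Counter((n // bin_width) * bin_width for n in shared_allele_counts)


# Hypothetical counts of shared derived alleles, one number per genome.
example_counts = [644_800, 645_100, 645_300, 645_499, 646_020]
histogram = bin_shared_counts(example_counts)

# Print bins in x-axis order: lower bin edge, then number of genomes.
for lower_edge in sorted(histogram):
    print(lower_edge, histogram[lower_edge])
```

    Each key is the lower edge of a bin on the x-axis, and each value is the height of the bar: how many genomes fall in that bin, not how much Neandertal similarity any one genome has.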

    Percentages

    Everyone is excited to find out what percentage of Neandertal ancestry people have. I'm hesitant to report percentages, because I think they are misleading on these data. There is some filtering hiding beneath the data. In particular, SNP alleles that are found only in one individual ("singletons") are likely to be undersampled by the project's sequence analysis. Because gene variants that have introgressed from Neandertal populations tend to be rare in present-day samples, when we miss some rare alleles, this tends to reduce our estimate of Neandertal similarity. This bias in resequencing data should affect populations roughly in proportion to their Neandertal ancestry. Our comparisons of different populations are therefore likely to give the right order of Neandertal ancestry (e.g., Europeans more than Asians) but may underestimate the total fraction of ancestry by some amount. We are counting human SNP variants and not every base pair in the Neandertal genome data, so the effect of sequencing error in the Neandertals will be minimal, but nevertheless present in a small fraction of comparisons. These errors should be randomly distributed with respect to human population differences, but they also add noise that should decrease the accuracy of percentage estimates.

    For another thing, we don't know where the zero point may be. Europeans have around 3 percent more Neandertal similarity than Yoruba; Yoruba (as I showed in the last post) have around a half percent more than Luhya in the 1000 Genomes Project sample. The Luhya are almost certainly not minimal for living people; in fact, I would put some money against it. Since some Neandertal alleles have proceeded right up to high frequencies outside Africa, there has been ample opportunity during the last 30,000 years or more for other alleles to have spread into Africa.

    Our conservative approach is to rely on comparisons of large samples of people, ideally hundreds, and to trust a comparison only when it achieves statistical significance in these samples. That still allows us to detect very slight excesses of Neandertal ancestry in some populations, because the data from hundreds of individuals are very strong evidence. But the overlap among populations is sometimes very extensive even if their means differ significantly.

    Incomplete lineage sorting (ILS) is one pattern by which living people share alleles with Neandertals. ILS should be equally distributed among populations today, under the assumption that Neandertals and ancestral Africans stem from a single unstructured population. Obviously, Europeans and Asians share more derived SNP alleles with Neandertals than do Africans today, so we can strongly reject the hypothesis of isolation between African and Neandertal populations.

    Given that, three patterns of evolution could have caused some populations to share more derived alleles with Neandertals than others.

    1. Population structure in the ancestors of Africans and Neandertals may have caused some populations to share more ILS with Neandertals than others.

    2. Continued gene flow between Neandertals and Africans could have spread Neandertal alleles into Africa and vice-versa.

    3. Recent introgression from Neandertal populations into the ancestors of today's populations may have transferred new Neandertal alleles into recent humans.

    These three processes actually overlap with each other. Very likely all three of them happened -- although to date, the descriptions of Neandertal genome data have accentuated the last and argued that the first two are relatively less important [1] [2]. A "new" allele in a Neandertal may actually have originated from a mutation more than a half million years ago, have been lost within ancient Africans, and transferred into today's Europeans when they encountered and mixed with Neandertals. We cannot tell these processes apart from the standpoint of any single SNP allele. Only by comparing many SNP alleles across many populations can we sort out their relative importance.

    To this end, we have been comparing populations with each other and ancient Neandertals in many different ways. The 1000 Genomes Project has continued to sample and resequence many of the same samples that were initially amassed for the International HapMap Project. The HapMap was a project based on genotyping individuals with microarray technology. Genotypes are just as informative in many cases as whole-genome sequences. If you already know which genetic variations you want to examine, a microarray can save a substantial amount of wasted effort.

    With Neandertal comparisons, we don't start out knowing in advance which genotypes will be useful. For this reason, genotyping data yields a potential bias when comparing to Neandertal or other human genomes. The microarray was designed to include genotypes that were already known to vary in some human population. With the HapMap, this bias tends to overrepresent the genetic variations in the initial HapMap samples -- generally, Utah residents of northern European descent, ethnic Yoruba people from Nigeria, ethnic Han Chinese from Beijing, and Japanese people from Tokyo. If these samples share some common derived SNP alleles with Neandertals, they will very likely be represented in the genotyping array. But very rare alleles won't be represented. And alleles that are unique to other populations -- such as East Africans or South Asians -- may not be represented, either. The bias is called "ascertainment bias" because it comes from the "ascertainment" of SNPs, or their initial discovery in some populations but not others.

    It is possible now to find sets of SNP markers that have been statistically chosen to minimize ascertainment biases. The filters used in such comparisons are complex, and in some cases actually rely on the Neandertal genotype, so I haven't used them here. For our first paper we have focused on the whole-genome sequence comparisons, but here I'll give the same comparisons on some HapMap samples to show approximately where they fit. I will focus here on raw comparisons instead of standardizing them in terms of the predictive ability of informative SNPs on whole genome data. Finding the most informative SNPs is part of the process of sorting introgression from earlier population structure, and is rather more complex; I prefer to start with something very simple and visually easy to interpret.

    South Asia

    One interesting place is India. The HapMap includes a sample of Indian-Americans with origins in Gujarat, in western India. Here's a plot comparing the Gujarat ancestry (GIH) sample with the CEU and LWK samples:

    Comparison of shared Neandertal derived variants in CEU, LWK and GIH samples

    The GIH sample has substantially fewer shared Neandertal derived SNP alleles than the CEU sample. What may be more curious is that the GIH sample also has fewer than East Asians on average. The JPT+CHB samples, for example, exceed the GIH mean by around 100 derived SNPs.

    Comparison of shared Neandertal derived variants in JPT+CHB, LWK and GIH samples

    On a mean of more than 43,000, 100 is around a fourth of a percent, so it's not much -- and it may fall within the amount expected from ascertainment bias. It will be much more enlightening to have GIH whole genome data. In the meantime, we can probably confirm the picture from sequence data that indicates Europeans today have the highest degree of Neandertal ancestry.

    East Africa

    The situation within Africa is potentially very complex also. From sequence data, we were able to show that Yoruba (YRI) and Luhya (LWK) population samples have different numbers of shared derived Neandertal SNP alleles. The YRI sample in West Africa has significantly more Neandertal similarity than the LWK sample in East Africa. We speculate that this relation may reflect trans-Saharan gene flow, which has continued throughout history and prehistory.

    Is this a question of east versus west in Africa? That might seem unlikely considering the extent of population movements into northeastern Africa and continued trade along the East African coast throughout historic time.

    The HapMap includes a sample of ethnic Maasai people from Kenya, which allows us to provide another perspective on African variation. Here is the chart, compared to LWK and CEU:

    Comparison of shared Neandertal derived variants in CEU, LWK and MKK samples

    The Maasai have substantially more Neandertal similarity than Luhya, despite their present geographic proximity. In fact, the mean amount of Neandertal similarity in the Maasai is approximately the same as that in the ASW sample, which is composed of people of African-American ancestry in the Southwest U.S.:

    Comparison of shared Neandertal derived variants in CEU, LWK and ASW samples

    You see immediately more dispersion in the African-American ancestry sample, because the mixture between African and European ancestors is more variable and much more recent than the events that gave rise to the Neandertal ancestry of Maasai people.

    We speculate that there may have been a substantial amount of interaction in northeast Africa. Obviously this has been true in historic times, but the Maasai suggest that it may go back long before the origins of the present ethnic groups and their movements into this area. The present heterogeneity of Neandertal similarity in these populations suggests a really complex population history. Some of the present Neandertal similarity may derive from ILS within the ancient African population.

    Probing assumptions

    Of course my lab is not the only one presently engaged in comparing the archaic human genomes with recent populations. One of the reasons why we're pursuing a more open science strategy in our reporting is that different groups using different methodologies ought to converge on the same population history. Where we see different results, it's often an indication that the alternative approaches involve substantially different assumptions about the way ancient humans interacted. As we've probed more deeply into the data, we have confronted the reality that long-term population mixture between Neandertal and African ancestral populations is extremely difficult to rule out. Assuming that long-term interactions were impossible because Neandertals and Africans were completely isolated will probably lead to erroneous results. That makes it harder for us to clearly identify gene variants that came from Neandertals within the last hundred thousand years, as opposed to those shared with Neandertals via more ancient gene flow.

    What makes long-term interactions seem more likely is that some of the Neandertal genomes seem to be more closely related to living people than others. More on that in my next installment.


    Synopsis: 
    I examine the pattern of Neandertal ancestry in India and East Africa.
  • India archaeology blog

    Fri, 2011-04-15 14:00 -- John Hawks

    On the topic of the archaeology of South Asia, I want to point readers to Sheila Mishra's blog. She has picked up a number of topics of recent interest, including the earlier Acheulean dates by Pappu and colleagues, the comparison of terminology for Stone Age sites in India versus other regions and the issue of continuity between Acheulean and Middle Paleolithic within South Asia. It's a brief and nicely-referenced source of information and I look forward to seeing more.

  • Older and younger Acheulean in India

    Sun, 2011-03-27 00:37 -- John Hawks

    Shanti Pappu and colleagues [1] report on date estimates resulting from new excavations at the old site of Attirampakkam, India. The news element is that they date an Acheulean occurrence to as old as 1.5-1.6 million years ago. At the oldest, these dates would make the Acheulean in India equal in age to the earliest occurrences in Africa.

    The dates depend on the decay of cosmogenic nuclides in the artifacts themselves. This is a kind of exposure dating -- as the artifacts are exposed to cosmic rays at the Earth's surface, they build up radioactive isotopes of beryllium and aluminum (10Be and 26Al), which have half-lives of 1.39 million and 717,000 years, respectively. When they are buried deep underground, their exposure to cosmic rays stops, and the radioactive isotopes can only decay. Then the ratio of the two isotopes in the sample reflects the time since deep burial. But like other exposure methods, in practice this depends on a model of exposure time, burial speed, and radioactivity within the soil, which lends substantial uncertainty to the dates. The lower bound of the 95% confidence interval for each of the date estimates reported in the paper is still over a million years, leading to the minimal conclusion that the site is that age or older.
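
    Here is a minimal sketch of the simplest version of that ratio calculation, assuming instantaneous and complete burial with no nuclide production afterward. The half-lives are the ones quoted above. The starting 26Al/10Be ratio of 6.75 is a commonly cited surface production ratio and is an assumption here, not a value taken from the paper, and the function names are made up for illustration. The published dates rest on a fuller model than this.

```python
import math

# Half-lives quoted in the post, in years.
HALF_LIFE_BE10 = 1.39e6
HALF_LIFE_AL26 = 717_000.0

# ASSUMPTION: surface production ratio of 26Al to 10Be; not from the paper.
ASSUMED_INITIAL_RATIO = 6.75


def decay_constant(half_life_years):
    """Convert a half-life into a decay constant per year."""
    return math.log(2) / half_life_years


def simple_burial_age(measured_ratio, initial_ratio=ASSUMED_INITIAL_RATIO):
    """Years since burial, assuming both isotopes only decay after burial.

    26Al decays faster than 10Be, so the 26Al/10Be ratio falls as
    initial_ratio * exp(-(lambda_26 - lambda_10) * t).
    """
    if not 0 < measured_ratio <= initial_ratio:
        raise ValueError("measured ratio must be positive and at most the initial ratio")
    lambda_diff = decay_constant(HALF_LIFE_AL26) - decay_constant(HALF_LIFE_BE10)
    return math.log(initial_ratio / measured_ratio) / lambda_diff


# Example: a sample whose ratio has dropped to half the assumed starting value.
age_years = simple_burial_age(ASSUMED_INITIAL_RATIO / 2)
print(round(age_years))
```

    Under these assumptions the ratio halves roughly every 1.48 million years of burial. Changing the assumed starting ratio or allowing some production after burial shifts the answer, which is one source of the uncertainty mentioned above.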

    Robin Dennell has written an accompanying short essay that gives a broader view of the Acheulean in South Asia [2]. The essay includes a great paragraph summarizing the now-obsolete idea that the Acheulean reached India only a half million years ago:

    How does this new evidence affect our understanding of the South Asian Acheulian? Previously, the general consensus was that the Indian Acheulian was less than 0.6 to 0.5 Ma (5) and was thus much younger than that in the Levant (eastern Mediterranean). There, the earliest dates of 1.4 Ma, from ‘Ubeidiya in Israel, probably indicate a dispersal of hominins from Africa (6). A second influx of African immigrants is indicated by the discovery of African types of cleavers and hand axes at Gesher Benot Ya'aqov (GBY), in Israel, dated to 0.78 Ma (7). This evidence implied that the Acheulian dispersed eastward toward South Asia only several hundred millennia after it first appeared in the Levant. It also implied that the spread of Acheulian bifacial technologies into South Asia was broadly contemporaneous with its first appearance in Europe, where the earliest sites date from ∼0.5 to 0.6 Ma (8). Some have attributed this expansion of the Acheulian into South Asia and Europe to Homo heidelbergensis. This Middle Pleistocene type of hominin is known mostly from Europe, where it was first defined, but is also recognized by some (but not all) researchers at African sites such as Bodo, Ethiopia, and Kabwe, Zambia, and even at some sites in China (9).

    The "Homo heidelbergensis" model is in such utter disarray right now, I'm not sure many paleoanthropologists have realized the full extent of the problems. You should know that I don't believe in Homo heidelbergensis, never have. A couple of months ago, I was discussing some of the issues about mutation rate estimation with a very prominent geneticist, and the conversation turned to Homo heidelbergensis. What a shock the Denisova sequence should have been to those itching to see a H. heidelbergensis incursion into Asia!

    Notice, however, the intrinsic nuttiness of archaeological interpretation. Oh, we have the first evidence for Acheulean in India around 600,000 years ago? Well, that's around the same age as the Bodo fossil from Ethiopia! What a coincidence! Maybe this new kind of hominin expanded from Africa and carried the Acheulean to India! And Sima de los Huesos is around 600,000 years old, too -- and there's a handaxe in the pit! My gosh, we need a name for those hominins!

    Well, the nice thing about a hypothesis built on mere coincidence is that it only takes one observation to falsify it. Million-year-old handaxes in India ought to do it, and how. That's the message of Dennell's essay, and the subtext of the paper by Pappu and colleagues. What I find interesting is the extent to which the fact was hinted at by earlier discoveries in South Asia but hampered by weaknesses in stratigraphic control and dating. From Pappu and colleagues:

    Sparse radiometric ages from sites in India have situated the Acheulian within the Middle Pleistocene, with a few dates suggesting an early Middle to Early Pleistocene age. However, these ages often exceed the limits of confidence of the methods used (2). They include an electron spin resonance (ESR) mean age of 1.27 ± 0.17 Ma, assuming linear U uptake, on two herbivore teeth from Isampur (23); an ESR age of ~0.8 Ma (lacking uncertainty envelopes) on calcrete from the Amarpura formation, Rajasthan (24), which has been correlated with the Acheulian site of Singi Talav (4); dates ranging from ~1.4 to 0.67 Ma for the tephra at Bori (Kukdi river) (25); and paleomagnetic measurements with evidence of reversals at the sites of Bori, Morgaon, Gandhigram, Andora, and Nevasa (26). However, the reliability of these ages has, in each case, been questioned on various grounds (5, 27, 28). Likewise, the age and stratigraphic position of artifacts and faunal remains from the Early Pleistocene Dhansi formation along the river Narmada are yet to be firmly established (29). Based on data from controlled excavations and two independent dating methods, our ages from Attirampakkam show that the Acheulian in India is older than previously thought. Evidence from other sites in South Asia should be reconsidered and redated.

    Much evidence already exists in the South Asian Acheulean that could be more accessible. The Acheulean in the region has been a long block of undifferentiated time, despite some very well-resolved sites. In addition to this much older dating for early Acheulean, India also has some of the youngest Acheulean assemblages anywhere -- for example, Haslam and colleagues [3] earlier this month reported on an Acheulean assemblage from around 130,000 years ago in northeastern India. That's long after the large biface tradition begins to give way to Middle Paleolithic and MSA toolkits in Europe and Africa.

    On the topic of Denisova, Haslam and colleagues were writing before that genome was reported. But they did know about the Neandertal genetic results, including the evidence of Neandertal ancestry within India. Nevertheless, they assert a scenario in which the makers of earlier and later Acheulean in South Asia are the same biological population, without substantial gene flow from regions to the west, including the Neandertals.

    Recent reports of the draft Neanderthal genome suggest that Neanderthals and H. sapiens likely did interbreed successfully soon after the latter had left Africa (Green et al., 2010), with the probable location of such contact to the west of India, in the Middle East. The southern limit of the Neanderthal range is unknown (Dennell and Roebroeks, 2005), but we emphasise that the continuity seen in the Middle Pleistocene South Asian technological record suggests that taxa derived from earlier hominin dispersals, and not Neanderthals, were the creators of the Indian Late Acheulean. Greater biological separation between dispersing humans and resident Indian hominins may have precluded viable genetic mixing (although see Liu et al., 2010 for an alternate view from East Asia), while similarities in certain technological strategies may have rendered cultural exchange a somewhat more likely occurrence.

    Well, the Denisovans didn't have to live in India when the ancestors of Melanesians ran across them and intermarried. But Denisova and the Neandertal genomes now make it very likely that the inhabitants of South Asia were one or the other. And even if South Asians were yet a third group, as yet unattested from genomes, it is no longer credible to suppose that they were isolated from Europe or Africa for a million years previous. The tools just don't have that much to do with the populations.


    Synopsis: 
    Long known from India, new papers are adding detail to the temporal extent of the Acheulean.
  • Ardipithecus bloggingheads

    Sat, 2009-10-17 10:59 -- John Hawks

    Today, Science Saturday on bloggingheads.tv is a conversation between Razib Khan and me. We had a fun conversation about Ardipithecus and the recent study of the population genetics of India.

    Here's a non-embedded link to the bloggingheads site

    Razib pointed out the similarity of eyeglasses in our last diavlog.

    Obviously, we've taken the pills that make us smarter:

    Scientific American cover with glasses

    I think we did pretty well staying on topic in this one, and getting into some paleoanthropology deeper than your average radio interview.

    If you're finding my blog from the bloggingheads site, please look around! My Ardipithecus topic link goes way back, long before the current discoveries, and there are some interesting posts in there, from today's perspective.

    I especially like the two posts about bushes, ladders, and whether Ardipithecus is our direct ancestor or not: "Spacecraft all over the Pliocene", and "A ladder not a bush?"

    Oh, and I almost forgot this 2006 post linking to Tim White's kvetching over the Orrorin femur: "Orrorin opera." If you want a background picture of the competitiveness of research in early hominin field paleontology, that's a case worth examining -- or for a broader view, Ann Gibbons' book, "The First Human," has many stories as well.

  • SNPtastic India

    Wed, 2009-09-23 14:49 -- John Hawks

    The cover story in Nature this week is a paper about the population history of India, from David Reich's lab. It's an important contribution to our knowledge of human genetic variation, and provides a very interesting set of data for further investigation of modern human origins, the dispersal of agriculture into the subcontinent, and the history of more recent Indian populations.

    Here's the abstract:

    India has been underrepresented in genome-wide surveys of human variation. We analyse 25 diverse groups in India to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the 'Ancestral North Indians' (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the 'Ancestral South Indians' (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39–71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy. We therefore predict that there will be an excess of recessive diseases in India, which should be possible to screen and map genetically.

    The number of individuals is not huge for the purposes of population genetic analysis -- only 132 people from 25 groups -- but it is very significant in terms of recent samples. By comparison, it is around double the number of effective individuals in any of the HapMap v.1 populations, genotyped at more than 560,000 SNPs.

    The results of the study address basic population genetic issues, including the degree of endogamy, the pattern of regional differentiation, and the likelihood of discovering new recessive genetic disorders by additional sampling. Some notes:

    Population mixture. The authors propose that today's groups descend in varying proportions from two ancient (and no longer existing) populations, which they call "ancestral North Indian" and "ancestral South Indian".

    I'm always skeptical of mixture models, especially when the putative source populations no longer exist. There are just too many ways that structured migration or dispersal can lead to the appearance of mixture. People once thought of "Alpines" as a mixture of pure Nordic and Mediterranean elements, after all -- and that was just because their heads were mesocephalic.

    Still, with a half-million SNPs, it's possible to do a better job testing the hypothesis of mixture versus structured migration. The authors in this paper didn't -- they applied a simplified "3 Population Test" that compares the empirical allele frequencies to proportions expected under only two scenarios: simple mixture or complete isolation. It seems to me that the null should be simple isolation by distance, which would give the same result as "mixture" according to their test. If you really want to look for population mixture, you need to involve the dimension of time, for example, by demonstrating the antiquity of haplotypes that have mixed together.
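
    For readers who want to see the shape of such a test, here is a minimal sketch of an f3-style statistic, the form in which the 3 Population Test is usually written: the average over SNPs of (c - a)(c - b), where c is the allele frequency in the test population and a and b are the frequencies in two reference populations. A clearly negative value is read as a sign of mixture. The frequencies below are invented, the function name is hypothetical, and the sketch leaves out the finite-sample correction and significance testing that a real analysis needs.

```python
def f3_statistic(test_freqs, ref_a_freqs, ref_b_freqs):
    """Naive f3(C; A, B): mean of (c - a) * (c - b) across SNPs."""
    if not (len(test_freqs) == len(ref_a_freqs) == len(ref_b_freqs)):
        raise ValueError("all three populations need the same SNPs")
    if not test_freqs:
        raise ValueError("need at least one SNP")
    products = [
        (c - a) * (c - b)
        for c, a, b in zip(test_freqs, ref_a_freqs, ref_b_freqs)
    ]
    return sum(products) / len(products)


# Invented allele frequencies at two SNPs in two reference populations.
ref_a = [0.1, 0.9]
ref_b = [0.8, 0.2]

# A test population sitting halfway between the references at every SNP.
intermediate = [(a + b) / 2 for a, b in zip(ref_a, ref_b)]
# A test population lying outside the range spanned by the references.
outside = [0.05, 0.95]

f3_intermediate = f3_statistic(intermediate, ref_a, ref_b)  # negative
f3_outside = f3_statistic(outside, ref_a, ref_b)            # positive
print(f3_intermediate, f3_outside)
```

    The statistic only registers that c lies between a and b at each SNP. It carries no information about how those intermediate frequencies arose, which is the concern raised above.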

    So I don't accept this ancestral division, certainly not at face value. It does seem plausible that West Asian (and thereby European-related) genes have introgressed into India over time, perhaps in association with the growth of high-density agricultural populations. Maybe some of this gene flow occurred under the influence of positive selection, but processes of elite dominance and differential growth may have been sufficient.

    Regional differences. The results show a greater degree of regional genetic differentiation in India than has been found for continental Europe. Still, with an FST of only 0.01, we're not talking about major population splits here. With that number, the subcontinent is closer to panmixia than one might expect for a region its size. The authors suggest that founder effects explain the regional differentiation:

    We propose that the high FST among Indian groups could be explained if many groups were founded by a few individuals, followed by limited gene flow. This hypothesis predicts that within groups, pairs of individuals will tend to have substantial stretches of the genome in which they share at least one allele at each SNP. We find signals of excess allele sharing in many groups (Supplementary Fig. 2), which as expected tend to occur in the groups that have the highest FST values from all others (P = 0.002 for a correlation). To estimate the age of founder events, we measured the genetic distance scale over which allele-sharing decays, and verified the robustness of our procedure by simulation (Supplementary Fig. 3). Six Indo-European- and Dravidian-speaking groups have evidence of founder events dating to more than 30 generations ago (Supplementary Fig. 2), including the Vysya at more than 100 generations ago (Fig. 2). Strong endogamy must have applied since then (average gene flow less than 1 in 30 per generation) to prevent the genetic signatures of founder events from being erased by gene flow.

    I don't think that explanation works. With those times in generations, we're talking about events within the last 600-2000 years. Since all these calculations are done on the whole dataset assuming complete neutrality, I think we should look more closely at the distribution of LD across loci. It seems likely that some of the high-LD loci that appear to point to founder effects will actually be found to be selected.

    Relationships of Indian to non-Indian populations. One of the real problems of assuming a tree with no migration is that it leads to statements like this:

    [T]he ANI [ancestral North Indian] and CEU [HapMap European sample] form a clade, and further analysis shows that the Adygei, a Caucasian group, are an outgroup (Supplementary Note 4). Many Indian and European groups speak Indo-European languages, whereas the Adygei speak a Northwest Caucasian language. It is tempting to assume that the population ancestral to ANI and CEU spoke 'Proto-Indo-European', which has been reconstructed as ancestral to both Sanskrit and European languages, although we cannot be certain without a date for ANI–ASI mixture.

    Some of the common ancestors of some living Europeans and some Indians were probably speakers of proto-Indo-European. But we can easily refute the hypothesis that all of the common ancestors were -- some of those common ancestors lived more than 40,000 years ago, as is well-known from the mtDNA chronology. The tree model with complete isolation does not explain the data. So as simple as it is -- and as well-used by Cavalli-Sforza and others -- it would be better to use a more accurate model.

    UPDATE (2009-09-24): Gene Expression has a full review of the paper.

    UPDATE (2009-09-27): Very interesting angle by Suvrat Kher at Reporting on a Revolution:

    The Indian Press has made a hash of the finding....

    But I can't blame the press entirely. The scientists who gave interviews to the press didn't mention this. They wimped out on reporting this potential inflammatory and politically incorrect finding. This is just poor and irresponsible science outreach on part of the scientists. How can you ignore a finding that is staring out at you from the very paper you are talking about? The press may be guilty of not digging in but it was just reporting what the scientists told them.

    References:

    Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461:489-494. doi:10.1038/nature08365

  • Could genetic drift really break your heart?

    Mon, 2009-01-19 00:40 -- John Hawks

    Are these people crazy?

    The combination of such a large risk with such a high frequency is, fortunately, unique. "How can such a harmful mutation be so common?" asks Chris Tyler-Smith from The Wellcome Trust Sanger Institute, Hinxton, UK. "We might expect such a deleterious change to have 'died out'.

    "We think that the mutation arose around 30,000 years ago in India, and has been able to spread because its effects usually develop only after people have had their children. A case of chance genetic drift: simply terribly bad luck for the carriers."

    This is a 25-bp deletion in a muscle protein gene, MYBPC3. The current allele frequency in India is estimated to be 4 percent; it is estimated to be carried by 60 million people. The paper suggests that it originated 30,000 years ago. Carriers of the gene have a massive increase in their chance of cardiomyopathy.

    Here's the relevant passage from the paper:

    The presence of a disease-associated variant at substantial frequency raises an evolutionary question: if it is disadvantageous, how did it become so common? In principle, it could be evolutionarily neutral, manifesting its disadvantages only late in life; alternatively, its disadvantages could be outweighed by advantages early in life, or in a different environment, so that it could have been positively selected. To address this question, we examined the haplotype structure surrounding the deletion. Using five short tandem repeat (STR) markers, spanning ca. 3.4 Mb surrounding the deletion in 287 heterozygous individuals, we found similar high degrees of variation in the inferred haplotypes from chromosomes with and without the deletion (Supplementary Fig. 7 and Supplementary Table 6 online). We then used allele-specific amplification to resequence ca. 10-kb haplotypes centered on the 25-bp deletion from nine heterozygous individuals (Supplementary Tables 7 and 8 online). The chromosomes carrying the 25-bp deletion showed five closely related haplotypes (Supplementary Fig. 8 online). After excluding variants likely to have arisen by recombination, we estimated a time to most recent common ancestry (TMRCA) of ca. 33 ± 23 thousand years for the deletion haplotypes (Supplementary Methods). This time slightly postdates the initial peopling of the subcontinent 30,000–50,000 years ago and together with its restricted geographical distribution suggests that the deletion did not arrive with the first modern human settlers from Africa [more than] 50,000 years ago, but arose subsequently within the subcontinent. Its occurrence in two populations from Southeast Asia can be explained by recent gene flow from India (Supplementary Note online). Collectively, these observations provide no evidence for rapid spread of a recent founder haplotype or any departure from neutral evolution (Dhandapany et al. 2009:4).

    The issue is not really whether a gene could go from 1 copy to 4 percent in 1200 generations by chance. That wouldn't be so terribly unlikely in Pleistocene humans -- in fact, the mean time for a mutation to go from 1 copy to 4 percent by drift in a population of effective size 10,000 individuals is not 30,000 years, but only around 20,000 years. On the other hand, mtDNA variation today suggests that South Asia experienced early and rapid population growth -- so we're not likely talking about a population of 10,000, but more like a minimum of 100,000 effective individuals through the past 30,000 years at least. It would take genetic drift at least 10 times longer to accomplish the requisite frequency change given that demographic history. Still, a single allele at a single gene locus might be exceptional.

    But that scenario, however unlikely, is simply not the situation we have here. Here we have a deletion that must have some disadvantage, because it gives people a fatal disease. This disadvantage is apparently dominant in effect, based on the case-control study. Yet the deletion has managed to persist within the large South Asian populations of the last 10,000 years so that today it is still around 4 percent.

    People mainly die of cardiac problems after age 40. But human reproductive lives aren't over until they're done investing in their children. Further, a weakened heart may reduce work potential or health even if it kills slowly. The fitness cost of this deletion is smaller than if it gave people a chance at a fatal disease when they are 17, but a smaller fitness cost is still a fitness cost. In a large population, that small fitness cost is going to whittle away the frequency of the allele over time.

    A thousand generations is a lot of potential whittling. Using some quick calculations, it looks like a selection coefficient against the deletion as low as 0.001 to 0.0015 in heterozygotes should have been enough to cut the frequency down to around 1 percent, from an initial value of 4 percent. So even if drift increased the deletion early after its origin, it ought to be much rarer today. Meanwhile, drift looks even more unlikely, since the chances of a mutation growing from 1 copy to 4 percent against such selection are nil.
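
    The quick calculation above can be sketched as a deterministic recursion. This is only an illustration under assumptions that are mine, not the paper's: additive fitnesses (1 for non-carriers, 1 - s for heterozygotes, 1 - 2s for homozygotes), an infinitely large population with no drift, and no new mutation or migration. The function names are hypothetical.

```python
def next_frequency(p, s):
    """One generation of selection against a deleterious allele.

    Assumed fitnesses (an illustration, not taken from the paper):
    1 for non-carriers, 1 - s for heterozygotes, 1 - 2s for homozygotes.
    """
    # Mean fitness under random mating simplifies to 1 - 2*s*p.
    mean_fitness = 1.0 - 2.0 * s * p
    # Allele frequency after selection: p * (1 - s*(1 + p)) / mean fitness.
    return p * (1.0 - s * (1.0 + p)) / mean_fitness


def frequency_after(p0, s, generations):
    """Iterate the recursion, ignoring drift, mutation, and migration."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s)
    return p


if __name__ == "__main__":
    # Start at 4 percent and compare the two selection coefficients
    # over roughly 1000 to 1200 generations.
    for s in (0.001, 0.0015):
        for generations in (1000, 1200):
            p = frequency_after(0.04, s, generations)
            print(f"s={s}, {generations} generations: frequency {p:.4f}")
```

    Under these assumptions the frequency ends up on the order of 1 percent for the values of s considered here. A rare allele declines by roughly a factor of exp(-s) per generation, so a drop from 4 percent to 1 percent needs s times the number of generations to be about ln(4), or roughly 1.4.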

    Did this deletion have a fitness cost as high as one in a thousand? It increases the risk of cardiomyopathy 5-fold or more compared to the wild type. So it seems very plausible. But really, we don't have any good estimates of the fitness costs of chronic diseases in pre-industrial populations.

    If the deletion was favored by some selection, that selection would probably be antagonistic, that is, acting against the fitness cost of the deletion late in life. The authors briefly investigated this hypothesis, as described above. They found no evidence for a recent expansion of a single haplotype around the deletion. That means that if there was strong selection favoring this deletion, it must have happened early after its origin and then petered out. If the expansion had been late in South Asian history, the deletion would show more linkage disequilibrium (LD) around it, and most of the deletion-carrying chromosomes would share a single long-range haplotype. So this deletion has not been increasing rapidly in the past few thousand years.

    I would hypothesize that the disadvantages of the deletion have actually increased over time. The average lifespan increased into the Upper Paleolithic and probably later as well. Meanwhile, as the population grew, larger completed family sizes became more important to fitness. As people became more sedentary, the accumulation and inheritance of possessions and land became an important means of investing in children. The increasing importance of later survival and investment in children should have raised the fitness cost of chronic disease. That would explain a pattern of evolution in which this deletion increased in frequency early in its history, but later remained static or declined.

    So, I don't suppose I can say people are crazy for thinking genetic drift could explain this deletion's current high frequency. But considering the powerful effect of weak selection over the many generations involved here, and the very large size of the South Asian population during most of that time, genetic drift seems pretty unlikely.

    References:

    Dhandapany PS and 23 others. 2009. A common MYBPC3 (cardiac myosin binding protein C) variant associated with cardiomyopathies in South Asia. Nat Genet (online early) doi:10.1038/ng.309
