john hawks weblog

paleoanthropology, genetics and evolution

population structure

  • Understanding population differentiation

    Mon, 2011-11-28 00:48 -- John Hawks
    Synopsis: 
    Devising a story problem to illustrate Fst as a measure of population differentiation

    This lab has a take-home assignment, which is worth three points when you turn it in at next week's lab section.

    The genetic differentiation among populations is very important to understanding human diversity and its historical origins. The basic measurement of population differentiation is FST. You will be designing and providing the solution to a problem involving FST.

    1. Use the "Measuring population subdivision" exercise as an example to follow.
    2. You can also refer to the "Measuring differences between populations" text.
    3. Design a story problem with three populations.
    4. Your problem should involve a single gene locus, with two alleles. Each of the three populations should have a frequency for each allele (remember, the two will add to 100%).
    5. Show how FST should be calculated in your problem, with your allele frequencies.
    6. Use 1-2 sentences to explain what aspect of population differentiation your problem helps to illustrate. For example, does it show an example with one extremely different population? With very similar populations?
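    For step 5, a worked calculation might look like the minimal sketch below. It assumes one common definition, FST = (HT - HS) / HT. Here HS is the average expected heterozygosity (2pq) within the three populations, HT is the expected heterozygosity at the pooled allele frequency, and the populations are weighted equally. The village frequencies are made up for illustration.

```python
# Sketch of an FST calculation for a three-population story problem.
# Assumptions (not from the assignment text): FST = (HT - HS) / HT,
# with all populations weighted equally.

def heterozygosity(p):
    """Expected heterozygosity 2pq at a two-allele locus."""
    return 2 * p * (1 - p)

def fst(freqs):
    """FST from the frequency of one allele in each population."""
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)  # mean within-population
    p_bar = sum(freqs) / len(freqs)                           # pooled allele frequency
    h_t = heterozygosity(p_bar)                               # total-population value
    return (h_t - h_s) / h_t

# Hypothetical story problem: allele A is at 20%, 50% and 80% in three
# villages (allele a makes up the rest: 80%, 50% and 20%).
villages = [0.2, 0.5, 0.8]
print(round(fst(villages), 3))  # 0.24
```

    The same function gives an FST of about 0.007 for less divergent hypothetical frequencies such as 45%, 50% and 55%. That version would make a story problem about very similar populations.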

    Bring your story problem back to lab next week.

  • Measuring differences between populations

    Mon, 2011-11-28 00:28 -- John Hawks
    Synopsis: 
    Fst and its relationship to the number of migrants among populations

    When individuals mate locally, different populations tend to diverge from each other in the frequencies of their alleles. Genetic differences between populations are therefore differences in allele frequencies, and these differences in allele frequencies may have consequences in terms of phenotypic or adaptive differences. But not every difference in allele frequencies is equal. When populations encompass great genetic variation, large differences in allele frequencies still leave much overlap: the individuals in the different populations may not be very different from each other. In contrast, slight differences in allele frequencies might be very important between populations that are not very variable, because such a difference may be large relative to the small amount of variation within each population.

    Geneticists measure the differences between populations by comparing the difference in allele frequencies to the amount of variation within the populations. When people mate with their neighbors, they tend to become more inbred; that is, they are more likely to mate with distant relatives. This means that mates will tend to be more genetically similar than they would be if people mated equally with people born anywhere in the world.

    The increase in inbreeding due to low gene flow is often measured with a statistic called FST, which relates the inbreeding within subpopulations to that in the total population. When gene flow is high, FST is low, and vice versa. FST represents the proportion of differences between two individuals taken randomly from two subpopulations that is due to the differences in allele frequency between the subpopulations alone. The remaining differences between the individuals are those that could be found between two individuals taken randomly from the same subpopulation. FST therefore compares the between-subpopulation and within-subpopulation components of genetic variation.
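    The point that the weight of a frequency difference depends on the variation within populations can be checked numerically. This sketch assumes FST = (HT - HS) / HT for two equally weighted populations, since the post does not spell out a formula. The hypothetical frequencies differ by the same 10 percentage points in both pairs.

```python
def fst_components(p1, p2):
    """Return (FST, HT - HS, HS) for two equally weighted populations.

    Uses the convention FST = (HT - HS) / HT, where HS is the mean
    within-population heterozygosity 2pq and HT is 2pq at the pooled frequency.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t, h_t - h_s, h_s

# The same 10-point gap in allele frequency, in variable vs. less variable populations.
for p1, p2 in [(0.45, 0.55), (0.01, 0.11)]:
    fst, between, within = fst_components(p1, p2)
    print(f"{p1}-{p2}: FST={fst:.4f}, between={between:.4f}, within={within:.4f}")
```

    Both pairs have the same between-population component (0.005). The FST for the second pair is roughly four times higher (about 0.044 versus 0.010) because there is much less variation within each population.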

    The relationship of FST and migration between populations.

    When the forces causing genetic divergence between subpopulations are balanced by gene flow, the reduction of heterozygosity within subpopulations is a function of the number of people who move between subpopulations each generation, expressed by FST = 1 / (1 + 4Nm).
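    One common way to see where FST = 1 / (1 + 4Nm) comes from is Wright's island model. The sketch below iterates a standard recursion for the probability F that two genes in the same subpopulation are identical by descent. It assumes N diploid individuals per subpopulation, a fraction m of migrants drawn each generation from a large pool, and no mutation or selection. These modeling choices are illustrative assumptions, not taken from the post.

```python
# Sketch: Wright's island model. Assumptions: N diploid individuals per deme,
# migration fraction m from a large pool, no mutation or selection.

def next_f(f, n, m):
    """One generation: coalescence with probability 1/(2N), then migration."""
    return (1 - m) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * f)

def equilibrium_f(n, m):
    """Exact fixed point of next_f."""
    a = (1 - m) ** 2
    return (a / (2 * n)) / (1 - a * (1 - 1 / (2 * n)))

def approx_f(n, m):
    """The familiar approximation FST = 1 / (1 + 4Nm), valid for small m."""
    return 1 / (1 + 4 * n * m)

def generations_to_near_equilibrium(n, m, fraction=0.9):
    """Generations, starting from F = 0, until F reaches `fraction` of equilibrium."""
    target = fraction * equilibrium_f(n, m)
    f, t = 0.0, 0
    while f < target:
        f = next_f(f, n, m)
        t += 1
    return t

# The same Nm = 2.5 in a large and a small deme.
for n, m in [(1000, 0.0025), (100, 0.025)]:
    print(n, m, round(equilibrium_f(n, m), 4), round(approx_f(n, m), 4),
          generations_to_near_equilibrium(n, m))
```

    With Nm = 2.5 in both cases, the exact equilibria (about 0.091 and 0.088) sit close to the approximation 1/11, about 0.091. The small subpopulation reaches equilibrium roughly ten times faster.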

    Comparing human populations taken from different continents, FST is between 0.1 and 0.15, meaning that only between 10 and 15 percent of genetic differences between individuals are attributable to their geographic origins. This level of differentiation is relatively small compared with that of many other large mammal species spread among different continents, such as wolves or bears [1]. Such similarity among human populations means that they have shared high levels of gene flow in the past. However, the meaning of these numbers depends on the relationship of gene flow and the other evolutionary forces.

    Because they act in opposite directions, gene flow and genetic drift will reach an equilibrium over time. At equilibrium, FST = 1 / (1 + 4Nm), where Nm is the number of migrants moving into each subpopulation per generation. Neglecting the forces of selection and mutation, then, an FST of 0.1 for human continental populations means that an average of about 2 migrants have been entering each continent per generation for a long period of time. Far more than two people move between continents each generation today, so one prediction of this relationship is that the level of genetic difference among continents will decrease in the future. In the face of this gene flow, it is likely that most of the differences in allele frequencies that persist in humans are in fact affected by selection. Indeed, many of the most obvious differences, related to physical appearance in different places, appear to bear this out.
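    Rearranging the equilibrium gives Nm = (1/FST - 1) / 4, which is where the figure of about 2 migrants comes from. The minimal sketch below keeps the same neglect of selection and mutation. The larger migrant numbers are hypothetical values chosen to show the predicted decline.

```python
def migrants_from_fst(fst):
    """Solve FST = 1 / (1 + 4Nm) for Nm (equilibrium, no selection or mutation)."""
    return (1 / fst - 1) / 4

def fst_from_migrants(nm):
    """Equilibrium FST for Nm migrants per subpopulation per generation."""
    return 1 / (1 + 4 * nm)

# Migrants implied by the continental FST range given in the text.
for f in (0.10, 0.15):
    print(f, round(migrants_from_fst(f), 2))

# Hypothetical higher migration rates and their eventual equilibrium FST.
for nm in (2, 20, 200):
    print(nm, round(fst_from_migrants(nm), 4))
```

    An FST of 0.1 gives Nm = 2.25, and 0.15 gives about 1.4. At a hypothetical 200 migrants per generation, the equilibrium FST eventually falls to about 0.001.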


    References

    1. Templeton AR. Human races: a genetic and evolutionary perspective. American Anthropologist. 1998;100:632–650.
    Study questions: 
    1. If the present FST among human continental groups is consistent with two migrants among populations each generation, what do you predict will happen to human FST in the future?
    2. It is remarkable that genetic drift and migration balance each other at a given number of actual individuals migrating, so that large and small populations are held in equilibrium by the same number of migrants. Are there any differences between large and small populations?
  • The risk gradient

    Wed, 2011-11-09 23:58 -- John Hawks

    Ann Gibbons reports [1] from the International Congress of Human Genetics on papers that examine GWAS risk alleles for type 2 diabetes: "Diabetes Genes Decline Out of Africa" (paywall).

    At the poster session, Stanford graduate student Erik Corona stood in front of a Google Earth map of the world that he finds surprising. On this map he had plotted the frequency of 12 gene variants known to be associated with type 2 diabetes in 51 populations from Australia to Zaire. It shows “a clear gradient of red to green from west to east, from Africa to Asia,” Corona says (see map). “Something strange is going on with type 2 diabetes.”

    This is of course a challenging problem because risk alleles identified in one population may not replicate in other populations. The best-known example is ApoE4, strongly associated with Alzheimer's disease in Europeans but not in Africans. More generally, looking at a set of risk variants that were identified in one population introduces an ascertainment bias that constrains their likely frequencies in other populations. An allele is more likely to yield a statistically significant association with a trait if the allele is not too rare. If we take many alleles associated with a trait, we're likely to see some gradient across populations due to this bias alone.
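    The frequency dependence of discovery can be sketched with a standard power approximation. For simplicity, it assumes a quantitative trait measured in standard deviations and an additive per-allele effect beta. It also assumes a test noncentrality of about n * 2p(1 - p) * beta^2 and the conventional genome-wide threshold of 5e-8. A case-control study of type 2 diabetes behaves similarly but not identically. The sample size and effect size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def association_power(p, beta, n, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Assumes a quantitative trait in standard-deviation units, a small
    per-allele effect `beta`, and noncentrality ~ n * 2p(1-p) * beta^2.
    """
    ncp = n * 2 * p * (1 - p) * beta ** 2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The opposite tail contributes negligibly and is ignored.
    return NormalDist().cdf(sqrt(ncp) - z_crit)

# Hypothetical study: 50,000 people, effect of 0.05 SD per allele.
for p in (0.01, 0.05, 0.2, 0.5):
    print(p, round(association_power(p, beta=0.05, n=50_000), 3))
```

    With these hypothetical settings, power rises from essentially zero at p = 0.01 to near certainty at p = 0.5. The variants a study finds are therefore disproportionately those that are common in its own sample.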

    Hidden ascertainment bias is a problem we run up against quite a lot. It may not apply in this case, depending on where the risk alleles were identified, especially since many risk alleles for type 2 diabetes appear to be linked to recent positive selection (which is why I got interested).


    References

    1. Gibbons A. Diabetes Genes Decline Out of Africa. Science. 2011;334(6056):583.
  • Mailbag: Spuds and mutts

    Wed, 2011-11-09 00:28 -- John Hawks

    Re: "How widespread is Denisovan ancestry today?" and "Potato sack race":

    Question about Denisovan DNA. Once introduced into a population, beginning many millennia ago, what keeps it from being in the DNA of everybody in the area? I exclude new arrivals, but what kept the Denisovan DNA from being spread to the homeland of the new arrivals, what with the traveling salesmen, the refugees from tribal pushing and shoving, armies marching, cross marching and countermarching? It isn't as if Denisovan genes cause assortative mating by making the possessor either a hell of a catch or a last-man-on-earth scenario. Is it? Selective survival against diseases that come and go, while not so good in between, a la sickle cell? Is the blender model of human reproduction faulty somehow?

    As to potatoes, I'd heard that one advantage is that armies, used to pasturing their horses in the grain of the enemy's peasants' fields, had to move on more quickly when the supply officers gave up trying to get their foraging parties to dig potatoes.

    If, as Keegan hypothesizes, the ration was one pound of meat and two of bread (requiring two pounds of firewood) per man per day, an army of 30,000 ate out a location pretty quickly. If spuds were the local staple, they'd have to move. You just can't feed 30,000 guests who arrived unannounced by digging potatoes. Not fast enough. Do horses like potatoes? So, the army moves on--win--and the peasants get out the potato forks and do okay, more or less. Win.

    Re: potatoes -- I think you've pointed to an important factor -- also, they can't be burned when the army retreats. The sheer productivity of tubers really does outweigh the available grain crops in Northern Europe.

    Re: Denisovan DNA -- The genes should have diffused into other populations, all things being equal. That they did not do so is a pretty strong indication that SE Asia today shares little genetically with SE Asia 30,000 years ago. There must have been a massive influx of people who lacked Denisovan ancestry, well after the initial mixture with Denisovans happened and Denisovans themselves left the scene.

  • Diversity doesn't point reliably to source populations

    Mon, 2011-11-07 23:08 -- John Hawks

    Worth amplifying from Dienekes' Anthropology Blog, "Y chromosomes of the Bahamas":

    I like the line about there being substantially more Y-STR variation in E1b1a7a-U174 and E1b1ba8-U175 in the Bahamas than any African collection. I have argued for years that the central assumption of phylogeography, that the location of highest Y-STR diversity is not necessarily the point of origin of a haplogroup, since Y-STR diversity can be affected both by antiquity and by admixture. Nonetheless, I keep reading papers where tiny differences in Y-STR variation, even if we forget about the noisiness of Y-STRs themselves, are taken as evidence of ancient migrations. Thankfully, the time when Y-STRs were used to infer ancient migrations is over, and the huge collection of Y-STR haplotypes amassed by population geneticists, forensic specialists, and genealogists alike can be put to uses for which it is more amenable.

    Once we have population mixture, hypotheses about phylogeography become much harder to test. A population model with mixture has many ways of generating the same pattern of relative diversity among populations.
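    A toy calculation shows one way mixture can inflate diversity. Dating methods based on the average squared distance (ASD) among Y-STR repeat counts treat that variance as growing with time. Anything that inflates the variance therefore makes a lineage look older, or makes a place look like its origin. The repeat counts below are hypothetical, and the two-source founding is an illustrative assumption.

```python
from statistics import pvariance

# Hypothetical repeat counts at one Y-STR locus within a single haplogroup,
# sampled in two source populations.
source_a = [14, 15, 15, 16]
source_b = [17, 18, 18, 19]
mixed = source_a + source_b  # a 50:50 founding mixture of the two sources

print(pvariance(source_a), pvariance(source_b), pvariance(mixed))  # 0.5 0.5 2.75
```

    Neither source has a variance above 0.5, but the mixture reaches 2.75. That total is the within-source variance (0.5) plus a between-source term of 0.5 × 0.5 × 3² = 2.25, since the source means differ by 3 repeats.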

  • Braiding Denisovans into our ancestry

    Fri, 2011-11-04 10:39 -- John Hawks

    Dalton Luther reflects on the Denisovan admixture paper [1] that I wrote about earlier this week ("How widespread is Denisovan ancestry today?"), by referring to John Moore's work on ethnogenesis [2].

    Getting back to the original quote about Denisovan legacies, just because the Denisovans aren’t “around” anymore, doesn’t mean they’re not “around.” An ancient population is present even though in a very different form. Using the braided river metaphor, the name Denisovan refers to the contents of a particular stream that mixed back into another stream, which grew larger, amplifying its original contents.

    What seems to be the challenging concept to some geneticists is that some people today have that legacy and others don't. But it's not at all unusual for that to be true of families, kindreds, cultural traits, or even languages. So why should it be unusual for populations?


    References

    1. Skoglund P, Jakobsson M. Archaic human ancestry in East Asia. Proceedings of the National Academy of Sciences, U. S. A. 2011;108(45):18301-18306.
    2. Moore JH. Putting anthropology back together again: the ethnogenetic critique of cladistic theory. American Anthropologist. 1994;96:925–948.
  • How widespread is Denisovan ancestry today?

    Tue, 2011-11-01 00:32 -- John Hawks

    Last month, David Reich and colleagues [1] reported on estimates of Denisovan ancestry for island and mainland Asian populations. Their most memorable conclusion was that they could find no substantial sign of Denisovan ancestry anywhere on the Asian mainland, or indeed on any island that had ever been connected by land to Asia.

    The distribution was stark, as illustrated by the map in the paper.

    I wrote about the paper when it was released ("Denisovan DNA in the islands, and an Australian genome"), noting:

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    Abi-Rached and colleagues [2] had argued that HLA alleles found in the Denisovan genome are presently common in some parts of Asia, and likely reflect local adaptive introgression. Substantial introgression of a small number of genes would not be enough to create a strong genome-wide appearance of Denisovan ancestry. Still, it was a little odd that the first genes anybody looked closely at would provide strong evidence of introgression.

    Now, Pontus Skoglund and Mattias Jakobsson [3] say that Denisovan ancestry is widespread across China and Southeast Asia.

    That conclusion contradicts Reich and colleagues, so why do the studies come to such different results?

    Skoglund and Jakobsson suggest that they have succeeded in finding introgression where others failed because their model accounts for ascertainment bias in the available datasets. SNP data come from genotyping chips, which have been designed using known polymorphisms. Five years ago, we knew much more about polymorphisms in Europe than in other parts of the world. As a result, the HGDP, and HapMap to a lesser extent, do a good job of sampling rare alleles in Europe but miss many rare alleles in Africa and other populations. This is the ascertainment bias.
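    The effect of a small, regionally limited discovery panel can be sketched with binomial sampling. A SNP whose allele frequency in the panel's source population is p will be seen as polymorphic in n chromosomes with probability 1 - p^n - (1 - p)^n. The panel size and frequencies below are hypothetical, and real chip design involved more steps than this.

```python
def discovery_probability(p, n_chromosomes):
    """Chance that a SNP with frequency p in the panel's source population is
    polymorphic in a discovery panel of n chromosomes (binomial sampling,
    no genotyping error)."""
    return 1 - p ** n_chromosomes - (1 - p) ** n_chromosomes

panel_size = 48  # hypothetical: 24 diploid individuals from one region
for label, p_panel in [("common in panel population", 0.2),
                       ("rare in panel population", 0.02),
                       ("absent from panel population", 0.0)]:
    print(label, round(discovery_probability(p_panel, panel_size), 3))
```

    An allele at 20% in the panel population is almost certain to be found, while one at 2% is found only about 62% of the time. An allele that is common in Africa but absent from the panel population cannot be found at all, however frequent it is elsewhere.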

    Some of the most obvious signs of introgression today are cases where rare alleles are shared with an archaic genome. If ascertainment bias causes you to miss the rare alleles, you'll miss the introgression.

    But that explanation isn't really sufficient to explain the differences between these papers. For one thing, Reich and colleagues [1] also worked hard to account for ascertainment biases in their SNP samples. For another, whole genome comparisons between East Asian samples and the Denisova genome have not yielded evidence of Denisovan ancestry, even though whole genomes have no ascertainment bias. The number of whole genomes so far compared is very small, and so the statistical ability to detect introgression is lower, but Skoglund and Jakobsson actually replicate that null result in their current paper.

    Probably most important, it's not clear that Skoglund and Jakobsson's result can actually be explained by rare alleles. Here is Figure 1e from their paper:

    Figure 1e from Skoglund and Jakobsson (2011). Original caption: Interpolated spatial distribution of the frequency of Denisova alleles at SNPs where Denisova is different from chimpanzee and Neandertal. Sample localities are indicated with rectangles.

    This map represents a clever comparison. It is a heat map of the mean local frequency of the subset of alleles that are present in Denisova but absent from chimpanzees and Neandertals. These are presumptively derived alleles relative to the chimpanzee. The SNPs here are all known to vary in human populations, because they are all included in the HGDP sample. So the map does not represent all the Denisova derived mutations in humans today, only a particular subset that is especially likely to be informative.

    Given that the sites have been picked in a special way, we need to examine carefully how strong the pattern really is. Notice the scale of the heat map. The difference between the orange area in south China and the green area in north China is around 0.001, or a tenth of a percent in mean frequency. The actual values are reported in the online supplement, in Table S3. The exception is the Yizu sample from south China, which has a mean frequency around 0.006 higher than its neighbors. The Yizu sample includes only 10 individuals (9 males, 1 female). The paper does not report the number of SNPs included in this comparison, but it must be a very small set relative to the total, because only a small fraction of human SNPs are known to be derived in Denisova and ancestral in Neandertals.

    With this very small difference in frequencies, I would not rule out the hypothesis that the zone of high Denisova derived frequencies in south China is caused entirely by frequency enrichment of a small number of loci. A handful of genes like the HLA loci observed by Abi-Rached and colleagues might be enough to create this very slight elevation in the average. Hence, the best case is that the data here simply provide greater sensitivity to small amounts of introgression. The worst case is that the pattern may be dominated by the Yizu sample, which is really too small to carry this kind of load.
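    The arithmetic behind this worry is simple. If only k of K SNPs are enriched, by an average of Δ each, the mean frequency moves by kΔ/K. Because the paper does not report K, the SNP counts here are hypothetical.

```python
def mean_shift(n_snps, n_enriched, enrichment):
    """Change in the mean allele frequency across n_snps loci when only
    n_enriched of them are raised by `enrichment` and the rest are unchanged."""
    return n_enriched * enrichment / n_snps

# Hypothetical: 10 loci (an HLA-like handful) each raised by 0.2 in frequency.
for n_snps in (500, 2_000, 10_000):
    print(n_snps, mean_shift(n_snps, n_enriched=10, enrichment=0.2))
```

    With 2,000 SNPs, ten loci raised by 20 percentage points are enough to produce the roughly 0.001 difference visible on the map.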

    The strongest evidence presented in the paper is a comparison of north and south East Asian regions directly. Although the comparison of south China against other regions of the world (Africa, Europe) does not yield significant evidence of Denisovan similarity in this paper, south China differs from north China in essentially the same way that the Oceanian people do from other regions. And the Oceanian populations (here, Papua New Guinea and Bougainville) differ from other regions because of their Denisovan ancestry. So Skoglund and Jakobsson infer that the north/south comparison reflects Denisovan ancestry as well.
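    The D statistic behind comparisons like this one (the ABBA-BABA test) can be sketched from allele frequencies. This is a generic frequency-based version, not the code used in the paper, and the data are a hypothetical toy. P2 is identical to P1 except for a fraction f of ancestry from an archaic genome (P3) that carries the derived allele at every site, and the outgroup is assumed ancestral.

```python
def d_statistic(sites):
    """Frequency-based ABBA-BABA D for the tree (((P1, P2), P3), outgroup).

    Each site is (p1, p2, p3): derived-allele frequencies in P1, P2 and P3,
    with the outgroup assumed to carry the ancestral allele.
    Positive D means P2 shares more derived alleles with P3 than P1 does.
    """
    abba = sum((1 - p1) * p2 * p3 for p1, p2, p3 in sites)
    baba = sum(p1 * (1 - p2) * p3 for p1, p2, p3 in sites)
    return (abba - baba) / (abba + baba)

def toy_sites(f):
    """Hypothetical data: P2 equals P1 except for a fraction f of ancestry
    from P3, which carries the derived allele at every site."""
    p1_values = [k / 20 for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return [(p1, (1 - f) * p1 + f * 1.0, 1.0) for p1 in p1_values]

for f in (0.0, 0.01, 0.05):
    print(f, round(d_statistic(toy_sites(f)), 4))
```

    In this simplified setup, D is zero without admixture and grows almost in proportion to f: D at f = 0.05 is about 4.9 times D at f = 0.01. That is the intuition behind reading a much smaller D as a smaller ancestry fraction. Real data add drift, ascertainment and other complications.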

    I think this comparison is sound, and the question is, how much introgression would this pattern require? The paper answers that question in this way:

    Quantitative estimation of the precise fraction of Denisova-related ancestry in Southeast Asian populations based on genotype data are unfortunately sensitive to ascertainment bias and genetic drift, and such estimates will require genome sequence data that are currently unavailable. However, both the PCA results (Fig. 1B) and the approximately six times lower absolute values of the D statistic in tests between Northeast Asians and Southeast Asians compared with tests between Northeast Asians and Oceanians (Table S4) indicate a relatively low fraction of Denisova-related ancestry. Thus, the fraction is likely to be smaller than both the ~5% fraction of Denisova-related ancestry present in Oceanians and the ~2.5% fraction of Neandertal ancestry present in non-Africans (23, 24), perhaps around 1%.

    One percent is an amount that whole genome comparisons at present do not rule out, and I think it's a reasonable guess. I would not have thought we could rule out a one percent contribution from other, non-Denisovan archaic people, for example.

    We aren't very far from a more definitive answer to this question, as the data continue to accumulate every day. What I find interesting is the way that models can generate these 1% differences in ancestry proportions, depending on sampling and the pattern of migration assumed to have happened in the past. Two estimates that differ by less than a percent are not really different. This paper provides the suggestion of a more widespread Denisovan legacy, and I accept that as a possibility.

    I should mention: less than one percent of half a billion people is still a very large number. Add to that five percent of the indigenous population of New Guinea and Australia, and smaller fractions of other island populations, and the total amount of Denisovan legacy present in living people probably exceeds the population of Earth at the time the Denisovans lived.


    References

    Synopsis: 
    A new paper contradicts earlier work by suggesting a widespread Denisovan legacy in south China
  • Y chronology awry

    Wed, 2011-08-24 09:57 -- John Hawks

    Dienekes links to and discusses a current paper by George Busby and colleagues [1] on the Y chromosome chronology for the settlement of Europe: "Back to the drawing board for R-M269 (Busby et al. 2011)." The main idea is that microsatellite loci on the Y chromosome have made up the majority of our information about biogeography using this marker, but the rate of mutational changes of these loci has been badly misapplied:

    A bad clock is not useless: it gives you some information about time. Moreover, you can often use several to iron out the inaccuracy of any single one of them.

    Unfortunately, better estimation through averaging of bad estimators works only in one case: when the estimators are unbiased.
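    A short simulation with made-up numbers illustrates the point. Averaging many noisy but unbiased estimates converges on the true value. Averaging estimates that share a systematic bias converges just as tightly on the wrong value. Here a hypothetical 40% underestimate stands in for too-young dates from fast-mutating loci.

```python
import random
from statistics import mean

random.seed(1)
TRUE_AGE = 10_000  # hypothetical true TMRCA in years

# Many noisy clocks with no systematic error...
unbiased = [random.gauss(TRUE_AGE, 0.2 * TRUE_AGE) for _ in range(10_000)]
# ...versus the same noise around a value 40% too young (hypothetical bias).
biased = [random.gauss(0.6 * TRUE_AGE, 0.2 * TRUE_AGE) for _ in range(10_000)]

print(round(mean(unbiased)), round(mean(biased)))
```

    Adding more clocks shrinks the scatter in both averages, but only the unbiased one lands near 10,000. The biased average settles close to 6,000 no matter how many loci are included.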

    The inclusion of some fast-mutating STR loci tends to make all estimates too young. The paper finds that this problem is general, affecting most commonly used datasets.

    Our analysis confirms that this phenomenon is not specific to the R-M269 haplogroup nor to methods using ASD. Figure 4b shows that STRs with high D produce larger estimates of T. What is clear is that estimates of T implicitly depend on the STRs that are selected to make this inference. Using BATWING on an HGDP population for which 65 Y-STRs are available, we have shown that the median estimate of TMRCA can differ by over five times when STRs are selected on the basis of the expected duration of linearity (electronic supplementary material, figure S4). While researchers take into account STR mutation rates when estimating divergence time with ASD, commonly used STRs do not have the specific attributes that allow linearity to be assumed further into the past. The majority of haplogroup dates based on such sets of STRs may therefore have been systematically underestimated.

    One weakness of the study is that its interpretation of the geographic patterns of the haplotypes depends on the assumption that they have evolved neutrally relative to each other. Selection might radically affect this pattern.


    References

  • Floating on the data

    Mon, 2011-08-22 12:19 -- John Hawks

    Technology Review reports on a recent conference trying to spread data mining techniques. The point of departure is the growth of electronic sensor networks in industry and online social media information: "The New Big Data".

    People have been working with graphs of data for hundreds of years, but the graphs now being plotted from social networks or sensor networks are of an unprecedented scale, Apte says. "These are gigantic graphs," he says. "You're talking about millions of nodes and tens of millions of links."

    Dealing with graphs of that size and scope, and applying modern analytic tools to them, calls for better algorithms and other innovations.

    I'm dealing here with genetic data networks, which are rapidly becoming denser, and we're beginning to apply these kinds of network methods to understand them. Once you pass beyond the analysis of a single locus and spread the data across the whole genome, it becomes necessary to go beyond a single tree and understand the relationships (and commonalities) among the genealogical networks that connect people with each other. In some ways, this shares more with epidemiological modeling than with traditional genetics.

  • Mailbag: Where did Neandertals come from?

    Thu, 2011-08-18 17:51 -- John Hawks

    Dr. Hawks,
    I greatly enjoyed your course on the rise of humans I purchased through the Teaching Company.
    I could not find the answer to this question: if humans migrated out of Africa and met Neandertals and interbred, where did the Neandertals originally come from?
    I am sure you are a busy man, but this question has been puzzling me. Thank you in advance for answering it.

    Thank you so much for your kind words!

    We don't strictly know where Neandertals originated. We do know that their population and the African population began to differentiate sometime before 250,000 years ago. I think it is likely that the ancestors of Neandertals migrated out of Africa at that time and began to evolve within western Eurasia, later to come into contact with Africans again. But there are fossil humans who seem to have some Neandertal-like features in Europe far earlier, as early as 600,000 years ago. One possibility is that the ancestors of Africans and Neandertals actually lived outside of Africa, and Neandertals stayed there as other people moved into Africa. Another is that a population representing most of the ancestry of Neandertals left Africa more recently, maybe within the last 150,000 years, and mixed with an earlier European population. It is even possible that the Neandertal and African ancestors lived long-term in Europe and Africa, respectively, with a high rate of gene flow between them for their entire history.

    At this level things seem uncertain and will remain that way until we have a better fossil record in Africa. It's an exciting time for those of us who study that time period!

