genetic drift

Larry Moran writes, "Are you a descendant of Charlemagne?"

Thousands of amateur genealogists have contributed to a huge database of family relationships, including genetic analyses. What does this teach us about human populations and evolution?

It touches on some issues covered in more detail in Steve Olson's book, Mapping Human History: Genes, Race, and Our Common Origins, which remains surprisingly relevant today despite the explosion in genetic data. That's because Olson did a good job on the population genetics side.

Oh, and yes I am a descendant of Charlemagne. Woo-hoo!

The Finnish line

A new paper by Jukka Palo and colleagues investigates the population history of Finland:

The Finnish population in Northern Europe has been a target of extensive genetic studies during the last decades. The population is considered as a homogeneous isolate, well suited for gene mapping studies because of its reduced diversity and homogeneity. However, several studies have shown substantial differences between the eastern and western parts of the country, especially in the male-mediated Y chromosome. This divergence is evident in non-neutral genetic variation also and it is usually explained to stem from founder effects occurring in the settlement of eastern Finland as late as in the 16th century. Here, we have reassessed this population historical scenario using Y-chromosomal, mitochondrial and autosomal markers and geographical sampling covering entire Finland. The obtained results suggest substantial Scandinavian gene flow into south-western, but not into the eastern, Finland. Male-biased Scandinavian gene flow into the south-western parts of the country would plausibly explain the large inter-regional differences observed in the Y-chromosome, and the relative homogeneity in the mitochondrial and autosomal data. On the basis of these results, we suggest that the expression of 'Finnish Disease Heritage' illnesses, more common in the eastern/north-eastern Finland, stems from long-term drift, rather than from relatively recent founder effects.

So you've got a cline of genetic variation. How do you explain it? This paper reminds us that for a single locus there are always multiple explanations: asymmetric migration, natural selection, founder effect and population growth are the simple unicausal scenarios. Considering a cline by itself, there's no reason to prefer any of these except for assumptions that come from outside that gene -- maybe you know something about the history, maybe the gene's function gives you a clue.

If you're going to test these hypotheses with genes alone, then you need to sample multiple loci, and you need to make an adequate spatial sampling of the population. And when you do, sometimes the evidence points in a different way than you had expected.

References:

Palo JU, Ulmanen I, Lukka M, Ellonen P, Sajantila A. 2009. Genetic markers and population history: Finland revisited. Eur J Hum Genet 17:1336-1346. doi:10.1038/ejhg.2009.53

"The worm in the fruit of the mitochondrial DNA tree"

François Balloux (2009) has a polemic in the online access area of Heredity presenting references about mtDNA selection, and arguing that the use of this single genetic marker is no longer warranted without support from other loci.

Yay! I've been saying that both here, and in peer-reviewed articles, for several years. I think serious workers know that one gene is not enough; two genes (mtDNA and Y chromosome, for example) aren't enough -- we have to integrate information across every possible source, genetic, skeletal, and anthropological, to really test hypotheses about the past.

Still, an industry of mtDNA sequencing has grown up, reviewing each other's grants and papers, and shutting down any discussion of adaptive changes. Balloux's commentary addresses this problem -- I'm going to quote the same paragraph as Dienekes:

Let us assume I gave a seminar. I would tell the audience about my latest results on the population history of the pigmy shrew. My findings would be based on a stretch of DNA comprising several metabolic genes, showing no signs of genetic recombination. Armed with sequences from a large number of individuals sampled over a broad geographical area, I would make some inference on the colonization routes and times. To make life easier, I would restrict my analysis to the mutations I liked best, with nice names having been given to related sequences, rather than relying on dull mathematical quantities. As I reach one of the key conclusions of the lecture, which would go as follows: 'It is obvious from the distribution of haplotypes Amanda, Eugenie* and Hector_2 that the Outer Hebrides were colonised about 50,000 years ago, this was followed by considerable population fluctuations, a bottleneck during the last Ice Age, a swift recovery and a dramatic recent expansion over the last 200 years and...'. Imagine that, at that climactic stage I was interrupted by someone in the audience. The impertinent would say, 'Sir, can I just ask you whether this confidence in your conclusions may not be misplaced; your analysis is based on a single genetic marker, which comprises genes with a central role in metabolism and is thus likely to have been affected by natural selection'. An awkward silence may ensue, as I would find it difficult to dismiss this criticism easily.

Well, let me tell you, I've been in dozens of audiences, and have raised that exact point. Here is a sample of the bogus responses I've gotten to this question:

Bogus answer 1: There are no functional differences between humans and chimpanzees in the mtDNA, so it can't have been selected during human evolution. False, false, false!

Bogus answer 2: Metabolic processes are highly conserved, and humans couldn't have changed much. Hello? Have you noticed that your breakfast didn't exist in the Paleolithic?

Bogus answer 3: But the pattern of variation can be equally explained by a bottleneck. Some aspects can, others can't so easily.

Bogus answer 4: We examined only noncoding parts of the mtDNA, so there could be no selection. Yes, believe it or not, this is the most common response. I guess they don't teach people about linkage anymore.

Bogus answer 5: There's little or no evidence of selection on any gene in recent human evolution. Human evolution may have stopped entirely. Oh, lord. Yes, I've gotten this one many times.

There have been others over the years. Yet mtDNA is a big business -- people seem to be worried that the slightest criticism will bring down the whole thing like a house of cards. That's not true. Even if mtDNA has sometimes been selected during human prehistory or history, that doesn't mean it isn't a useful marker for many purposes. But many seem more comfortable avoiding the issue entirely.

I think that taking the hypothesis of selection seriously would improve most of the work in this field. The possibility of selection doesn't eliminate demographic interpretation -- for example, the high ancient African mtDNA variation allows us to test hypotheses about African demography before 50,000 years ago, and there the data appear to reject the hypothesis of selection, at least after around 150,000 years ago. Gene genealogies don't allow us to see the whole past, just the time and forces that they experienced. If we ignore one of the major forces, we are reducing our knowledge.

There is an obvious problem testing the hypothesis of selection with mtDNA. When we consider any one single locus, it's always possible to find some demographic scenario that yields exactly the same predictions as selection. It's just a mathematical necessity -- selection is fundamentally a demographic phenomenon, and the increase in frequency of selected alleles looks similar to exponential growth of a small population.
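Here is a minimal sketch of why the two are so hard to tell apart at one locus. It assumes a simple deterministic haploid model in which carriers of the favored allele have fitness 1 + s; the starting frequency and the value of s are hypothetical, chosen only for illustration. While the allele is rare, its frequency tracks a lineage growing exponentially at rate s almost exactly.

```python
def selected_allele_frequency(p0, s, generations):
    """Frequency of a favored allele under deterministic haploid selection."""
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Carriers reproduce at relative rate 1 + s; divide by mean fitness.
        p = p * (1 + s) / (1 + s * p)
        trajectory.append(p)
    return trajectory


def exponential_growth(p0, s, generations):
    """Share of a small group growing at rate s in a much larger population."""
    return [p0 * (1 + s) ** t for t in range(generations + 1)]


# Hypothetical values, not taken from any of the papers discussed here.
P0, S, T = 0.0001, 0.05, 50
selection = selected_allele_frequency(P0, S, T)
growth = exponential_growth(P0, S, T)
for t in (0, 10, 25, 50):
    print(t, selection[t], growth[t])
```

The two curves only separate once the allele becomes common, which is why other loci and the historical record are needed to break the tie.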

So what can we do? Fortunately we have lots of options. We can test the proposed demographic hypotheses against the historical record. When we make observations that show that people 1000 years ago had very different frequencies of common haplotypes, well, we know it was selection. There hasn't been any genetically significant bottleneck in the last 1000 years! When we see small Neolithic population samples dominated by haplotypes that are very rare today, again, no historically possible bottleneck could have caused that.

Balloux with his colleagues (2009) has shown that one aspect of mtDNA patterning -- the association of haplogroup diversity with geography -- is very unlikely to have arisen by genetic drift. Here's part of their abstract:

We show that populations living in colder environments have lower mitochondrial diversity and that the genetic differentiation between pairs of populations correlates with difference in temperature. These associations were unique to mtDNA; we could not find a similar pattern in any other genetic marker. We were able to identify two correlated non-synonymous point mutations in the ND3 and ATP6 genes characterized by a clear association with temperature, which appear to be plausible targets of natural selection producing the association with climate. The same mutations have been previously shown to be associated with variation in mitochondrial pH and calcium dynamics. Our results indicate that natural selection mediated by climate has contributed to shape the current distribution of mtDNA sequences in humans.

They took a dual approach to testing the hypothesis of selection. First, they modeled the evolution of haplotype diversity under neutrality, and showed that the empirical distribution lies significantly outside that range of results. But even so, we might imagine some bottleneck scenario that would cause low diversity in high-latitude peoples, and this would be difficult to refute historically because many of those populations have poor historical documentation. But demography should have similar effects on other genes, and they were able to show that the rest of the genome doesn't share the mtDNA pattern.

It's really not that hard to test demographic hypotheses, using comparative genomics and anthropological knowledge. That's what anthropological genetics should be doing more and more. There was a time when obtaining a reasonable sample of mtDNA was an accomplishment, and comparing that sample to other genes was not feasible. But that time is past, and hopefully the review process -- journals and grants -- will start demanding some integration of mtDNA phylogeography with results from the rest of the genome.

Back to Balloux's conclusion:

Exploiting these new resources of autosomal variation will present significant challenges, but it will not help overcoming them if a large fraction of the community of human population biologists persists in sticking to mtDNA as the marker of choice.

Mitochondrial DNA isn't the tip of the iceberg -- it's an ice cube on top of the tip of the iceberg.

Related:

"Mitochondrial DNA selection review"

"Mitochondrial DNA and sperm"

"mtDNA selection in Iceland?"

"Complete Neandertal mitochondrial sequence, and selection on human (not Neandertal) mtDNA"

"Did Neandertals need better mitochondria?"

"Has the dam broken on mtDNA selection?"

"Mitochondrial DNA adaptations in living human populations"

OK, that's enough related posts. But you can find a whole lot more by searching the topic!

References:

Balloux F. 2009. Mitochondrial phylogeography: The worm in the fruit of the mitochondrial DNA tree. Heredity (advance online): doi:10.1038/hdy.2009.122

Balloux F, Lawson Handley L-J, Jombart T, Liu H, Manica A. 2009. Climate shaped the worldwide distribution of human mitochondrial DNA sequence variation. Proc Roy Soc Lond B 276:3447-3455. doi:10.1098/rspb.2009.0752

Mailbag: Statistics and future evolution

I was trying to find out more about recent research predicting a relative convergence of racial features in future generations (but I don't know anything about "rapid evolution by drift" or things like that). I'm aware of debunked claims (inc. your debunking) from media reports, but I'm not aware of research that actually contains enough scientific merit to make a valid prediction. I decided to write to you after reading your review of a lecture by UCL geneticist Steve Jones.

If there is any reference you can give to someone like me who has very little genetic training (past Mendel, anyway) I would greatly appreciate it.

I'll be glad to help if I can. Population genetics shouldn't be too much of a challenge for you; it's basically statistics (e.g., evolution by genetic drift is modeled by repeated binomial sampling).
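To make "repeated binomial sampling" concrete, here is a minimal sketch of the idealized Wright-Fisher model of drift. The function name, population sizes, and random seed are my own illustrative choices. Each generation, every one of the 2N gene copies is drawn independently from the previous generation's allele frequency.

```python
import random


def drift_trajectory(p0, n_individuals, generations, rng):
    """Allele frequency over time under pure drift in a diploid population."""
    copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # One binomial draw: each gene copy carries the allele with chance p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


rng = random.Random(1)  # fixed seed so the run is repeatable
small = drift_trajectory(0.5, 50, 100, rng)    # hypothetical N = 50
large = drift_trajectory(0.5, 5000, 100, rng)  # hypothetical N = 5000
print("small population, final frequency:", small[-1])
print("large population, final frequency:", large[-1])
```

Run it with a few different seeds: the small population wanders widely and often fixes or loses the allele, while the large one barely moves from 0.5 over the same hundred generations.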

We have a very high rate of gene flow between "racial" or geographic groups today compared to the past, and so we can predict that gene frequencies should converge in the future. But there are two issues -- first, the rate of change by chance in very large populations is very slow; and second, some genes may be (or recently have been) subject to selection processes that maintain diversity. That second is a complicated problem because selection pressures may be different for every gene.
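The convergence part of that prediction can be sketched with a two-population model. This is an illustration under stated assumptions, not a forecast: migration is symmetric, the populations are large enough to ignore drift, there is no selection, and the starting frequencies and the migration rate m are hypothetical.

```python
def frequency_gap(p1, p2, m, generations):
    """Gap in allele frequency between two populations exchanging migrants."""
    gaps = [abs(p1 - p2)]
    for _ in range(generations):
        # Each generation a fraction m of each population comes from the other.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
        gaps.append(abs(p1 - p2))
    return gaps


# Hypothetical values: frequencies 0.9 and 0.1, five percent exchange.
gaps = frequency_gap(0.9, 0.1, 0.05, 20)
for t in (0, 5, 10, 20):
    print(t, round(gaps[t], 4))
```

Under these assumptions the gap shrinks by a factor of (1 - 2m) every generation. Selection that differs between places would work against that shrinkage, which is the complication mentioned above.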

An (old) interview with Warren Ewens

I ran across an interview between Anya Plutynski and population geneticist Warren Ewens.

I cannot say enough about Ewens' book, Mathematical Population Genetics. If you can work through it, you can do population genetics. It doesn't cover every au courant topic, but those will change next week anyway. And it's on Kindle now. Which I suppose probably looks pretty good on the DX, assuming the math displays well -- the book's format is just the right size for it.

Anyway, this interview from 2004 was probably conducted around the time the book was released. It covers pretty much the gamut of his career. I have to select some part to quote for you, so I'll select the passage that would be most likely to come out of my own mouth in my genetics class:

WE: Of course there is a strong possibility that the neutral theory is assumed not because it is appropriate but because the math of that theory is so very simple compared to the math applying for any selective theory.

AP: Can I follow that up? Do you think that that has lead to models of phylogenetic change that is not very well supported by the evidence?

WE: I think that that is quite possible. However, here we enter into another question. In mathematical population genetics theory you know from the very start that you are making big simplifying assumptions. You are in a very different position from a physicist, who might believe that his mathematical models describe reality exactly. No sensible population geneticist would make any claim along those lines. He or she is forced to simplify, because reality is so complicated that you don’t know it in any detail, and even if you did know it and used math describing it faithfully, the analysis would be impossible to carry through. So simplification is unavoidable. I do not know whether the use of the neutral theory is too much of a simplification and has lead us to incorrect and distorted views about the true evolutionary tree, it’s shape and dimensions, but I suspect that there has been quite a significant distortion.

There is much more at the link: some history of association testing, genetic draft, a lot on Ewens sampling theory, and a touch about his work here in Madison.

More on the X variation conundrum

Last winter I noted the contradiction between two papers that each attempted to explain variation on the X chromosome compared to the autosomes. They had come to opposite conclusions, based on discrepancies in their data. I noticed that they had used different methods of determining mutation rates for X chromosome loci:

So, for their current paper, Keinan and colleagues (2008) try to correct for the recent divergence of human and chimpanzee X chromosomes. Simple enough -- rescale all X chromosome mutation events by some ratio proportional to the human-chimp divergence discrepancies. In this case, they attempt to rescale to the human-macaque divergence. Since that divergence happened in the Oligocene, the discrepancies among chromosomes should be slight compared to the overall divergence. I'd feel better if they actually tested this idea.

Meanwhile, Mike Hammer and colleagues scaled X chromosome diversity to the human-orangutan divergence. They claimed that this gave the same results as the human-chimpanzee divergence. Which, if true, would obviously give a different outcome than the procedure followed by Keinan and colleagues, which was predicated on the idea that the human-chimpanzee X divergence is the wrong number to use.

I had sort of forgotten about this (which drove me crazy at the time), but another question led me to revisit it late this week. In the intervening time, I see that Carlos Bustamante and Sohini Ramachandran (2009) happened across the same explanation that I had offered:

It appears that the rest of the discrepancy is explained by different normalizations for background mutation rate differences between the X chromosome and autosomes (Hammer et al.10 used human-orangutan divergence and Keinan et al.9 used human-macaque divergence).

So you read it here first. Which I suppose means that I should submit letters to journals more often. I don't because it seems to me that all I'm doing is reading and trying to understand papers, which sometimes takes more work than it should. On the other hand, I wonder how many people are really putting much effort into their reading...

Meanwhile, Bustamante and Ramachandran add an additional explanation -- the different means of ascertainment, since Mike Hammer's group used resequencing to find variation, while Keinan and colleagues (2008) had used HapMap SNPs under a specific ascertainment model. They end their short piece by pointing out the value of further resequencing data:

In order to address continuing questions on the nature of sex-biased processes, full genome sequencing of large numbers of individuals sampled from diverse populations will be needed. The upcoming 1,000 Genomes Project (http://www.1000genomes.org/), for example, will provide orders of magnitude more data for these types of analyses. We share the enthusiasm of the population genetics community that this will bring the potential for resolving continuing questions regarding how human history and cultural practices have shaped global patterns of genomic diversity.

Ascertainment is a serious issue with the existing SNP data, because different SNPs were ascertained in different, non-commensurable ways. That's how I was led into reconsidering this issue this week: another set of data seems to have features that are partially explained by ascertainment, but partially not. It's hard to use existing data for some kinds of population genetics analysis, although others are less affected by ascertainment biases.

So the 1000 Genomes effort will make some kinds of analyses simpler to accomplish. I suppose if ascertainment becomes less of a problem, we may see people focus more effort into understanding non-genetic sources of information, too!

References:

Bustamante CD, Ramachandran S. 2009. Evaluating signatures of sex-specific processes in the human genome. Nat Genet 41:8-10. doi:10.1038/ng0109-8

"Hundreds of natural selection studies could be wrong"

Happily, though, the study isn't about our method for finding recent selection!

Instead, Masatoshi Nei and colleagues at Penn State have the long knives out for tests of selection based on excess amino acid substitutions:

Nei said that for many years he has suspected that the statistical methods were faulty. "The methods assume that when natural selection occurs the number of nucleotide substitutions that lead to changes in amino acids is significantly higher than the number of nucleotide substitutions that do not result in amino acid changes," he said. "But this assumption may be wrong. Actually, the majority of amino acid substitutions do not lead to functional changes, and the adaptive change of a protein often occurs by a rare amino acid substitution. For this reason, statistical methods may give erroneous conclusions." Nei also believes that the methods are inaccurate when the number of nucleotide substitutions observed is small.

Well, that's not us -- we're studying much more recent events, based on linkage disequilibrium. Hey, the observation that selection was rare through most of human evolution actually strongly supports our observation that the recent rate of selection represents a massive acceleration over the long-term rate.

Still, I'm skeptical about Nei's conclusions. According to the press release, they identify a number of cases in which sites inferred to be under selection are actually not the functional change, because other functional changes have been identified by experiment. That's hardly a general argument that selection has been overcounted in these analyses.

I find that in most counts of selection based on amino acid substitutions, the criteria for counting selection are ridiculously conservative. Often, you see the inference of selection only for cases where the number of amino acid changes actually exceeds the number of silent changes. That's silly -- there's a strong bias against amino acid substitutions because of purifying selection. Only in repeated instances of positive selection are you ever going to see more amino acid substitutions than silent ones.

Meanwhile, the press release mis-states some research into human-chimpanzee genetic differences:

"These statistical methods have led many scientists to believe that natural selection acted on many more genes in humans than it did in chimpanzees, and they conclude that this is the reason why humans have developed large brains and other morphological differences," said Nei. "But I believe that these scientists are wrong. The number of genes that have undergone selection should be nearly the same in humans and chimps. The differences that make us human are more likely due to mutations that were favorable to us in the particular environment into which we moved, and these mutations then accumulated through time."

In fact, Margaret Bakewell and colleagues (2007) in the same journal showed that chimpanzees have more selected amino acid substitutions than humans. Nei's got it completely backward.

Now, I think Bakewell and colleagues might be wrong. The chimpanzee genome draft had many more sequencing artifacts at that time than the human genome, and these might account for the apparent excess in chimpanzees. But it's simply not true that researchers have shown "many more genes" under selection in humans than chimpanzees.

Well, except for us, referring to very recent human evolution. But in that case, as Nei notes, we're talking about "mutations that were favorable to us in the particular environment into which we moved." It's the massive environmental and demographic changes of the last 50,000 years that have made the difference. For most of the six million years before that, human genetic evolution seems to have gone at almost the same rate as in chimpanzees.

(via Gene Expression)

Plumbing for bottlenecks

My series on mutual information and tests of selection (which began with "Information theory: a short introduction") is at a branching point. One of the critical factors determining the power of such tests is the ancient rate of genetic drift. So it's important to come to some understanding of the archaeological record and our best estimates of ancient demography, so that we can independently test the hypothesis that genetic drift was very strong in recent human evolution. That's a long project, potentially the topic of several review papers. Since nobody else has put together these data in a useful way for population genetics, I'm going to do it in one place. What you see in this series are my notes about this project. Being notes, they are not complete, but they may occasionally be better than any other sources. Where it's appropriate, I'll spin off the results for review and publication, and point to them here.

Many geneticists believe that there were massive population bottlenecks within the last 30,000 years, citing both genetic and archaeological evidence in support of this proposition. Some claim that there have been significant population bottlenecks in the last 5000 years.

Some archaeologists agree. However, I think this is one of those Inigo Montoya cases: "That word, I do not think it means what you think it means." Archaeology and genetics have completely different interpretations of the words, "bottleneck," "contraction," and "expansion." The result has been a lot of confusion about the relation of archaeological and genetic estimates of population size.

A population bottleneck impacts genetics by increasing the rate of inbreeding. This takes time to change gene frequencies, and does so in inverse proportion to population size. It may seem surprising that a truly massive die-off, on the scale of the Black Death, will have no measurable genetic impact. But cutting a population of millions down by half just doesn't impact gene frequencies. That is, unless you are looking at genes that helped people to survive the plague, in which case you're looking at natural selection, not a bottleneck.

A significant genetic bottleneck is not just any population contraction -- it's an event in which the population is cut by a large fraction for a long time. In paleontological terms, we're usually considering cases where the ratio of the number of individuals to the number of generations is near one. In other words, if you cut the population down to a thousand individuals, and keep it there for a thousand generations, you're going to have a large genetic impact. Likewise, you can have a significant bottleneck that's ten generations long, but you need to cut the population down to around ten people.
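That rule of thumb follows from the standard expectation that an idealized, randomly mating population of N individuals keeps a fraction 1 - 1/(2N) of its heterozygosity each generation. Here is a minimal sketch, treating the head counts as effective population sizes, which is a simplification. The first two scenarios are the ones just described; the third, a million people for ten generations, is a hypothetical stand-in for a large population after a severe die-off.

```python
def heterozygosity_retained(n_individuals, generations):
    """Expected fraction of heterozygosity left after drift at constant size."""
    return (1 - 1 / (2 * n_individuals)) ** generations


scenarios = {
    "1,000 people for 1,000 generations": (1000, 1000),
    "10 people for 10 generations": (10, 10),
    "1,000,000 people for 10 generations": (1_000_000, 10),  # hypothetical
}
for label, (size, duration) in scenarios.items():
    lost = 1 - heterozygosity_retained(size, duration)
    print(f"{label}: {lost:.4%} of heterozygosity lost")
```

The first two cases each lose about 40 percent of their heterozygosity, while the third loses almost nothing. What matters is the number of generations relative to the population size, not the raw count of deaths.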

You can do a bit better measuring inbreeding by looking at lots and lots of people to study very rare alleles, like a rare genetic disease in a founder population. There, you may spot changes that unfolded in ten generations, even in a relatively large population of a hundred people. Increasingly, as we develop larger and larger datasets of gene variations, we will add power to detect such events in human prehistory.

In archaeology, a significant event is one in which fewer sites were occupied by ancient people in a well-studied region. The length of such a contraction depends on the sampling intensity and dating methods available -- it might be a hundred years or many thousands. Likewise, the magnitude of population contraction will be uncertain -- you can get an accurate estimate, but with substantial sampling error. As in genetics, there are other possible explanations for an apparent contraction. We might lack geological exposures of the right age, or people may simply have moved from formerly favored locations to new ones. Worse, it might just be that archaeologists haven't looked hard enough at a given time interval.

Archaeology is necessarily imprecise about the census population that existed at any given time. So is genetics. Both have their strengths and weaknesses. We want these different areas of evidence to bear on the same prehistoric events.

Too often, instead of testing hypotheses, people just line up chronologies and look for matches. A geologist may claim that African paleoclimate is important because it may explain "modern human origins." An archaeologist may claim that a hiatus at a site is consistent with "genetic bottlenecks." And the geneticist may claim that inbreeding in a modern-day genetic sample dates to a period of time corresponding to the replacement of one tool industry by another.

Any of these might be a valid hypothesis, but we need to take it further, to actually provide some tests. I believe we can do better now, because of the growing amount of genetic information. But we're going to have to do away with the facile idea that we're looking for massive bottlenecks; we need to introduce a recognition of the role of selection in human genetic variation; and we need to start addressing the archaeological record as it really exists.

That's a foreword to what follows. I'm going through regions of the world at different time intervals, to discuss what we know about population size from the archaeological record.

Next: No Late Pleistocene bottleneck in southern Africa

When genetic drift reduces entropy

This is the third in a series on information theory and tests for recent selection. The first post, “Information theory: a short introduction”, covered some of the basics of entropy. The second post, “Information theory and mutual information between genetic loci”, showed that mutual information between independent sites will be distributed as a χ².

We tend to think of genetic drift as a random process. Random processes operating repeatedly over time are called “stochastic,” and changes in gene frequency under genetic drift are certainly that.

Since entropy is a measure of uncertainty, it might seem natural to think that stochastic changes in gene frequency would increase the entropy in a population. After all, the gene frequency in a population under genetic drift will be more and more uncertain over time. So, considering the frequency of a single allele as the system, genetic drift appears to increase entropy over time.

But even this simple system isn’t quite so simple as it might appear. Sure, if you start out knowing the allele frequency, then genetic drift will increase your uncertainty over time. You will become less and less able to say that it lies in any given interval. But what if you don’t start out knowing? What if all you know is that the locus has been subjected to t generations of genetic drift?

As t increases, the probability of fixation of the locus also increases. The net effect is to reduce the entropy in the system – going from uncertainty about the allele frequency to more and more certainty that it will be either one or zero. The only thing that will stop this process is some other evolutionary force – mutation, migration from other populations, balancing selection. Each of these will have its own distinctive effects on the entropy of the single-locus system.
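Both cases can be checked with a small exact calculation. This is a sketch under stated assumptions: an idealized Wright-Fisher population with only 20 gene copies (a hypothetical size, kept tiny so the full probability distribution is easy to track), no mutation, no migration, and no selection. It follows the Shannon entropy, in bits, of our belief about the allele count, starting either from complete ignorance or from a known count.

```python
import math

COPIES = 20  # hypothetical: 2N gene copies in a very small population


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# TRANSITION[i][j]: chance of j copies next generation given i copies now.
TRANSITION = [
    [binomial_pmf(j, COPIES, i / COPIES) for j in range(COPIES + 1)]
    for i in range(COPIES + 1)
]


def entropy_bits(dist):
    """Shannon entropy of a probability distribution, in bits."""
    return sum(-p * math.log2(p) for p in dist if p > 0)


def entropy_over_time(dist, generations):
    """Entropy of the allele-count distribution after each generation."""
    entropies = [entropy_bits(dist)]
    for _ in range(generations):
        dist = [
            sum(dist[i] * TRANSITION[i][j] for i in range(COPIES + 1))
            for j in range(COPIES + 1)
        ]
        entropies.append(entropy_bits(dist))
    return entropies


# Complete ignorance: every allele count from 0 to 20 is equally likely.
unknown_start = [1 / (COPIES + 1)] * (COPIES + 1)
# Full knowledge: the allele is known to be at exactly 10 of 20 copies.
known_start = [0.0] * (COPIES + 1)
known_start[COPIES // 2] = 1.0

unknown = entropy_over_time(unknown_start, 200)
known = entropy_over_time(known_start, 200)
for t in (0, 5, 50, 200):
    print(t, round(unknown[t], 3), round(known[t], 3))
```

From a known frequency the entropy first rises and then falls; from ignorance it ends up well below where it started. Both runs settle near one bit, the single remaining question of which allele fixed.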

mtDNA selection in Iceland?

Leave it to me to have readers unwilling to ignore selection in recent populations! Here's an e-mail:

Why couldn't the Icelandic genetic changes have been the result of selection that favored some mtDNA lineages rather than others? We know the population of Iceland derived from settlers that were transplanted into a relatively alien climate and ecology, and had to adjust agriculture and subsistence activity to survive there. We know that there were dramatic environmental insults to the population: disease, starvation, eruptions. At least some of these insults would have likely been more severe than the ancestral populations would have encountered, whether they were Scandinavian or "Celtic".

So why isn't there at least a token mention of selection, either by you or by the authors? Is "genetic drift" that much more likely than selection? Is selection a more academically risky proposition than the comforting mathematics of "drift"?

Genetic drift eliminated rare mtDNA haplotypes from Iceland

How powerful has genetic drift been in recent human evolution? That's the question I raised the other day with reference to the claim that a heart disease risk-inducing allele had become common by drift in India during the last 30,000 years.

Another paper released earlier this week in PLoS Genetics claims that mtDNA haplotypes have been recently lost from the Icelandic population by strong genetic drift. The evidence for such changes in haplotypes comes from sequencing the mtDNA of thousand-year-old skeletons unearthed in Iceland during the last 150 years. These ancient remains have haplotypes that are found elsewhere in Europe today, but not in Iceland. The conclusion is that the modern-day descendants of these early Iceland settlers have experienced genetic drift within the last 1000 years, relieving them of a load of rare mtDNA haplotypes.

Could genetic drift have accomplished this loss of haplotypes? Although the paper does not present any analysis of this question, a quick consideration of some theory will show that genetic drift could easily have caused the observed results. It also shows a contrast between this case and others where genetic drift has been described as "strong". Even in this case, on an island with a limited human population, genetic drift is only "strong" in the sense of eliminating alleles that are already quite rare in the population.

Could genetic drift really break your heart?

Are these people crazy?

The combination of such a large risk with such a high frequency is, fortunately, unique. "How can such a harmful mutation be so common?" asks Chris Tyler-Smith from The Wellcome Trust Sanger Institute, Hinxton, UK. "We might expect such a deleterious change to have 'died out'.

"We think that the mutation arose around 30,000 years ago in India, and has been able to spread because its effects usually develop only after people have had their children. A case of chance genetic drift: simply terribly bad luck for the carriers."

This is a 25-bp deletion in a muscle protein gene, MYBPC3. The current allele frequency in India is estimated to be 4 percent; the deletion is carried by an estimated 60 million people. The paper suggests that it originated 30,000 years ago. Carriers of the deletion have a massive increase in their chance of cardiomyopathy.

Here's the relevant passage from the paper:

The presence of a disease-associated variant at substantial frequency raises an evolutionary question: if it is disadvantageous, how did it become so common? In principle, it could be evolutionarily neutral, manifesting its disadvantages only late in life; alternatively, its disadvantages could be outweighed by advantages early in life, or in a different environment, so that it could have been positively selected. To address this question, we examined the haplotype structure surrounding the deletion. Using five short tandem repeat (STR) markers, spanning ca. 3.4 Mb surrounding the deletion in 287 heterozygous individuals, we found similar high degrees of variation in the inferred haplotypes from chromosomes with and without the deletion (Supplementary Fig. 7 and Supplementary Table 6 online). We then used allele-specific amplification to resequence ca. 10-kb haplotypes centered on the 25-bp deletion from nine heterozygous individuals (Supplementary Tables 7 and 8 online). The chromosomes carrying the 25-bp deletion showed five closely related haplotypes (Supplementary Fig. 8 online). After excluding variants likely to have arisen by recombination, we estimated a time to most recent common ancestry (TMRCA) of ca. 33 ± 23 thousand years for the deletion haplotypes (Supplementary Methods). This time slightly postdates the initial peopling of the subcontinent 30,000–50,000 years ago and together with its restricted geographical distribution suggests that the deletion did not arrive with the first modern human settlers from Africa [more than] 50,000 years ago, but arose subsequently within the subcontinent. Its occurrence in two populations from Southeast Asia can be explained by recent gene flow from India (Supplementary Note online). Collectively, these observations provide no evidence for rapid spread of a recent founder haplotype or any departure from neutral evolution (Dhandapany et al. 2009:4).

The issue is not really whether a gene could go from 1 copy to 4 percent in 1200 generations by chance. That wouldn't be so terribly unlikely in Pleistocene humans -- in fact, the mean time for a mutation to go from 1 copy to 4 percent by drift in a population of effective size 10,000 individuals is not 30,000 years, but only around 20,000 years. On the other hand, mtDNA variation today suggests that South Asia experienced early and rapid population growth -- so we're not likely talking about a population of 10,000, but more like a minimum of 100,000 effective individuals through the past 30,000 years at least. It would take genetic drift at least 10 times longer to accomplish the requisite frequency change given that demographic history. Still, a single allele at a single gene locus might be exceptional.

But that scenario, however unlikely, is simply not the situation we have here. Here we have a deletion that must have some disadvantage, because it gives people a fatal disease. This disadvantage is apparently dominant in effect, based on the case-control study. Yet the deletion has managed to persist within the large South Asian populations of the last 10,000 years so that today it is still around 4 percent.

People mainly die of cardiac problems after age 40. But human reproductive lives aren't over until they're done investing in their children. Further, a weakened heart may reduce work potential or health even if it kills slowly. The fitness cost of this deletion is smaller than if it gave people a chance at a fatal disease when they are 17, but a smaller fitness cost is still a fitness cost. In a large population, that small fitness cost is going to whittle away the frequency of the allele over time.

A thousand generations is a lot of potential whittling. Using some quick calculations, it looks like selection against the deletion as low as 0.001 to 0.0015 in heterozygotes should have been enough to cut the frequency down to around 1 percent, from an initial value of 4 percent. So even if drift increased the deletion early after its origin, it ought to be much rarer today. Meanwhile, drift looks even more unlikely, since the chances of a mutation growing from 1 copy to 4 percent against such selection are nil.
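A quick calculation of this kind can be sketched in a few lines of Python. This is only an illustration, not the calculation behind the figures above: it assumes a deterministic model with random mating, no drift, and fitnesses of 1, 1 - s and 1 - 2s for non-carriers, heterozygotes and homozygotes. The function name and the homozygote fitness are assumptions of the sketch.

```python
def frequency_after_selection(p0, s, generations):
    """Deterministic decline of a deleterious allele.

    Assumed fitnesses (hypothetical, for illustration):
    non-carriers 1, heterozygotes 1 - s, homozygotes 1 - 2s.
    Random mating, no drift, no mutation.
    """
    p = p0
    for _ in range(generations):
        q = 1.0 - p
        # Mean fitness of the population in this generation.
        w_bar = q * q + 2 * p * q * (1 - s) + p * p * (1 - 2 * s)
        # Frequency of the allele among surviving gene copies.
        p = (p * q * (1 - s) + p * p * (1 - 2 * s)) / w_bar
    return p


for s in (0.001, 0.0015):
    final = frequency_after_selection(0.04, s, 1000)
    print(f"s = {s}: frequency after 1000 generations = {final:.4f}")
```

Under these assumptions both values of s bring a 4 percent allele down to the neighborhood of 1 percent within a thousand generations.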

Did this deletion have a fitness cost as high as one in a thousand? It increases the risk of cardiomyopathy 5-fold or more compared to the wild type. So it seems very plausible. But really, we don't have any good estimates of the fitness costs of chronic diseases in pre-industrial populations.

If the deletion was favored by some selection, that would probably be antagonistic, that is, acting against the fitness cost of the deletion late in life. The authors briefly investigated this hypothesis, as described above. They found no evidence for a recent expansion of a single haplotype around the deletion. That means that if there was strong selection favoring this deletion, it must have happened early after its origin and then petered out. If the expansion had been late in South Asian history, it would show more LD around it, and most of the deletion-carrying chromosomes would share a single long-range haplotype. So this deletion has not been increasing rapidly in the past few thousand years.

I would hypothesize that the disadvantages of the deletion have actually increased over time. The average lifespan increased into the Upper Paleolithic and probably later as well. Meanwhile, as the population grew, larger completed family sizes became more important to fitness. As people became more sedentary, the accumulation and inheritance of possessions and land became an important means of investing in children. The increasing importance of later survival and investment in children should have raised the fitness cost of chronic disease. That would explain a pattern of evolution in which this deletion increased in frequency early in its history, but later remained static or declined.

So, I don't suppose I can say people are crazy for thinking genetic drift could explain this deletion's current high frequency. But considering the powerful effect of weak selection over the many generations involved here, and the very large size of the South Asian population during most of that time, genetic drift seems pretty unlikely.

References:

Dhandapany PS and 23 others. 2009. A common MYBPC3 (cardiac myosin binding protein C) variant associated with cardiomyopathies in South Asia. Nat Genet (online early) doi:10.1038/ng.309

Reading through P. A. P. Moran's book, The Statistical Processes of Evolutionary Theory, I found this passage (p. 12):

It should be pointed out that the above stochastic models [of density dependence] usually result in there being a non-zero probability that the population will die out altogether. In genetic problems this is an unmitigated nuisance. In population genetics we are concerned with the variation and distribution of gene frequencies and it is very difficult to make stochastic models in which both the gene frequency and the population size are random variables (see Feller (1951), p. 242 for a beginning in this direction, in which, however, the population may die out). Many genetic phenomena do depend on the population size and the models we shall consider later nearly all assume that this size is held constant. It is true that if we have, for example, a situation in which a new mutant gene takes over the whole population by reason of some selective advantage, the total population size, which is held in check by density-dependent forces, can usually be expected to increase somewhat, or at any rate to change slightly, but this is not likely to have an important effect.

That's interesting for several reasons. Recently I've been investigating the connections between selection and demographic growth. In humans, there are a number of recently selected genes whose advantage comes from relaxing density dependence (that is, increasing carrying capacity), for example by allowing greater resource extraction from the environment. In those cases, the effect of a selective fixation on population size will not be negligible. Examples of that kind may not be rare in nature, although in many instances selection may increase population size only to result in added pressure to various prey species, which then reduce the carrying capacity.

Another reason why this is interesting is that it reveals a fairly unusual way of thinking about selection. From one point of view selection is just a condition of the demography of alleles. In particular, both selection and genetic drift (and for that matter, mutation) are described by the same equations that describe demography. Under genetic drift, these allelic demographies are in all cases of similar form to the demography of the population in which those alleles are embedded. Selection, on the other hand, is notable for showing the demography of alleles to be inconsistent with the demography of the population. The most commonly considered case is where one allele increases while the population remains the same size. But balancing selection, for example, can be reduced to density-dependence on an allele's frequency.

One of the easiest ways for selection to set itself apart from stochastic changes in populations is to be deterministic. But the results of selection are nonetheless stochastic, and it is good to be reminded.

Surfing and recent selection

Genetic Future and Gene Expression have commented today on the relative roles of selection and demography in shaping the genetic differences between populations. They are reacting to a paper by Hofer and colleagues (2009) that examined the differences in frequency among human populations for a number of genetic markers, including STR (microsatellite), SNP and insertion-deletion mutations.

That paper's abstract:

Several studies have found strikingly different allele frequencies between continents. This has been mainly interpreted as being due to local adaptation. However, demographic factors can generate similar patterns. Namely, allelic surfing during a population range expansion may increase the frequency of alleles in newly colonised areas. In this study, we examined 772 STRs, 210 diallelic indels, and 2834 SNPs typed in 53 human populations worldwide under the HGDP-CEPH Diversity Panel to determine to which extent allele frequency differs among four regions (Africa, Eurasia, East Asia, and America). We find that large allele frequency differences between continents are surprisingly common, and that Africa and America show the largest number of loci with extreme frequency differences. Moreover, more STR alleles have increased rather than decreased in frequency outside Africa, as expected under allelic surfing. Finally, there is no relationship between the extent of allele frequency differences and proximity to genes, as would be expected under selection. We therefore conclude that most of the observed large allele frequency differences between continents result from demography rather than from positive selection.

OK, so that abstract concludes that demography (including population bottlenecks and geographic dispersals) is a better explanation for the genome-wide pattern of interpopulation frequency differences than selection.

I agree completely.

When I teach Anthropology 105, our introduction to biological anthropology, I always force my students to learn how to calculate Wright's FST. They really don't like it. They think it's cruel and unusual punishment to have to do math in an anthropology course.

Well, if they're going to take my courses, they'll have to get used to it. Because with me, it's all about the math.

So, let's consider FST. The statistic represents the reduction in heterozygosity in subpopulations due to isolation, compared to the expectation under panmixia. The expression is:

$$F_{ST} = \frac{H_T - H_S}{H_T}$$

Where HS is the average heterozygosity of subpopulations, and HT is the expected heterozygosity of the total population, given the allele frequencies.

I always use a two-allele locus as an example in class, and I always choose a case in which the frequency of an allele in one subpopulation is 70 percent, and the frequency of the same allele in the other subpopulation is 30 percent. Big difference in frequencies -- the frequency is 40 percentage points higher in one population than in the other. In fact, that frequency difference is well within the range considered "extreme" in the current paper by Hofer and colleagues.

Well, if the subpopulations are the same size, the average allele frequency is 50 percent. So the expected heterozygosity of the total population is 0.5 (that's 2pq, where p and q are the frequencies of the two alleles). And the average heterozygosity of the two subpopulations is 0.42. So applying the formula above, we come to an FST of (0.5 - 0.42)/0.5 = 0.16.
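The classroom example can be checked with a short Python sketch. It uses the usual definition FST = (HT - HS)/HT for a two-allele locus and assumes equally sized subpopulations; the function names are made up for illustration.

```python
def heterozygosity(p):
    """Expected heterozygosity 2pq at a two-allele locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_alleles(freqs):
    """FST for a two-allele locus, assuming equal-sized subpopulations.

    freqs: frequency of the same allele in each subpopulation.
    """
    # HS: average of the within-subpopulation heterozygosities.
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)
    # HT: heterozygosity expected from the pooled allele frequency.
    p_bar = sum(freqs) / len(freqs)
    h_t = heterozygosity(p_bar)
    return (h_t - h_s) / h_t


print(round(fst_two_alleles([0.7, 0.3]), 2))  # 0.16
```

Trying other pairs of frequencies, for example 0.6 and 0.4, gives a feel for how quickly FST falls as the subpopulations become more similar.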

Now, the average FST among human continental populations is between 0.1 and 0.15. A value of 0.16 for a single gene should not be in the least bit unusual. Under neutrality, there ought to be lots and lots of gene loci that show allele frequency differences this great or greater. And indeed, Hofer and colleagues find a large set of such loci -- something like one out of 10, which actually seems a bit low to me.

Other surveys that have tried to test the neutral hypothesis have considered a much smaller range of frequencies -- essentially, genes in which an allele is 80 percent or higher in one population and rare or absent in others. This study included much smaller allele frequency differences in its definition of "extreme" and thereby found that a very high fraction of sites had such differences.

For the broader meaning of "extreme" used in this paper, which under neutrality would include one out of every 10 loci, it is no surprise that most would look, well, neutral. There are so many neutral loci fitting these characteristics that they completely swamp out any statistical expectation of selection. There might be a handful of selected sites among the high-FST loci in the paper (and the authors identify a few candidates from other studies), but most must be neutral. The study tests the adequacy of the neutral hypothesis to explain low FST genes, and finds that population differences at that level have not been driven primarily by selection.

I'm not sure why the authors didn't include the prosaic mathematical prediction of neutrality in their paper. It seems to me that the results were foreordained by theory.

Still, several of the observations in the paper are interesting. In particular, the excess of STR alleles outside of Africa that have increased in frequency is a sign of a long-term demographic bias toward population growth outside of Africa. I have heard that observation from other research groups in other contexts, but this is the first paper I can think of that reported it clearly. The "allele surfing" explanation is a very credible explanation for that observation -- essentially, a geographically dispersed founder effect.

The end of the discussion includes a statement about positive selection:

While we find that positive selection is unlikely to have shaped the allele frequency spectrum at most loci, it may certainly have acted on fewer genes than previously believed, and our current results do not allow us to discriminate between the effects of demography and selection for an individual locus. Loci which are candidates for being under positive selection should therefore be more carefully scrutinized to find links between potentially selected alleles and a phenotypic effect (see e.g. Sabeti et al. 2007).

I find nothing to disagree with here. Any individual instance of positive selection should be tested with reference to phenotypic effects, and collectively, most of the genome's diversity was not shaped by positive selection. Our own research on positive selection (discussed in this post from last year) addresses a relatively small subset of haplotypes across the genome. Even though the number of affected genes is quite large (on the order of several thousand), it did not strongly influence the genome-wide diversity parameters assessed by Hofer and colleagues.

The limited genome-wide effect of selection, in the face of a large apparent number of selected alleles, is one of the strongest arguments that the rate of positive selection has recently accelerated. If the rate had been high throughout human evolution, we would find a much stronger effect on the genome-wide variation than we in fact observe. The demographic changes proposed by Hofer and colleagues in fact bolster the case for a recent acceleration -- the very demographic changes that might create "allelic surfing" would also tend to generate more positively selected mutations.

References:

Hofer T, Ray N, Wegmann D, Excoffier L. 2009. Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Ann Hum Genet 73:95-108. doi:10.1111/j.1469-1809.2008.00489.x

Cultural impedance, demographic growth, effective population size

This is a complicated story with many interlocking parts. Telling the whole story may well take me fifty posts. There's a lot of new science hiding in here waiting to get out.

I'm starting now because of the new paper by Luke Premo and Jean-Jacques Hublin, titled "Culture, population structure, and low genetic diversity in Pleistocene hominins." This paper is not the final word on its topic, nor is it the first word. But it is very much worth reading.

It makes an excellent point of departure to explain what we know and don't know about the genetics of prehistoric humans. Premo and Hublin propose an interesting model with interaction between culture and natural selection, as an explanation for a 35-year-old problem in human evolution: Our low level of genetic variation.

Their model may be right. I certainly think there's a kernel of truth in it, shared with a number of other models, as I'll describe below. And it's testable -- a project to which we'll be returning in the next few months.

The Amish heart-protecting triglyceride-busting null mutation

Toni Pollin and colleagues (2008) report one of the simplest medical research studies you'll ever see:

Apolipoprotein C-III (apoC-III) inhibits triglyceride hydrolysis and has been implicated in coronary artery disease. Through a genome-wide association study, we have found that about 5% of the Lancaster Amish are heterozygous carriers of a null mutation (R19X) in the gene encoding apoC-III (APOC3) and, as a result, express half the amount of apoC-III present in noncarriers. Mutation carriers compared with noncarriers had lower fasting and postprandial serum triglycerides, higher levels of HDL-cholesterol and lower levels of LDL-cholesterol. Subclinical atherosclerosis, as measured by coronary artery calcification, was less common in carriers than noncarriers, which suggests that lifelong deficiency of apoC-III has a cardioprotective effect.

Gina Kolata covers the story in the NY Times:

For the sake of heart disease research, 809 members of the Old Order Amish community agreed to go to a clinic in Lancaster, Pa., near their homes, and drink a rich milkshake that was made mostly of heavy cream. Over the next six hours, a group of investigators took samples of their blood, determining how much fat was churning through their bloodstreams.

Most of the study participants responded as expected — their levels of triglycerides, a common form of fat in the blood, rose steadily for three to four hours and then declined. But about 5 percent had an extraordinary reaction: their triglyceride levels started out low and hardly budged.

I'm generally interested in novel protective mutations, and this is clearly one -- and far from the only one. Its current frequency is 5 percent in the Old Order Amish. Neither the article nor the paper reports its frequency in the general population, although there is the intimation that it is rare. The Amish individuals carrying the mutation all share a common haplotype, apparently (based on pedigree and LD) from a single 18th-century founder.

It remains an open question whether homozygotes for the null allele are better or worse off than normal APOC3 homozygotes. With about 5% of people carrying the allele as heterozygotes, its frequency is roughly 2.5%, so under random mating homozygotes would be expected in only about one in 1,600 people. They were not included in the present study. I can't find any indication that homozygote nulls for APOC3 are a known Mendelian disorder.

I wonder to what extent the allele frequency in the Amish is due to selection.

The Amish have high frequencies of certain otherwise rare mutations. This is one of the textbook examples of founder effects -- extreme genetic drift due to sampling a small number of founders from a much larger population. Today's Old Order Amish in the United States trace most of their ancestry to an initial population of approximately 200 people in the eighteenth century. That means that any of the alleles carried by those 200 people, even if it was vanishingly rare in the European population, has a good chance of being half a percent or higher in today's Amish.

But founder effect is only part of the story -- there is also subsequent population growth. Those initial 200 people have more than 200,000 descendants today within the Old Order Amish. This number doesn't count descendants who may belong to other sects that splintered during the nineteenth century (like the Mennonites [see update below]), or descendants of people who left the church. These values suggest that the Amish population has increased by some 2.3% annually during the last 300 years; its current rate of growth is estimated at 4%.

This is very rapid population growth on an evolutionary time scale, equalling roughly 46% per generation. With this kind of population growth, strongly deleterious alleles may come to occur in a large number of individuals, even as they decline in frequency in the population. The susceptible population grows faster than selection can remove alleles. Hence, we find a number of rare genetic disorders within the Old Order Amish as a consequence not only of founder effect but also subsequent population growth.
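A small Python sketch makes the point concrete: carriers can multiply even while their frequency falls. The selection coefficient of 10% against carriers is purely hypothetical, the growth rate is treated as a continuous rate of 0.46 per generation, and the rare-allele approximation (carrier frequency shrinking by a factor of 1 - s each generation) is an assumption of the sketch.

```python
import math


def carriers_over_time(n0, carrier_freq0, r, s, generations):
    """Track carrier frequency and carrier count in a growing population.

    n0: founding population size
    carrier_freq0: initial fraction of people carrying the allele
    r: population growth rate per generation (continuous rate)
    s: hypothetical selection against carriers per generation
    """
    rows = []
    for t in range(generations + 1):
        pop_size = n0 * math.exp(r * t)
        # Rare-allele approximation: frequency shrinks by (1 - s) per generation.
        carrier_freq = carrier_freq0 * (1.0 - s) ** t
        rows.append((t, carrier_freq, pop_size * carrier_freq))
    return rows


rows = carriers_over_time(n0=200, carrier_freq0=0.005, r=0.46, s=0.10, generations=15)
for t, freq, carriers in rows[::5]:
    print(f"generation {t:2d}: frequency {freq:.4f}, carriers {carriers:8.1f}")
```

With these made-up numbers the frequency drops severalfold over 15 generations, yet the single founding carrier leaves well over a hundred carrier descendants.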

The APOC3 mutation in this study was evidently not deleterious. Its current frequency of 5% suggests it may have been advantageous.

It's not too hard to hypothesize why a mutation that decreases the risk of heart disease might have conferred a benefit in an agrarian religious sect over the last 300 years. To the extent that heart disease affects men in their 30's and older, these are still active reproductive years for men who may have family sizes of eight children or more. Further, this is a time when men may come into property from their aging parents, may become leaders of new settlements, or may begin to affect the marriages of their children -- a time when young people formally join the church. Being alive would seem like a significant fitness advantage for men in this society. Or perhaps other effects of the gene determined its success.

The question is just how strong such an effect might be. If the mutation began with a single copy in a population of 200 founders, its initial frequency would be 0.5 percent, or 0.005. Its present frequency in the Amish is ten times that, or 0.05. If we assume that 15 generations have passed, that growth would be consistent with a fitness advantage of around 15 percent for carriers of the null mutation. In other words, the Amish population grew around 46% per generation over the last 300 years; this mutation grew around 60% per generation.
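The arithmetic behind these figures can be reproduced in Python if the rates are read as continuous (logarithmic) growth rates, which is an assumption of this sketch. On that reading, 46% per generation corresponds to multiplying the population by about e^0.46, roughly 1.58, each generation. The variable names are illustrative only.

```python
import math

founders = 200
descendants = 200_000
years = 300
generations = 15

# Continuous growth rate of the whole population.
annual_rate = math.log(descendants / founders) / years
per_generation_rate = annual_rate * (years / generations)

# Extra growth needed for carriers to rise tenfold in frequency (0.005 to 0.05).
carrier_advantage = math.log(0.05 / 0.005) / generations

print(f"annual growth rate:        {annual_rate:.3f}")
print(f"growth per generation:     {per_generation_rate:.2f}")
print(f"advantage of carriers:     {carrier_advantage:.2f}")
print(f"carrier growth/generation: {per_generation_rate + carrier_advantage:.2f}")
```

The sketch gives about 0.023 per year, 0.46 per generation for the population, 0.15 for the carriers' advantage, and about 0.61 per generation for the mutation itself.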

That kind of differential increase is unlikely to have been driven by genetic drift. Considering the rarity of the mutation in the non-Amish population today, it is unlikely to have been carried by more than a single founder, although we can't exclude the hypothesis that some number of founders were relatives who carried it. That hypothesis is the most likely way for an otherwise rare mutation to hit 5% by founder effect alone. Later, after the Amish population numbered more than a thousand or so, strong differential growth of a rare mutation by chance alone would be impossible. Still, we might imagine that in the initial few generations, one or two founders might have had a predominant effect on the subsequent Amish gene pool. We would need to suppose that the genes of such fecund founders now account for more than 10% of the present Amish gene pool. That's a testable hypothesis. Selection is simpler -- mainly because its effect can be spread across many more generations.

The interesting thing about selection in the Amish is that their population growth greatly affects the fixation rate of new advantageous mutations. In a constant-sized population, the fixation probability of a new advantageous mutation is roughly twice the heterozygote fitness advantage, denoted as 2s. But in a growing population, the fixation probability is 2(s + r) -- when s and r are both small. If we assume a growth rate of 46% and a heterozygote fitness advantage of 15% for this null allele, it should be obvious that we've entered the territory where our small-value approximation no longer holds. New adaptive mutations are unlikely to exit the Amish population by genetic drift.
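To see where the approximation stops making sense, it helps to plug in numbers. The Python sketch below simply evaluates 2s and 2(s + r); the function names are hypothetical, and the small-value pairs are chosen only for contrast with the Amish figures.

```python
def fixation_probability_constant(s):
    """Small-s approximation for a constant-sized population: 2s."""
    return 2.0 * s


def fixation_probability_growing(s, r):
    """Small-value approximation for a growing population: 2(s + r)."""
    return 2.0 * (s + r)


cases = [
    (0.01, 0.00),  # weak selection, no growth
    (0.01, 0.02),  # weak selection, slow growth
    (0.15, 0.46),  # figures discussed for the Amish null allele
]
for s, r in cases:
    approx = fixation_probability_growing(s, r)
    note = "ok" if approx < 1.0 else "exceeds 1, so the approximation has broken down"
    print(f"s = {s:.2f}, r = {r:.2f}: 2(s + r) = {approx:.2f} ({note})")
```

For s = 0.15 and r = 0.46 the formula returns 1.22, which cannot be a probability; the sensible reading is only that loss by drift becomes very unlikely.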

The subject of positive selection in founder populations is under-explored, from a theoretical perspective. Especially considering the very rapid growth of some human founder populations -- measured against the generational time scale -- there is a good chance that we'll find many new adaptive mutations in such populations.

UPDATE (2008-12-15): A reader writes:

It is a common mistake to think that the Mennonites as a group broke off from the Amish. It is actually the other way around with the split occurring in Europe before both groups came to the Americas.

He kindly provided a couple of sites with more information (here and here). I appreciate the correction!

References:

Pollin TI and 13 others. 2008. A null mutation in human APOC3 confers a favorable plasma lipid profile and apparent cardioprotection. Science 322:1702-1705. doi:10.1126/science.1161524

Genetic differentiation within Europe

Larry Moran tells an interesting personal story about long-distance gene flow among Roman-era elites in Europe (What does Marcus Antonius tell us about evolution?). He describes the genealogical connection between Mark Antony and the dark-age Irish warlord, Niall of the Nine Hostages, Y-chromosomal progenitor of a large proportion of Irish (and British) men.

But the strange thing is that after this story, describing how one man's descendants covered more than a thousand miles in a few generations, Moran gives this conclusion:

New beneficial alleles will not make much headway in 2000 years because gene flow between subpopulations is very low. There's no reason to assume that it was any different in the ancient past—it may even have been worse. Think about that the next time you hear about some hypothetical allele that arose 50,000 years ago and became fixed in the entire species. That's not very likely.

I would conclude just the opposite from the story. Garlic mustard has spread across North America after being introduced from Europe less than 500 years ago. It is currently invading formerly "wilderness" spaces such as the remaining patches of prairie here in Wisconsin. This has not happened by a slow, plodding spread from one square meter to the next. It has happened because every so often a few mustard seeds get stuck in the tread of someone's shoe, or tire, or in mud stuck in wheel-wells of cars and four-wheelers. Those seeds get carried into wilderness areas, many miles from their sources.

Only a very small fraction of garlic mustard seeds get themselves stuck in shoes. We might think that surely this small number should be no threat to Wisconsin prairies. It would take hundreds or thousands of generations for them to make any difference, right?

But garlic mustard grows exponentially, particularly if that area has been disturbed by fire, plowing, traffic or overbrowsing. A tiny number of seeds are all it takes to spread invasively into a new place. A small amount of long-distance movement has been sufficient to permeate almost every suitable mustard habitat in North America, in less than 500 years.

A selected gene is like garlic mustard. We may say that only a few members of the Roman elite intermarried with Britons. But if a single Roman married a Briton, carrying an advantageous gene, that gene has the chance to grow exponentially. That chance is not a guarantee, any more than a single garlic mustard seed is a guarantee. A single copy of an advantageous gene still has a very high probability of being lost by chance. But selected genes have a much higher chance of spreading than neutral ones. A very slight amount of long-distance gene flow can cause a selected gene to spread vastly faster than diffusion across a population.

Besides that, in this case, the history is incomplete. Roman legions occupied Britain for more than 400 years. Those legions were not only Italian, but included soldiers from across the empire, including in one famous instance thousands of Sarmatians. Sarmatians carried with them genes from the steppes of Central Asia, much farther than Rome. Soldiers were stationed for years, and many left the service and became local merchants, landowners, or minor nobles. They were not celibate. For that matter, neither were the early Latin clergy...

This massive flow of genes into the British Isles did not erase the standing genetic variation, some of which persisted from Neolithic and Paleolithic Britons. But the immigrants were more than enough to spread advantageous genes into the British population. We need not imagine one hitchhiking like mustard seed in the grandchildren of Mark Antony, although that is certainly possible. Antony's descendants were joined by thousands of lonely Roman soldiers stationed for years in backwater British towns, horny Vikings, pillaging Saxons, conquering Normans, and the occasional German prince.

Early gene flow would have been more influential on the present composition of the British population than later gene flow. But if the question is whether a gene could traverse the European population in a few thousand years, there have been ample opportunities. And if we go back 50,000 years, even relative isolates like Australia and the Americas had their chance to get such genes.

All this just says that it is plausible for genes to have spread widely through the human population recently. It's no proof that they actually did so, or that they had substantial effects on human similarities or differences. For that we must turn to empirical evidence.

In that vein, here's a question that I know is of interest to a number of people: How similar should the selected genes in Britain, or Northern Europe generally, be to those of Central Asia, or the Near East, or Italy? We have samples of genetic variation in each of these places (and many others) that would answer the question empirically. We know that the majority of the genome, presumably neutral to selection, shows significant population differentiation among those places. But what does theory tell us? Should we expect selected genes to have a different pattern?

On this, I'll have to save my answer for later....

Some genetic drift graphs with Mathematica

The first thing to come up in my lectures is genetic drift. Pretty much everyone who lectures about drift needs a figure showing the results of simple Monte Carlo simulations of sampling drift in a finite population. You start a population with two alleles, and sample it randomly in each generation until one or the other allele disappears. I tend to start with a "population size" of 1000 gene copies, and an initial allele frequency of 50%.

We can do this kind of simulation in Mathematica pretty easily. We'll work with three variables:

  • popSize, which we'll set equal to 1000;
  • a, the number of gene copies with one of the two alleles, initially equal to 500;
  • frequencyList, which will hold the value of a for each generation. Initially, frequencyList holds the first value of a, 500.

Now, we're ready to code the simulation. We set the three variables, and then set up a While[] loop that samples a random binomial deviate in each generation, based on the previous generation's allele frequency (a/popSize):

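(* a: number of gene copies carrying the focal allele, starting at 50% *)
a = N[500];
popSize = 1000;
frequencyList = {a};
(* Resample each generation until the allele is lost (0) or fixed (popSize) *)
While[a < popSize && a > 0,
  a = N[RandomInteger[BinomialDistribution[popSize, a/popSize]]];
  AppendTo[frequencyList, a]]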

The last line appends the value a to the list. Nesting BinomialDistribution[] inside RandomInteger[] gives us a binomial deviate, based on popSize trials and frequency a/popSize. The loop executes until a reaches either zero or 1000, at which point the simulation stops.

OK, that gives us a list, frequencyList, which contains the number of gene copies with one of the two alleles over time. Now, we want to plot that list. We can try:

ListPlot[
frequencyList,
PlotJoined -> True]

...which gives us:

Monte Carlo simulation of genetic drift

That's serviceable, but a bit simple. And it's confusing where the axis cuts across. It would be better to have the plot bounded at 0 and 1000, the min and max possible. Also, we need a better font than Times.

Let's try:

ListPlot[
  {frequencyList},
  PlotJoined -> True,
  Frame -> True,
  AxesOrigin -> {0, 0},
  PlotStyle -> Thick,
  FrameLabel -> {"Time (generations)", "Frequency"},
  TextStyle -> {FontFamily -> "Myriad Pro", FontSize -> 12},
  Filling -> 500]

Monte Carlo simulation of genetic drift

That's a bit more like it. The Filling -> 500 gives us shading everywhere between our frequency line and the 500 mark, a nice cue reminding us where the simulation started.

Now, we can add a second run in the same plot:

Monte Carlo simulation of genetic drift, two populations

Oh, well that shifted the x-axis. No harm, though. And we continue...

Monte Carlo simulation of genetic drift, two populations

Not a bad representation of the process, with fixation times ranging from very short to quite long, and one going to fixation at zero. The different color shading tends to confuse the picture a bit, and doing it in a larger size for the projector will take some tweaks, but so far it's much more attractive than the Excel version.

We could run a few more simulations to substitute, just in case one made the overall picture more clear (for example, by moving the lines apart in the early phase). But here we have the benefit of honesty --- these are the first four sims I ran.
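
A multi-run figure along these lines can be sketched as follows. This is a minimal sketch, not necessarily the code behind the plots above: driftRun is a hypothetical helper name, the seed is arbitrary, and RandomVariate assumes Mathematica 8 or later (RandomInteger with a distribution argument, as used above, is the older form).

```mathematica
(* Hypothetical helper: one drift trajectory, run until loss or fixation *)
driftRun[popSize_Integer, start_Integer] :=
  NestWhileList[
    RandomVariate[BinomialDistribution[popSize, #/popSize]] &,
    start,
    0 < # < popSize &]

SeedRandom[1];  (* arbitrary seed, only so the figure is repeatable *)
runs = Table[driftRun[1000, 500], {4}];

(* Overlay the four trajectories, with the y-axis bounded at 0 and 1000 *)
ListLinePlot[runs,
  Frame -> True,
  PlotRange -> {0, 1000},
  FrameLabel -> {"Time (generations)", "Gene copies"},
  PlotStyle -> Thick]
```

Dropping the SeedRandom line gives a fresh set of four runs each time, which keeps the "first four sims" honesty intact.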

UPDATE (2009-09-13): I'm revisiting this post as I work on a Mathematica Demonstration of genetic drift. It's much simpler to implement the central drift algorithm with a NestList or a NestWhileList. For example:

NestList[RandomInteger[BinomialDistribution[1000, #/1000]] &, 500, 4000]

gives you the trajectory over 4000 generations. This:

NestWhileList[
RandomInteger[
BinomialDistribution[1000, #/1000]] &, 500, # < 1000 && # > 0 &]

replicates the output above.
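
With the simulation reduced to a single expression, it is easy to ask how long runs typically last. The sketch below is illustrative only: absorptionTime is a hypothetical helper, the 200 replicates and the seed are arbitrary choices, and the comparison value is the standard diffusion approximation for the mean time to loss or fixation, written here in terms of the number of gene copies.

```mathematica
(* Hypothetical helper: generations until the allele is lost or fixed *)
absorptionTime[copies_Integer, start_Integer] :=
  Length[
    NestWhileList[
      RandomVariate[BinomialDistribution[copies, #/copies]] &,
      start,
      0 < # < copies &]] - 1

SeedRandom[2009];  (* arbitrary seed, only for repeatability *)
times = Table[absorptionTime[1000, 500], {200}];

(* Summary of the simulated run lengths *)
N[{Mean[times], Median[times], Min[times], Max[times]}]

(* Diffusion approximation with c gene copies and starting frequency p:
   -2 c (p Log[p] + (1 - p) Log[1 - p]) generations *)
With[{c = 1000, p = 1/2},
  N[-2 c (p Log[p] + (1 - p) Log[1 - p])]]
(* about 1386 generations *)
```

The simulated mean should land in the neighborhood of the approximation, while the minimum and maximum show how widely individual runs scatter around it. The 200 replicates may take a little while to finish.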

Sample sizes and the "Neandertal haplogroup"

I have an excellent e-mail question about last week’s Neandertal mtDNA paper, which has provoked a lot of commentary.

I just skimmed over your comments on the recent paper and I have a couple questions. First, how many Neanderthals did they receive mitochondrial DNA from? I think I read somewhere that it was fewer than ten.

Second if that is true, what the hell does it mean? I wouldn’t try and predict anything based on even fifty humans from that long ago much less 8 or 9 in genetic terms. I don’t think that anyone else would either unless they are grandstanding. You can’t prove a negative so they really can’t say that no modern humans have any Neanderthal DNA. Did they study Neanderthals from Asia? I just think they don’t have a good enough sample and until we can resequence a Neanderthal nucleus and bring the little tyke to term and wait for him or her to marry then wait for those kids to have kids will we really be sure we’ve got the goods.

Krause et al. (2007) list 15 Neandertal partial mtDNA sequences. Ten of these at that time presented relatively long portions, including the central Asian Okladnikov and Teshik Tash specimens, Mezmaiskaya, Feldhofer 1 and 2, Vindija 75 and 80, Scladina, Monte Lessini, and El Sidrón 1252. The same paper lists five additional specimens for which only a very short sequence had been recovered (just enough to diagnose as part of the Neandertal clade), including Vindija 77, El Sidrón 441, Engis 2, Rochers de Villeneuve, and La Chapelle-aux-Saints.

We do not know that every Neandertal belonged to the same mtDNA clade as those 15 sequences. Some of them may have looked different, possibly including the new clade otherwise present in later Upper Paleolithic and living people. But based on the 15 sequences we have, we can say that a large fraction of Neandertals must have carried the “Neandertal haplogroup.” Exactly how large a fraction depends on what we are willing to believe about contamination, preservation, and the randomness of our sample.

Now, let’s consider the question: Can we predict anything about Neandertal evolution and relationships based on this small, possibly unrepresentative sample of mtDNA?

The answer is that it doesn’t matter very much whether we have 5 sequences or 500. If 15 out of 15 specimens from different sites across Europe preserve a single mtDNA haplogroup, we can’t say it was universal, but we can say it was common. If 40 out of 50, or 400 out of 500 specimens had the same haplogroup, that would increase the precision, but not change the basic fact: Neandertals had at least one common haplogroup that is now so rare it has never been found in a sample of 100,000 or more people. We deserve some explanation.

The possible explanations are:

  1. Random genetic drift
  2. Accelerated genetic drift due to demographic turnover
  3. Population extinction and replacement
  4. Natural selection


Drift

Random genetic drift is fairly easy to refute, although it might not appear so at first. In favor of drift: There were few Neandertals, and the population size of the succeeding Upper Paleolithic, up through the Last Glacial Maximum, was also small: the best estimates are on the order of 2000 people for Western Europe and 5000 for continental Europe to the Urals (Bocquet-Appel et al. 2005). There would have been perhaps twice or more that number across the entire Neandertal range. The effective population size represented by this population would have been smaller: perhaps 3000–5000 for Neandertals and Aurignacian-era people, of whom only half, or around 2000, were females. Genetic drift in this small mtDNA population would have been much stronger than for autosomal genes, and very much stronger than in most recent human populations.

But when we plug these numbers into a model of random genetic drift, it starts to appear very unlikely that drift alone could explain the observations. Let’s assume (falsely) that our Neandertal genetic samples all dated to 40,000 years ago, that the female effective size was 2000 individuals between then and 15,000 years ago, and that the population of Neandertal country was a random mating pool. Following these assumptions, on average, all the mtDNA genomes at 15,000 years ago would descend from only 4 or 5 ancestral copies in the population 40,000 years ago. If these five ancestral copies were, by chance, a different haplogroup from the 15 copies we’ve already found, then drift could explain the data.
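
The figure of four or five ancestral copies can be roughly recovered from a standard deterministic approximation to the coalescent. This is a sketch under assumptions that the text above does not state: a generation time of 25 years (so that 25,000 years is about 1,000 generations), a constant female effective size N_f = 2000, and a large number of lineages at the later date.

```latex
% n(t): expected number of ancestral lineages, with t in units of N_f generations
\frac{dn}{dt} = -\frac{n(n-1)}{2}
\quad\Longrightarrow\quad
n(t) \approx \frac{1}{1 - e^{-t/2}} \quad \text{(many lineages at } t = 0\text{)}

% Assumed values: g = 1000 generations, N_f = 2000
t = \frac{g}{N_f} = \frac{1000}{2000} = 0.5, \qquad
n(0.5) \approx \frac{1}{1 - e^{-0.25}} \approx 4.5
```

A shorter generation time or a smaller N_f would push the number of ancestral copies lower still.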

However, this still doesn’t appear very likely. So far, every one of the Neandertals shares a single haplogroup. The frequency of this haplogroup was apparently very high, making it very unlikely that all five ancestral copies would have belonged to some other haplogroups of which we have never found any trace.

Notice that this argument does not depend very much on the number of Neandertal mtDNA sequences that we have found. The fact that there are 15 helps to constrain the frequency of the haplogroup within the population 40,000 years ago, in our model. That frequency is unlikely to be less than around 85%, assuming random sampling. But suppose there were only five. We would still know that the Neandertal haplogroup was very common in its population, even if we thought it was only 50%. It would still be unlikely to draw four or five ancestral copies and have all of them be some other haplogroup that we haven’t found.
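
Both steps of this argument are simple binomial calculations. As a sketch, assuming the specimens are independent random draws: p is the population frequency of the Neandertal haplogroup, α is the chance we are willing to accept of seeing 15 out of 15 by luck, and k is the number of ancestral copies. These symbols are introduced here only for illustration.

```latex
% Smallest frequency p still compatible with 15 out of 15 at level \alpha
p^{15} \ge \alpha \;\Longrightarrow\; p \ge \alpha^{1/15}:
\qquad 0.10^{1/15} \approx 0.86, \qquad 0.05^{1/15} \approx 0.82

% Chance that all k ancestral copies belong to other haplogroups
(1 - p)^{k}: \qquad
(1 - 0.85)^{5} \approx 7.6 \times 10^{-5}, \qquad
(1 - 0.5)^{5} \approx 0.03, \qquad
(1 - 0.5)^{4} \approx 0.06
```

So even at a frequency of only 50%, drawing four or five ancestors that all miss the Neandertal haplogroup would happen only a few times in a hundred.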

This gives us a considerable confidence margin against drift. We need it. After all, the Neandertals were not randomly sampled at a single time, and it is possible that some of them actually carried a human-like mtDNA sequence, which we now falsely interpret as contamination. But even with these shadows hanging over us, it would still be unlikely that none of the ancestors of today’s mtDNA variation were like the Neandertal haplogroup.

Also, the population was not a random-mating pool. When we add geographic structure to the story, which tends to reduce the importance of genetic drift, we find that the probability that drift alone could explain the observations is almost zero, and it remains very unlikely that a single migration of modern humans interbreeding with Neandertals under random drift could explain them, either (Currat and Excoffier 2004).

Extinction

It is at this point that most geneticists turn to the hypothesis of complete Neandertal extinction. They have a point. Genetic drift apparently cannot explain what we have observed. In their point of view, if genetic drift alone cannot explain the Neandertal mtDNA disappearance, then the only other random process at hand is extinction.

I think that hypothesis is false. It does not account for morphological similarities between Neandertals and later people, genetic evidence that suggests a strong ancient population structure with introgression, or the apparent behavioral continuity in the Upper Paleolithic.

Happily, I don’t have a commitment to random processes. Instead, I think that the mtDNA evolution of Europe was driven by nonrandom processes of demographic turnover and selection.

Demographic turnover

Here we come to an important point. No one believes that later Europeans evolved from earlier Neandertals by a random process of genetic drift. Yet that is precisely the hypothesis that most studies have set up to refute. Without question it is valuable to set up boundary conditions under the hypothesis of random genetic drift. But the time has come to investigate more interesting models.

Personally, I am surprised that more complicated metapopulation dynamics have not gotten more attention as an explanation for the Neandertal mtDNA results. Population sources and sinks are a hot topic in biology, and you would think that anthropologists would have picked up on this. To my knowledge, the only time anyone has examined a population sink model was in 2001, when Milford Wolpoff and I worked with mathematician Per Enflo on such an idea for Neandertals (Enflo et al. 2001). This idea deserves a fuller treatment (I think I’ll suggest it as a project for one of my classes this year!).

In a nutshell, a population sink is a region where the average rate of reproduction is below replacement levels. This region can remain populated only if individuals migrate in from other places. The places that reproduce above replacement are called population sources. The continual migration from sources to sinks creates a genetic gradient. Individuals sampled at any given time in the population sink are overwhelmingly likely to have ancestors not in the sink but in one or more source populations.
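
A toy calculation shows how quickly ancestry drains out of a sink. This is an illustrative sketch only, not the model of Enflo et al. (2001): it assumes that in every generation a fixed fraction m of the sink's lineages trace back to a parent who migrated in from a source, and it ignores migration back out of the sink. Both m and the number of generations g are assumed values.

```latex
% Probability that a lineage sampled in the sink was still in the sink g generations earlier
P(\text{ancestor in sink}) = (1 - m)^{g} \approx e^{-mg}

% Assumed values: m = 0.01 per generation, g = 1000 generations
(1 - 0.01)^{1000} \approx e^{-10} \approx 4 \times 10^{-5}
```

Under these assumptions, even 1 percent immigration per generation leaves almost no chance that a sampled lineage stayed in the sink for a thousand generations.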

Europe today is a population sink. The population of the continent does not produce enough children to replace itself, and immigration from other parts of the world is high. There are several reasons to suggest that Europe may have been a population sink in prehistory as well. In Neandertal and Upper Paleolithic times, climate fluctuations created unique challenges in Europe, where caloric expenditures were high and food was harder to obtain than in some other regions.

Continual migration into Europe would provide a simple explanation for why none of today’s mtDNA haplogroups derive from the European Neandertals. The mtDNA population of 15,000 years ago had a few ancestors 40,000 years ago, and none of these ancestors lived in the sink population—all came from the source population in Africa or West Asia. The Neandertal mtDNA variation would have been a short-lived phenomenon, continually being turned over from source populations. Some Neandertal genes would have survived in Europe for hundreds of thousands of years, but some would have come in with more recent migrants from the population source.

There are points that argue against this source-sink hypothesis. The Neandertal-human divergence time for mtDNA is not very different from that estimated for the autosomal genome. If a European population sink had made genetic drift more powerful, that should have affected mtDNA more than the autosomes, so we might expect a more recent mtDNA divergence. Still, there is no reason why the source-sink dynamic need have been constant over Neandertal evolution, and there may have been multiple sources in the Pleistocene, not only Africa and West Asia. Investigating the boundary conditions of the source-sink model and its correspondence to autosomal genetic results would be helpful.

I should note that mtDNA is not special. Neandertals had lots of traits that are now very rare. The horizontal-oval, or “bridged” mandibular foramen is a prominent example. Out of the relatively small sample of Neandertal mandibles, half have this derived form. Fewer than one percent of recent European mandibles have this form. As for mtDNA, a once-common variant is now very rare. And as for mtDNA, we deserve some explanation. A source-sink model would appear consistent with the continued evolution of such traits during the Upper Paleolithic—a time when the extinction and replacement hypothesis predicts no change in these characters.

Natural selection

The other nonrandom hypothesis is natural selection, which would presumably have favored one or more modern human types while eliminating the original Neandertal haplogroup. I won’t say much about that hypothesis here, since I discussed it in my initial post about the whole-mtDNA-genome sequencing. Selection has a leg up over the other hypotheses now because it seems like there’s good evidence it happened.

Still, selection on mtDNA alone could not explain the total pattern of observations about Neandertals. Physical traits that were once frequent in Neandertals were much less common or absent in later Europeans, and some continued to decline in frequency over time. To explain these changes, we must invoke either selection on other traits, or continued demographic turnover in the post-Neandertal population (probably more immigration into Europe), or both.

So selection on mtDNA has never been a sufficient or necessary hypothesis, even if we assume that other genes carried by Neandertals still survive. But given the current evidence that suggests something distinctive about the mtDNA of recent humans, natural selection may receive renewed attention as a factor in the disappearance of the Neandertal mtDNA haplogroup.

References


   Bocquet-Appel JP, Demars PY, Noiret L, Dobrowsky D. 2005. Estimates of Upper Palaeolithic meta-population size in Europe from archaeological data. J Archaeol Sci 32:1656–1668. doi:10.1016/j.jas.2005.05.006.

   Currat M, Excoffier L. 2004. Modern humans did not admix with Neanderthals during their range expansion into Europe. PLoS Biol 2:e421.

   Enflo P, Hawks J, Wolpoff MH. 2001. A simple reason why Neanderthal ancestry can be consistent with current DNA information. Am J Phys Anthropol 114:S62.

   Krause J, et al. 2007. Neanderthals in central Asia and Siberia. Nature 449:902–904. doi:10.1038/nature06193.

DavidB at Gene Expression continues his wonderful series on Sewall Wright with a detailed post on the population genetics of migration.

The (non-)neutral Neandertals

OK, I'm clearly going to have to cut out the beer if I'm going to do anything about stories like this one:

New research led by UC Davis anthropologist Tim Weaver adds to the evidence that chance, rather than natural selection, best explains why the skulls of modern humans and ancient Neanderthals evolved differently. The findings may alter how anthropologists think about human evolution.
Weaver's study appears in the March 17 issue of the Proceedings of the National Academy of Sciences. It builds on findings from a study he and his colleagues published last year in the Journal of Human Evolution, in which the team compared cranial measurements of 2,524 modern human skulls and 20 Neanderthal specimens. The researchers concluded that random genetic change, or genetic drift, most likely account for the cranial differences.
In their new study, Weaver and his colleagues crunched their fossil data using sophisticated mathematical models -- and calculated that Neanderthals and modern humans split about 370,000 years ago. The estimate is very close to estimates derived by other researchers who have dated the split based on clues from ancient Neanderthal and modern-day human DNA sequences.
The close correlation of the two estimates -- one based on studying bones, one based on studying genes -- demonstrates that the fossil record and analyses of DNA sequences give a consistent picture of human evolution during this time period.
"A take-home message may be that we should reconsider the idea that all morphological (physical) changes are due to natural selection, and instead consider that some of them may be due to genetic drift," Weaver said. "This may have interesting implications for our understanding of human evolution."

If you've been reading for long, you might reasonably wonder what I think about this study. My work has shown rapid natural selection in recent humans, consistent with evidence from recent skeletal samples for rapid evolutionary change. So it might seem incongruous that a study could assume that there has been no natural selection on the skeletal traits of recent human populations, and come to any kind of sensible conclusion.

I am actively working on this particular problem, with a manuscript in preparation, so I don't want to comment too extensively. However, I can say a brief word about why I disagree with the analysis.

A model of phenotypic evolution by genetic drift requires an assumption about the effective size of the population (Ne). Weaver et al. (2008) assume a model of "mutation-drift equilibrium." This is an assumption that the effective population size has not changed over time in the populations under consideration -- in this case, the Neandertal and human populations back at least as far as their common ancestor.

In their analysis, Weaver et al. (2008:4647) assume that the effective sizes of the human and Neandertal lineages, throughout the last few hundred thousand years, were equal to 2700 individuals. They wrote this:

The second reference point is the effective population size, PNe, under a mutation-drift-equilibrium model for sub-Saharan African human populations. Zhivotovsky and colleagues (ref. 17) estimated Ne from 271 microsatellites using an equation equivalent to our Eq. 7 as ≈ 2,700 individuals. Once again, we are just assuming that the morphological and microsatellite estimates should match up under the same model, not that this is the most realistic model to use to infer the actual effective population size.

This is an astounding assumption. It is important because a small effective size allows rapid evolution by genetic drift. But it is contradicted by other evidence.

For one thing, most other sets of genetic data indicate a long-term effective size of at least 10,000 for human populations -- nearly four times larger than assumed in this study. All things being equal, this means that the rate of phenotypic evolution by genetic drift should be nearly four times slower than assumed by Weaver et al. (2008). Some of this difference between real and assumed effective sizes may be washed out by their process of calibration -- their equations involve several unknowns that must be simultaneously estimated, and give a lot of wiggle-room to the results. But that points to another weakness of the analysis -- there's so much wiggle room that almost any level of phenotypic difference might look like "drift."

Moreover, the human population has vastly increased in numbers within the last 50,000 years. Weaver et al. (2008) use the phenotypic and genetic divergences of recent humans to calibrate their "clock" of phenotypic evolution. But the phenotypic divergences between recent human populations, with very large effective population sizes (Ne > 100,000) are simply not comparable to those between Middle Pleistocene humans and Neandertals -- at least, not without taking into account the vast difference in effective population sizes.

But please don't take my word for it. I am a clear partisan on the side of natural selection in recent human evolution. Weaver's quote in the press release above implies that we should accept a pluralistic model, in which genetic drift accounts for some changes. I agree entirely. But their analysis assumes that genetic drift accounts for all changes. I don't deny the role of genetic drift, but I do deny that it explains much about recent skeletal evolution in humans. Random chance cannot do much in a very large population in a few hundred generations.
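
To give a feel for that last sentence, here is a minimal sketch using the textbook Wright-Fisher result for how far a neutral allele frequency is expected to wander. It concerns a single allele frequency rather than the skeletal measurements at issue, and driftSD, the starting frequency of one half, and the choice of 300 generations are illustrative assumptions.

```mathematica
(* Standard deviation of a neutral allele frequency after t generations,
   starting from frequency p in a Wright-Fisher population of effective size ne *)
driftSD[p_, ne_, t_] := Sqrt[p (1 - p) (1 - (1 - 1/(2 ne))^t)]

(* Compare the three effective sizes mentioned above over 300 generations *)
N[driftSD[1/2, #, 300]] & /@ {2700, 10000, 100000}
(* roughly {0.12, 0.06, 0.02} *)
```

On these assumptions, a population with an effective size of 100,000 drifts only about a sixth as far in 300 generations as one with an effective size of 2700.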

I really don't understand why you would want to use a heuristic value for effective population size, when it is contradicted by genetic and archaeological evidence. I will be writing about effective population size over the next week, introducing some of the importance of the concept for these kinds of analyses. You're welcome to take a look at what I have to say, and take it or leave it.

Sewall Wright and the factors of evolution

Last year around this time, I noted that I happened to be reading Sewall Wright during a TV episode that mentioned Sewall Wright. It's not so unusual for me to be reading Wright, but in this instance I was directed to something I hadn't paid much attention to before.

I'm reminded of the article today because I talked about its basic theme during a lecture, and also because I'm writing up some stuff about effective population size, a concept attributed to Wright.

John Gillespie's 2000 article, "Genetic drift in an infinite population," introduced the concept of pseudohitchhiking, or "genetic draft." An important thing about pseudohitchhiking is that it behaves as a stochastic force very much like genetic drift. The formal difference between the two is that the stochasticity of a pseudohitchhiking locus depends on recombination and selection, while genetic drift depends on neither. Gillespie's paper considered to what extent pseudohitchhiking led to similar predictions for the change in allele frequency. This is a connection he made more explicit in his 2001 article, "Is the population size of a species relevant to its evolution?" by drawing out the first and second moments of neutral evolution under both drift and pseudohitchhiking. For drift, these are (Gillespie 2001:2161, eqs. 1 and 2):

First and second moments of neutral evolution

The first equation means that the expected change in allele frequency under drift is zero. This is otherwise known as the deterministic component. Under selection, the expected change in allele frequency depends on the current frequency and the fitnesses of genotypes. Under drift all genotypes have equal fitnesses and the only possible changes are stochastic, therefore the expected change is zero irrespective of the current allele frequency.

The second equation describes the variance of the change in allele frequency. You might think this variance would be zero, since the expected amount of change is zero. But the variance represents the magnitude of possible changes from the expected value due to random sampling of a finite number of individuals. This is the stochastic component of allele frequency evolution.

The magnitude of these stochastic changes is directly proportional to heterozygosity and inversely proportional to population size. Larger populations have smaller potential changes in allele frequency due to random sampling. Intermediate allele frequencies (near 50 percent) can change more due to random sampling than high or low frequencies. These relations are embodied by the second equation above -- and if you're keeping score, this second equation is used in defining the variance effective population size.
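
In standard Wright-Fisher notation, the two moments described above take the following form. The symbols are assumed here rather than copied from the paper: x is the current allele frequency, Δx is its change over one generation, and N is the number of diploid individuals.

```latex
E\{\Delta x\} = 0
\qquad\qquad
E\{(\Delta x)^{2}\} = \frac{x(1 - x)}{2N}
```

The numerator x(1 - x) is half the expected heterozygosity 2x(1 - x), which is why the variance peaks at x = 0.5 and shrinks as N grows.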

The two equations help to frame the discussion of effective population size. The size of a population is relevant to its evolution only under certain contexts. If the deterministic change in allele frequencies is the dominant pattern of evolution, then population size is irrelevant to the outcome. In contrast, if random sampling is the most important cause of allele frequency changes, then the outcome (fixation or loss) may be indeterminate, but the population size is very important to the rate of the process.

As Gillespie's article makes clear, genetic drift is not the only stochastic process affecting the evolution of allele frequencies. His mechanism of pseudohitchhiking is one. And there are many others -- all non-deterministic in that their outcomes cannot be predicted from the frequencies of alleles or their phenotypic effects. The rate of these processes depends on different things: some internal to the population and some external. Genetic drift depends on the size of the population and its allele frequencies; genetic draft depends on the rate of recombination, the rate of generation of new favorable mutations, and the relative fitnesses of these mutations. Environmental stochasticity depends on the demography of other species as well as physical factors such as water availability and the weather.

Sewall Wright tried to categorize these stochastic processes, as well as the deterministic ones, making a catalog of the processes that can cause evolutionary changes. Those of us who teach intro classes are well accustomed to talking about the "forces of evolution" -- selection, drift, gene flow, and mutation. These are important because they constitute different patterns of change in allele frequencies. But Sewall Wright went beyond this four-fold categorization, linking different aspects of these patterns with their stochastic and deterministic effects.

First, he defines the problem in terms of allele frequencies:

As is now generally appreciated, the seemingly very diverse factors that must be taken into account in population genetics can best be brought under a common viewpoint by considering their effects on gene frequency (Wright 1955:17).

Then he provides a full breakdown of different patterns of evolutionary change, or "modes" of change of the gene frequencies in a population:

Modes of Change of Gene Frequency
I. Immediate
1. Directed processes (mean change in allele frequencies determinate in principle)
a. Recurrent mutation
b. Recurrent immigration and crossbreeding
c. Mass selection
2. Random processes (variance in change in allele frequencies determinate in principle)
a. Fluctuations in mutation
b. Fluctuations in immigration
c. Fluctuations in selection
d. Accidents of sampling
3. Unique events
a. Novel favorable mutation
b. Unique hybridization
c. Swamping by mass immigration
d. Unique selective incident
e. Unique reduction in numbers
II. Secular change in system of coefficients
1. From internal causes (control by new adaptive peak)
2. From changes in environment
a. In home territory
b. In colonized territory

This breakdown clearly separates the deterministic factors of evolution (here, category 1, "Directed processes") from the stochastic factors (everything else). I find a couple of things very interesting from this perspective:

1. Wright makes a distinction between recurrent mutation, whose effect is more or less deterministic on allele frequencies, and "novel favorable mutation", each of which is a random, unlikely event. Both are distinguished from "fluctuations in mutation," which might be described as an intermediate between the two -- although writing in 1955 it is plausible that Wright may actually have meant alterations in the propensity toward mutations due to variation in radiational or chemical processes. This is one indication of the difference between Wright and Fisher, who felt that novel mutations might become more or less predictable in large populations.

I also noticed how many of Wright's "unique events" have been marshalled by one or another researcher to explain human evolution.

Another point of interest, reflecting the several instances of interesting evolutionary trends under domestication that I've linked this week, is Wright's accommodation of artificial selection within this scheme:

It may be noted here that artificial selection also imposes a new system of peaks toward one of which mass selection may be expected to drive the population rapidly. Since the peak attained is not a natural one, progress is almost inevitably at the expense of fecundity and viability. On relaxation, the population may be expected to return toward the original peak, or to another, and usually lower one, if the artificial selection has driven it across what was naturally a valley (Wright 1955:17).

This should be amended, in that selection comes at the expense of fecundity and viability in the previous environment, not the new artificially selected one. But the prediction that artificial selection should decrease fitness in the species' natural environment follows straightforwardly from the nature of selection as a deterministic force. If the species was initially well adapted to its natural environment, any changes resulting from artificial selection would likely make it worse, not better.

Wright's well-known idea was that the stochastic factors might play an important role in allowing a population to explore the adaptive landscape. In his "shifting balance" formulation, the division of an abundant species into many small subpopulations tends to maximize the species' ability to evolve toward higher fitness peaks, because a small group might have a fortuitous combination of alleles allowing it to move to a higher fitness peak. This model has been controversial even up to the present day, because of our lack of knowledge about the characteristics of "fitness landscapes".

But it is worth pointing out Wright's definition of the stochastic factors here, each of which might operate in conjunction with genetic drift in the shifting balance model. It is clear from the list that the balance between these different factors might itself change over time -- for instance, in our acceleration idea, the incidence of novel mutations is greatly accelerated in a growing population, ultimately increasing the scope of the deterministic process.

References:

Gillespie JH. 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909-919.

Gillespie JH. 2001. Is the population size of a species relevant to its evolution? Evolution 55:2161-2169.

Wright S. 1955. Classification of the factors of evolution. Cold Spring Harbor Symp Quant Biol 20:16-24.

The "dark matter" of modern human origins

I'm just looking through the January/February 2008 Evolutionary Anthropology, which is all about modern human origins in Africa. The special issue resulted from a conference at Stony Brook, along with a few additions to round out the topic.

I'll have some things to say about these articles, but one thing struck me. I'll describe the problem:

Dan Lieberman's paper, "Speculations about the selective basis for modern human cranial form," discusses five categories of functional requirements that might have been involved in the evolution of the "modern" human cranial anatomy. Each of these imposes distinctive requirements on the form of the head -- not all of which are fully understood -- but all of which changed in ways that parallel the basic changes in cranial form of the Late Pleistocene.

But Tim Weaver and Charles Roseman's paper, "New developments in the genetic evidence for modern human origins," claims that the modern human cranial anatomy originated by genetic drift, without any substantial selection:

Evolutionary quantitative genetic analyses, in fact, show that Neandertal and modern human cranial differences can be explained by genetic drift, making it unlikely, at least for the cranium, that modern human anatomical features were spread by natural selection rather than a range expansion out of Africa. An important point is that these analyses do not simply compare the magnitude of the morphological differences between Neandertals and modern humans; they are multivariate tests of how the patterns of covariation across different cranial measurements compare to those expected for divergence by genetic drift. Natural selective hypotheses designed to account for Neandertal and modern human cranial differences would also need to show multivariate consistency with the observed patterns of variation. While it may be possible to imagine natural selective scenarios that mimic genetic drift for a single measurement, such as fluctuating directional natural selection, the scenarios become much less plausible for multivariate patterns of variation (Weaver and Roseman 2008:78).

These two papers cannot both be correct. A full text search of Lieberman's paper does not find the words "drift" or "random," and "neutral" only appears as part of "neutral horizontal axis." Yet Weaver and Roseman cite the neutrality of cranial form as the main evidence against Eswaran's model of an adaptive dispersal of cranial form. According to them, all of Lieberman's "speculations" must be wrong.

I thought maybe I could get some insight into this dilemma by reading Günter Bräuer's paper, "The origin of modern anatomy: by speciation or intraspecific evolution." That title sounds fairly clear -- if we're talking about a speciation of modern humans to explain their anatomy, that sounds like the kind of rapid change that ought to indicate selection of some kind.

Bräuer shows some skepticism toward Lieberman's ideas about cranial evolution:

In my view, Lieberman, McBratney, and Krovitz's interpretation that anatomical modernization can be boiled down to just a few autapomorphies or genetic changes will be difficult to accommodate within the current fossil evidence (Bräuer 2008:27-28).

OK, but does this disagreement mean that Bräuer is likewise skeptical of adaptive hypotheses to explain modern cranial form? Again, a full text search fails to find the words, "drift," "neutral," or "random." But neither does it find the word "selection." Bräuer is concerned with describing the pattern of evolution of the modern human cranial form, but is entirely noncommittal on the question of why it evolved. That would seem to be problematic in itself: wouldn't we expect a different pattern of evolution if natural selection caused the changes, than if genetic drift caused them? Wouldn't the two causes make different predictions about the role of speciation in the process?

I'll have more to write about Bräuer's interesting paper, but on this issue, I think that is all I can extract from it. Osbjorn Pearson's paper, "Statistical and biological definitions of 'anatomically modern' humans," has more to say on the issue. Pearson cites the work that suggests modern human cranial form evolved under random genetic drift, saying:

Ideally, one would like to partition morphological distance into differences due to genetic drift, adaptation, and environmental interactions with ontogeny. Recently, several promising studies have shed light on these issues, including the amount of morphological diversity in recent humans that likely reflects genetic drift and the effects of the toughness of foods on the cranial morphology and occlusion of nonhuman primates, retrognathic mammals (for example, hyraxes), and humans from different parts of the world. Nevertheless, much remains to be done before these relationships become completely clear (Pearson 2008:40-41).

He later suggests (p. 44) that "rapid morphological change due to drift during population bottlenecks" may be involved in the evolution of modern cranial form. On the other hand, Pearson also suggests that "selection for new, advantageous traits or genes, or some combination of the two [selection and drift]" may have occurred. That would seem fairly noncommittal.

However, Pearson's description of the series of events -- a stepwise, sequential series of anatomical changes ultimately in a worldwide context up to and including the Holocene -- seems pretty unlikely to result from genetic drift alone. Indeed, Pearson writes,

In common with many other parts of the world, [African] crania that have dimensions or suites of morphological traits that make them statistically indistinguishable from the living populations appear only during the Holocene (Pearson 2008:45).

If the evolution of modern cranial form is a process that continued into the Holocene, it cannot have been caused by drift alone, since the effective sizes of human populations were too large, and drift could hardly have caused a "nearly universal pattern of gracilization" (ibid.). So Pearson's paper certainly heightens the contrast between the adaptive and drift scenarios. If the events are as Pearson describes them, the "genetic drift alone" hypothesis must be false.

Philip Rightmire's paper is about earlier events, and Chris Stringer and Nick Barton's paper is a conference review. That leaves only Ian Tattersall and Jeff Schwartz's paper, "The morphological distinctiveness of Homo sapiens and its recognition in the fossil record: clarifying the problem," to clarify the problem.

Tattersall and Schwartz direct their attention to the kinds of features that are suitable for identifying a species from the fossil record -- uniquely derived features, or "autapomorphies." In their view, species must be accurately diagnosed from sets of specimens ("alpha taxonomy") before any kind of evolutionary hypotheses can be tested.

Because of this, they don't talk very much about the kinds of evolutionary forces that might cause the patterns they see. The paper includes only one reference to "random" and "adaptive," both in a single sentence:

However, there are some materials of this period [the late Middle Pleistocene] that fall outside, but not far outside, the strictest definition of Homo sapiens as based on the living species. Most of these (for example, Border Cave 5, Boskop, Fish Hoek, Klasies River Mouth except for AP 6222, and maybe Cave of Hearths) form a generally poorly dated South African group in which cranial structure largely conforms to the modern Homo sapiens morphology except that, most notably, the bipartite brow and/or the inverted-T-shaped chin are lacking. Do such fossils represent distinctive and now extinct populations of Homo sapiens that lacked two or more of the most striking autapomorphies of the living species merely as a result of random (or even adaptive) population variation? Or did they belong in life to one or more distinctive reproductive entities whose histories did not impinge, at least biologically, on that of today's Homo sapiens? (Tattersall and Schwartz 2008:52, emphasis added)

The emphasized sentence, the one asking about "random (or even adaptive) population variation," is important. Tattersall and Schwartz view adaptive and random variations as equivalent: small changes between populations that may occur even without the kind of significant isolation that would invite a taxonomic interpretation. They contrast these in the next sentence with "distinctive reproductive entities whose histories did not impinge." And they are correct; modern human populations have morphological differences as a result of both selection and drift, and their histories certainly have impinged on each other.

But it makes a difference whether selection or drift was the cause of changes, because selection is more powerful than drift. Weak selection can cause a level of morphological differentiation that would require long isolation by random drift alone. If selection were involved in African regional differentiation, there may be no reason to posit "distinctive reproductive entities whose histories did not impinge" -- in fact, their histories almost certainly would have impinged.

In other words, the relation of the pattern of features to the taxonomic status of the populations depends on the evolutionary forces that generated the pattern.

As Weaver and Roseman note, their hypothesis that modern human cranial form evolved neutrally depends on the pattern of evolution of different features, not the amount of evolution of any single feature. But the amount of evolution must still be explained; under their hypothesis, it must have occurred in small populations over a substantial period of time. In their hypothesis, the cranial differentiation of African late Middle/early Late Pleistocene fossils would have emerged during relatively long periods of partial or complete isolation. Under that hypothesis, Tattersall and Schwartz would be correct to place these fossils into different taxa, only one of which was ancestral to living people -- or at least principally ancestral, allowing for some small amount of hybridization and introgression.

In contrast, Lieberman's adaptive hypotheses are consistent with the evolution of modern human cranial morphology within a broader, larger population. Patterns of selection may explain the variation among the fossils. Today's humans may have emerged from a population with substantial cranial polymorphism. That scenario would seem to be consistent with the patterns described by Pearson -- in which modern human cranial variation does not standardize until very late, perhaps even Holocene times. Only selection could cause this kind of evolution within the large populations of the last 10,000 years, or even within the large populations of the last 70,000 years.

I picked this problem first, because it was the first to stand out to me in the papers. It does seem a fairly glaring contradiction. I don't expect the authors to have noticed the contradiction in advance; I think that they approach the question of human origins from fundamentally different viewpoints.

As you can tell, two of the papers are not concerned with the causes of evolution at all -- their aim is to map the pattern of morphological variation onto putative speciation events. But it seems to me that if we approach the fossil record with the idea that speciation is the major cause of such patterns, then we have already assumed how the evolution happened. It may not have escaped your notice that this is the major reason for disagreement about modern human origins: One group of authors wants to assume the conclusion, foreclosing further discussion.

I don't have any complaints about the papers that were chosen for the issue -- in fact, I'm interested in reading the current opinions of all these authors. So far, I would say that each paper is a well-written expression of its authors' ideas, and I appreciate having all that in one place.

But it does seem a little strange that a special issue devoted to modern human origins in Africa doesn't have more, um, diversity of opinion. Several of the papers discuss multiregional evolution. They apparently believe that it is an important enough viewpoint to include their reasons for disbelieving it. One of the papers (Weaver and Roseman) includes a section about genetic introgression, kindly citing my work. Another (Bräuer) claims that it is reasonable to include all Middle Pleistocene humans in Africa and Europe as part of "one polytypic species, Homo sapiens" (Bräuer 2008:32).

So the work of those of us who write about evolutionary mechanisms seems to be making an impact. Still, it's kind of like "dark matter" -- you only know about the ideas because of their effects on what you can read! In this case, you can read a lot of people's opinions about these ideas -- you just can't read them from the people who thought of them.

What boring meetings these must be, with everybody agreeing with each other all the time, and nobody to point out all these contradictions!

References:

Bräuer G. 2008. The origin of modern anatomy: by speciation or intraspecific evolution? Evol Anthropol 17:22-37. doi:10.1002/evan.20157

Lieberman DE. 2008. Speculations about the selective basis for modern human cranial form. Evol Anthropol 17:55-68. doi:10.1002/evan.20154

Pearson OM. 2008. Statistical and biological definitions of "anatomically modern" humans: Suggestions for a unified approach to modern morphology. Evol Anthropol 17:38-48. doi:10.1002/evan.20155

Tattersall I, Schwartz JH. 2008. The morphological distinctiveness of Homo sapiens and its recognition in the fossil record: Clarifying the problem. Evol Anthropol 17:49-54. doi:10.1002/evan.20153

Weaver TD, Roseman CC. 2008. New developments in the genetic evidence for modern human origins. Evol Anthropol 17:69-80. doi:10.1002/evan.20161

Why accelerated adaptive evolution is faster evolution

RPM at Evolgen has a post raising a concern I've been seeing a lot the last week or two:

If you add up all three classes of mutations -- deleterious, neutral, and beneficial -- and figure out how many have fixed over the time scale you're looking at, you get the amount of evolutionary change along the lineage in question. So, to say that there was increased evolution along the human lineage in recent history implies that there was an increase in the total number of genetic changes. However, an increase in the amount of adaptive evolution (or an increase in the number of mutations fixed by positive selection), means there was an increase in the number of beneficial changes along the human lineage in recent history.

Here's the point in a nutshell:

1. Our recent acceleration paper suggests that the rate of adaptive human evolution has vastly increased during the past 40,000 years.

2. Some people confuse the idea of adaptive evolution with the idea of neutral evolution.

3. We can't let this happen, because, well, choose one: (a) we're good acolytes of Stephen Jay Gould; (b) people might start suggesting that all the human phylogeography based on "neutral" loci is irrelevant or worse; (c) we have a deep concern with the pattern of evolution of gene variants that don't actually do anything interesting.

I tend to notice that the various critiques of acceleration don't include any mathematics. I don't really understand this, since the math is simple. It is a whole lot easier to look at this algebra than to write a four or five-paragraph blog post!

So, let's consider some of the mathematical relations describing neutral evolution and how they apply to the recent increase in human population numbers.

1. The expected change in frequency of a neutral allele each generation is zero. That is, after all, why we call them neutral.

2. But the variance in the change in frequency of a neutral allele is related to population size -- in fact it is p(1 - p)/(2Ne), where Ne is the effective population size (actually the variance effective size).

3. Because of this relation, neutral alleles in large populations change more slowly in frequency than those in small populations. Once human populations reached an effective size on the order of 100,000 -- certainly by 40,000 years ago -- the change in allele frequency due to drift alone became extremely small (a variance on the order of 10^-6 or less per generation).

4. So neutral evolution in the past 40,000 years should have vastly slowed compared to earlier phases of human evolution.
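As a rough check on points 2 through 4, the variance formula can be evaluated directly. This is a minimal sketch: the function names and the choice of p = 0.5 (the frequency at which drift variance is largest) are illustrative assumptions, not values from the acceleration paper.

```python
import math

def drift_variance(p: float, ne: float) -> float:
    """Variance of the one-generation change in frequency of a
    neutral allele at frequency p: p(1 - p) / (2 * Ne)."""
    return p * (1.0 - p) / (2.0 * ne)

def drift_sd(p: float, ne: float) -> float:
    """Standard deviation of the same one-generation change."""
    return math.sqrt(drift_variance(p, ne))

# Illustrative effective sizes; p = 0.5 gives the largest variance.
for ne in (1_000, 10_000, 100_000):
    print(f"Ne={ne:>7}: variance={drift_variance(0.5, ne):.2e}, "
          f"sd={drift_sd(0.5, ne):.2e}")
```

Each tenfold increase in Ne cuts the variance tenfold, which is the sense in which drift slows down in a large population.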

Except...

5. Changes in population size make absolutely no difference to the neutral substitution rate. The rate of generation of new neutral mutations is directly proportional to population size (2Neu for an autosomal locus). But the probability that any one new neutral mutation eventually fixes is inversely proportional to population size (1/(2Ne)). So the neutral substitution rate, the product of the two, is simply u: the neutral mutation rate, irrespective of population size. That's part of what makes the neutral substitution rate cool -- and of course, what underlies the molecular clock assumption.

6. From this, we might conclude that the rate of neutral evolution was absolutely unchanged in the last 40,000 years. Of course, now it is obvious that the problem is what we mean by "rate" -- do we mean the substitution rate or the per-generation rate of change in allele frequency?
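The cancellation in point 5 can be written out as a worked equation. The symbol k for the substitution rate is introduced here only for illustration; N_e and u are the quantities already named in the text.

```latex
\begin{align*}
\text{new neutral mutations per generation} &= 2 N_e u \\
\text{probability that a new neutral mutation fixes} &= \frac{1}{2 N_e} \\
k &= 2 N_e u \times \frac{1}{2 N_e} = u
\end{align*}
```

Population size appears once in the numerator and once in the denominator, so it drops out of k entirely.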

Except...

7. It should be obvious that we don't mean "neutral substitution rate" because this is irrelevant to recent human evolution. The fixation time of a new neutral mutation is directly proportional to the effective size of the population (4Ne generations for an autosomal locus). It doesn't take much figuring to show that is a long, long time from now with today's population size. There is no chance that a new neutral mutation within the last 40,000 years could be near fixation today -- in fact, every neutral segregating allele 40,000 years ago ought to still be segregating today!

8. From that perspective, we might well conclude there has been no neutral evolution in the last 40,000 years -- because it is vanishingly unlikely that any neutral variation has been lost during that time.
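To put rough numbers on points 7 and 8, the sketch below compares the expected fixation time of a new neutral mutation (4Ne generations, given that it fixes at all) with the number of generations in 40,000 years. The 25-year generation time is an assumption made only for this illustration, and Ne = 100,000 is the order of magnitude mentioned in point 3.

```python
GENERATION_YEARS = 25  # assumed generation length; not a figure from the text

def neutral_fixation_generations(ne: float) -> float:
    """Expected generations to fixation for a new neutral autosomal
    mutation, conditional on its eventually fixing: 4 * Ne."""
    return 4.0 * ne

def generations_elapsed(years: float,
                        generation_years: float = GENERATION_YEARS) -> float:
    """Number of generations that fit into a span of years."""
    return years / generation_years

ne = 100_000                                   # order of magnitude from point 3
needed = neutral_fixation_generations(ne)      # generations to fixation
available = generations_elapsed(40_000)        # generations in 40,000 years
print(f"needed: {needed:,.0f} generations "
      f"(~{needed * GENERATION_YEARS:,.0f} years)")
print(f"available: {available:,.0f} generations")
print(f"fraction of the way there: {available / needed:.4f}")
```

Under these assumptions the time available is well under one percent of the time needed, so the conclusion does not depend on the exact generation length.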

Except...

9. Our study actually did find a large number of neutral areas of the genome that had recently approached fixation, and a much larger number of initially rare neutral variants that have reached substantial frequencies during the last 40,000 years. Empirically, neutral evolution has been very rapid during recent human history. This is entirely the result of ...

10. Hitchhiking. The fast rate of generation of new adaptive mutations means that the rate of neutral evolution by hitchhiking has vastly accelerated in the recent past. This is, after all, how we manage to find evidence of selection in the first place -- the hitchhiking effect on neutral markers!

Therefore, the rate of neutral evolution in humans really has accelerated, as a function of hitchhiking on new adaptive mutations. For every selected mutation, we are talking about hundreds of kilobases' worth of linked neutral variants that have been experiencing rapid changes in frequency due to hitchhiking. In the long run, this will have not a jot of effect on the neutral substitution rate, but it accounts for most of the neutral evolution of allele frequencies in human populations.
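The hitchhiking effect can be illustrated with a deliberately simplified sketch. It assumes a deterministic haploid (genic) selection model, no recombination between the selected site and the neutral variant, and no drift. The selection coefficient, starting frequency and background frequency are hypothetical values chosen for illustration, not estimates from our study.

```python
def sweep_trajectory(x0: float, s: float, generations: int) -> list[float]:
    """Deterministic frequency of a beneficial allele under haploid
    selection: x' = x(1 + s) / (1 + s*x), iterated each generation."""
    freqs = [x0]
    x = x0
    for _ in range(generations):
        x = x * (1.0 + s) / (1.0 + s * x)
        freqs.append(x)
    return freqs

def linked_neutral_frequency(x: float, background_freq: float) -> float:
    """Frequency of a neutral variant completely linked to the beneficial
    allele. Every chromosome carrying the beneficial allele carries the
    variant; among all other chromosomes the variant stays at
    background_freq (no drift, no recombination in this sketch)."""
    return x + (1.0 - x) * background_freq

# Hypothetical values: one new mutant copy among 2N = 20,000 chromosomes,
# a 5% advantage, and a neutral variant initially at 10% frequency.
traj = sweep_trajectory(x0=1 / 20_000, s=0.05, generations=400)
for gen in (0, 100, 200, 300, 400):
    x = traj[gen]
    print(f"generation {gen:>3}: selected={x:.4f}, "
          f"linked neutral={linked_neutral_frequency(x, 0.10):.4f}")
```

In this toy model the neutral variant goes from rare to nearly fixed within a few hundred generations, far faster than drift alone would move it in a population of this size. Recombination would weaken the effect with distance from the selected site.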

I expect that there will be people who don't like this idea. I expect many of them have been counting on various neutral markers being informative about population movements. I'm not saying that neutral markers aren't informative, but we really need to consider the effects of selection on these distributions of markers.

Another class of people who don't like this idea are those who propagate one of my pet peeves -- the idea that we need to "invoke" selection as some kind of extraordinary event. The use of this term is very clear: Its only purpose is to vilify folks who want to explain evolution in terms of Darwin's mechanism. It's precisely the same way that we vilify creationists -- they want to "invoke" supernatural forces to explain evolutionary changes.

It's time to get the message -- natural selection has been the major force driving recent human evolution. Humans are no exception to the natural order -- any species that has increased in numbers and changed in ecology to the extent of ours should undergo a rapid pulse of selection resulting in the appearance and proliferation of many more new adaptive mutations. In fact, it looks like domesticated species like maize have undergone a similar effect. There's no "invoking" here, and neutrality is not a hypothesis that can explain these observations.

The foregoing should make one thing very clear -- I have nothing against neutral evolution. I am not an "adaptationist", and have no stakes whatsoever in the "adaptationist-neutralist controversy". This is not a matter of preferences or verbal arguments -- it is simple algebra!

What's more, it's pretty obvious that this account of recent neutral evolution is an evolutionary scenario of which Stephen Jay Gould would have been proud: the most widespread source of change in human genes is chance linkage to a relatively small number of selected sites.

It's just that there are quite a few more of these selected sites than anybody probably expected to find.

Most phenotypic evolution is neutral, IV

Skeleton of an Irish elk (Megaloceros giganteus) at the Carnegie Museum of Natural History. Photo by Via Bulatao, available on Flickr. Creative Commons license