acceleration

Bruce Bower has a really nice feature article in Science News about my work on hearing and recent selection:

It all points to the evolutionary sensitivity of at least one part of the human language system in the post–Stone Age world, Hawks reported in April in Columbus at the annual meeting of the American Association of Physical Anthropologists. Language depends not just on a vocal tract capable of making certain speech sounds but on ears designed to hear particular sound frequencies, as well as on a variety of other brain and body features. Relatively recently in evolutionary history, genetic revisions within populations have upgraded ear structures needed for discerning what other people say, he proposes.

“It takes a long time for a biologically complex system like language to evolve,” Hawks says. “We’re still genetically adapting to language.”

This is a really nice article, and I wasn't expecting it to come out, so please go read it!

I'm featured in an article in U.S. News and World Report, by Nancy Shute. It was a great interview, and she's put together our work on recent acceleration with some questions about where human evolution is headed.

She also cites work by Simon Baron-Cohen, Gregory Wray and Nick Bostrom. It's a nice group to talk about recent and ongoing changes in human biology.

I have to say one thing about being interviewed for the story that had me rolling at the time. I was called by a fact-checker to verify my quotes -- he seemed like a really knowledgeable, broadly-read person. He was very careful to check everything thoroughly, and asked several probing questions to make sure.

That is, until he came across the idea that "more people means more mutations." "Well," he said, "that just makes sense, doesn't it?"

I laughed and laughed! I said, "Yes, you say that now, but that's exactly what we had to show!"

"Oh," he said. "You had to show that? It seems pretty simple to me."

Tuberculosis is interesting, but a lot older than 10,000 years

Hebrew University has issued a press release about ongoing research on human and animal bones from the Jericho excavations. They're looking for signs of tuberculosis:

While the origins of tuberculosis and its evolution remain unclear, it is thought it came from the first villages and small towns in the Fertile Crescent region about 9-10,000 years ago. Jericho is one of the earliest towns on earth, dating back to 9,000 B.C., and so a lot of communicable - or town - diseases would have had a good start in this community.

By examining human and animal bones from this site, the researchers will be able to see how the first people living in a crowded situation developed the diseases of crowds and how this affected the disease through changes in DNA -- of both the microbes and the people.

The most significant results of this research will come from a comparison between those data for humans and corresponding animal remains which may allow the identification of animal-human vectors and their interaction.

That's all very interesting, and looking for newly-virulent versions of tuberculosis in Neolithic bones is not a bad idea. But somebody ought to tell them that the zoonosis hypothesis (that tuberculosis was recently derived from domesticated animals like cattle) looks a lot less likely, now that ancient strains of the pathogen up to 3 million years old have been found in living people, and signs of the disease have been found in a Middle Pleistocene human.

Anyway, that doesn't refute the idea that major changes in the pathogen population may have happened with human population growth, as new large reservoirs of people emerged. And it's quite possible that the germ went from humans to some of its animal hosts at that time, so studying the animal bones may give some information about the event. But they'll want to start with the idea of diversity within humans, not the other way around.

Why accelerated adaptive evolution is faster evolution

RPM at Evolgen has a post raising a concern I've been seeing a lot the last week or two:

If you add up all three classes of mutations -- deleterious, neutral, and beneficial -- and figure out how many have fixed over the time scale you're looking at, you get the amount of evolutionary change along the lineage in question. So, to say that there was increased evolution along the human lineage in recent history implies that there was an increase in the total number of genetic changes. However, an increase in the amount of adaptive evolution (or an increase in the number of mutations fixed by positive selection), means there was an increase in the number of beneficial changes along the human lineage in recent history.

Here's the point in a nutshell:

1. Our recent acceleration paper suggests that the rate of adaptive human evolution has vastly increased during the past 40,000 years.

2. Some people confuse the idea of adaptive evolution with the idea of neutral evolution.

3. We can't let this happen, because, well, choose one: (a) we're good acolytes of Stephen Jay Gould; (b) people might start suggesting that all the human phylogeography based on "neutral" loci is irrelevant or worse; (c) we have a deep concern with the pattern of evolution of gene variants that don't actually do anything interesting.

I tend to notice that the various critiques of acceleration don't include any mathematics. I don't really understand this, since the math is simple. It is a whole lot easier to look at this algebra than to write a four or five-paragraph blog post!

So, let's consider some of the mathematical relations describing neutral evolution and how they apply to the recent increase in human population numbers.

1. The expected change in frequency of a neutral allele each generation is zero. That is, after all, why we call them neutral.

2. But the variance in the change in frequency of a neutral allele is related to population size -- in fact it is p(1 - p)/2Ne, where Ne is the effective population size (actually the variance effective size).

3. Because of this relation, neutral alleles in large populations change more slowly in frequency than those in small populations. Once human populations reached an effective size on the order of 100,000 -- certainly by 40,000 years ago -- the change in allele frequency due to drift alone became extremely small (on the order of 10^-6 or less per generation).

4. So neutral evolution in the past 40,000 years should have vastly slowed compared to earlier phases of human evolution.

Except...

5. Changes in population size make absolutely no difference to the neutral substitution rate. The rate of generation of new neutral mutations is directly proportional to population size (2Neu for an autosomal locus). But the rate of fixation is inversely proportional to population size (1/2Ne). So the neutral substitution rate is simply u: the neutral mutation rate, irrespective of population size. That's part of what makes the neutral substitution rate cool -- and of course, what underlies the molecular clock assumption.

6. From this, we might conclude that the rate of neutral evolution was absolutely unchanged in the last 40,000 years. Of course, now it is obvious that the problem is what we mean by "rate" -- do we mean the substitution rate or the per-generation rate of change in allele frequency?

Except...

7. It should be obvious that we don't mean "neutral substitution rate" because this is irrelevant to recent human evolution. The fixation time of a new neutral mutation is directly proportional to the effective size of the population (4Ne generations for an autosomal locus). It doesn't take much figuring to show that is a long, long time from now with today's population size. There is no chance that a new neutral mutation within the last 40,000 years could be near fixation today -- in fact, every neutral segregating allele 40,000 years ago ought to still be segregating today!

8. From that perspective, we might well conclude there has been no neutral evolution in the last 40,000 years -- because it is vanishingly unlikely that any neutral variation has been lost during that time.

Except...

9. Our study actually did find a large number of neutral areas of the genome that had recently approached fixation, and a much larger number of initially rare neutral variants that have reached substantial frequencies during the last 40,000 years. Empirically, neutral evolution has been very rapid during recent human history. This is entirely the result of ...

10. Hitchhiking. The fast rate of generation of new adaptive mutations means that the rate of neutral evolution by hitchhiking has vastly accelerated in the recent past. This is, after all, how we manage to find evidence of selection in the first place -- the hitchhiking effect on neutral markers!

Therefore, the rate of neutral evolution in humans really has accelerated, as a function of hitchhiking on new adaptive mutations. For every selected mutation, we are talking about hundreds of kilobases' worth of linked neutral variants that have been experiencing rapid changes in frequency due to hitchhiking. In the long run, this will have not a jot of effect on the neutral substitution rate, but it accounts for most of the neutral evolution of allele frequencies in human populations.
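The algebra in the numbered points above is short enough to check by machine. Here is a minimal Python sketch of the three relations: drift variance p(1 - p)/2Ne, the neutral substitution rate 2Ne·u × 1/2Ne = u, and the 4Ne-generation fixation time. The mutation rate `U` and the 25-year generation length are placeholder assumptions for illustration, not values from the paper.

```python
def drift_variance(p, ne):
    """Variance of the one-generation change in frequency of a neutral allele."""
    return p * (1.0 - p) / (2.0 * ne)


def neutral_substitution_rate(ne, u):
    """New neutral mutations per generation (2*Ne*u) times their fixation chance (1/(2*Ne))."""
    return (2.0 * ne * u) * (1.0 / (2.0 * ne))


def mean_fixation_time_years(ne, generation_years):
    """Expected time to fixation of a new neutral autosomal mutation: 4*Ne generations."""
    return 4.0 * ne * generation_years


U = 1e-8               # assumed per-site neutral mutation rate (illustrative only)
GENERATION_YEARS = 25  # assumed generation length (illustrative only)

for ne in (10_000, 100_000, 1_000_000):
    print(
        f"Ne={ne:>9,}: drift variance at p=0.5 = {drift_variance(0.5, ne):.2e}, "
        f"substitution rate = {neutral_substitution_rate(ne, U):.1e}, "
        f"fixation time = {mean_fixation_time_years(ne, GENERATION_YEARS):,.0f} years"
    )
```

The drift variance and the fixation time both move with Ne, while the substitution rate column stays at `U` in every row. That is the whole point of items 3, 5 and 7: the answer depends on which "rate" you mean.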

I expect that there will be people who don't like this idea. I expect many of them have been counting on various neutral markers being informative about population movements. I'm not saying that neutral markers aren't informative, but we really need to consider the effects of selection on these distributions of markers.

Another class of people who don't like this idea are those who propagate one of my pet peeves -- the idea that we need to "invoke" selection as some kind of extraordinary event. The use of this term is very clear: Its only purpose is to vilify folks who want to explain evolution in terms of Darwin's mechanism. It's precisely the same way that we vilify creationists -- they want to "invoke" supernatural forces to explain evolutionary changes.

It's time to get the message -- natural selection has been the major force driving recent human evolution. Humans are no exception to the natural order -- any species that has increased in numbers and changed in ecology to the extent of ours should undergo a rapid pulse of selection resulting in the appearance and proliferation of many more new adaptive mutations. In fact, it looks like domesticated species like maize have undergone a similar effect. There's no "invoking" here, and neutrality is not a hypothesis that can explain these observations.

The foregoing should make one thing very clear -- I have nothing against neutral evolution. I am not an "adaptationist", and have no stakes whatsoever in the "adaptationist-neutralist controversy". This is not a matter of preferences or verbal arguments -- it is simple algebra!

What's more, it's pretty obvious that this account of recent neutral evolution is an evolutionary scenario of which Stephen Jay Gould would have been proud: the most widespread source of change in human genes is chance linkage to a relatively small number of selected sites.

It's just that there are quite a few more of these selected sites than anybody probably expected to find.

Tracking back to acceleranistas

I've had a very busy couple of days, and haven't been maintaining my reading-and-linking as much as I had hoped. So I wanted to take a few minutes to do a quick tour of the blogosphere to see what people are saying about the idea of acceleration.

I'm linking to posts I have read, and in some cases commented on. They are a mix of explanation of the concepts, applauding the ideas and analysis, and criticism of the methods. What I most want to point out is that the discussion on blogs is at a very high level -- people are reading the paper with much more precision than I have ever experienced in the peer review process. This is really the best that today's science community has to offer.

One of the best posts is over at LiveJournal, where shoshin works through the theoretical part of the paper. Naturally, this is my favorite part -- and shoshin describes things exceptionally well. The beginning is great:

The case for a recent acceleration of human evolution in the last 40K years (and especially the last 10K) follows pretty straightforwardly from evolutionary first principles combined with elementary facts about human history since the late Pleistocene. So straightforwardly, in fact, that you have to wonder why nobody thought of it sooner. It's one of those rare cases where the theoretical argument is so strong that you can pretty much use accordance with it as a test of experimental methods at least as much as the other way around.

Razib works through the paper at Gene Expression, in a long, detailed post. I like this part:

We are now the most numerous large mammal on the face of this planet. Using the data above the authors imply that our species has been subject to somewhat more that 1/2 a substitution per year. Remember, a substitution is a replacement of one allele for another at a locus on a population wide scale. If this is correct that means right now every few years alleles driven by selection are being fixed within our species.

At the old-school Gene Expression, p-ter posts some analysis and critiques. A great comments section has arisen on this post, including comments from some of the principals, and general comments about the quality of the discussion on blogs compared to the journal process. I've answered some of the points in my rarely asked questions post, but the most powerful part bears repeating:

Every distribution has a tail, so if they were to move their threshold a bit further to the right, surely they'd be able to narrow down the number of regions to something consistent with a constant rate. That is, the entire argument is predicated on perfectly identifying selection in the regions of the parameter space they search. This is a major assumption, and not one I'm willing to make without strong evidence. They provide none.

Actually, with an acceleration of around two orders of magnitude, we can tolerate a lot of slop in the estimates. We don't need to perfectly identify selection -- in fact, we'd still have strong support for rapid acceleration if we threw away 95 percent of our data! Naturally, we don't have to do that -- our methods are based on a threshold that eliminates nearly all false positives, and we are missing the vast majority of events. For one thing, the LDD test doesn't find selection on multiple alleles at the same locus. I am working on new methods that will find some of these kinds of events, but for the time being we continue to interpret all things conservatively.


Andrew Sullivan posts approvingly:

I posted on this potentially world-changing research this afternoon. Here's a helpful, chatty, specialist blog with lots of extra links if you're scientifically literate and curious.

What I want to know is, sure, Razib is helpful and chatty, but what am I, chopped liver?


Larry Moran has added several posts on the research, starting with this one:

In addition to the major flaw in logic, there are many other things wrong with the claim that modern humans have stopped evolving. The claim carries with it a very loaded assumption that is never explicitly stated. The assumption is that humans have pretty much reached their optimal level of fitness for all other characteristics. For example, we are no longer selecting for higher intelligence, or a better immune system, or more efficient energy production, or stronger muscles, or any of a host of other things that might make us better adapted to all environments.
Why is this assumption necessary? Because nobody could possibly suggest that we have stopped evolving without assuming that we have reached optimal fitness for all those things in our present environment.

Larry follows with several other posts, some critical, focused in part on the problem of how much evolution is explained by positive selection as opposed to other forces.


Nature's blog, "The Great Beyond" notes the paper and the resulting discussion.


More will follow...

Why human evolution accelerated

n. b. This is a story about my work on recent human evolution, describing some of the main results and how the work came about. The story refers to my paper (with Gregory Cochran, Eric Wang, Henry Harpending, and Robert Moyzis), "Recent acceleration of human adaptive evolution," which came out in December, 2007.

Like most good stories in biology, this one begins with Darwin. Darwin was always very interested in animal breeding, which he considered the best analogy for the process of natural selection. Of course, if you're breeding livestock and want to select for some characteristics, it is important to select from as large a herd as possible, because large populations have more variation in them. Darwin recognized this as an important condition for natural selection, which relies on sufficient variation in natural populations.

[A]s variations manifestly useful or pleasing to man appear only occasionally, the chance of their appearance will be much increased by a large number of individuals being kept.... Hence, number is of the highest importance for success.

These words from the Origin, "number is of the highest importance for success" were influential.

This is a quick review of the research, based on a presentation I gave earlier this year. It is not complete, and glosses a number of very important details. A close reader looking for how to do genomics would be better served reading the actual research paper. Here, I'm trying to express the science for everyone else.

By 1930, R. A. Fisher had picked up Darwin's idea about numbers, predicting that evolution in large populations could be faster than in small populations. However, this does not hold in all circumstances, but only where the number of new adaptive mutations is quite small -- in other words, where evolution is "mutation-limited":

The great contrast between abundant and rare species lies in the number of individuals available in each generation as possible mutants.... The importance of the contrast lies with the extremely rare mutations, in which the number of new mutations occurring must increase proportionately to the number of individuals available.

A long history of research in plant genetics (corn breeding), microbial chemostat experiments, and the examination of pesticide resistance in insects supports Fisher's concept. For example, flies subjected to low doses of pesticide in the laboratory tend to acquire very complicated patterns of resistance -- involving slight changes in many different genes. These usually aren't transmitted perfectly and often have fitness costs; it's a very imperfect adaptation. But if pesticide is sprayed over a large area, flies sometimes appear very quickly with a single mutation that confers very complete resistance. Here, the very advantageous resistance mutation is incredibly rare -- it only occurs in maybe one in a billion flies. It would never occur in the small laboratory population.
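To see why a one-in-a-billion mutation is effectively absent from a lab cage but nearly certain in a sprayed field, here is a small Python sketch. It computes the chance that at least one individual carries the mutation, 1 - (1 - rate)^N. The two population sizes are hypothetical round numbers chosen for illustration; only the one-in-a-billion rate comes from the text.

```python
import math


def prob_at_least_one_mutant(per_individual_rate, population_size):
    """Chance that at least one of N individuals carries the mutation.

    Equals 1 - (1 - rate)**N, computed with log1p/expm1 so that tiny
    rates do not get lost to floating-point rounding.
    """
    return -math.expm1(population_size * math.log1p(-per_individual_rate))


RATE = 1e-9                   # "one in a billion flies", from the text
LAB_FLIES = 1_000             # hypothetical laboratory population
FIELD_FLIES = 10_000_000_000  # hypothetical population over a sprayed region

print(f"lab:   {prob_at_least_one_mutant(RATE, LAB_FLIES):.2e}")
print(f"field: {prob_at_least_one_mutant(RATE, FIELD_FLIES):.5f}")
```

Under these assumed sizes, the lab population has about a one-in-a-million chance of containing the mutant in any given generation, while the field population almost always does.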

Our growing population

Human populations have been growing rapidly during the last 50,000 years or so. That increase began around the time of the Upper Paleolithic -- that's documented by archaeological evidence. There was a later massive increase during the Neolithic. This agricultural transition actually was quite heterogeneous: earlier in West Asia and China, later in Europe, and then later still in subsaharan Africa. Last, we have within the last few hundred years seen a massive increase in numbers associated with industrialization and globalization of technology.

One day a couple of years ago, Greg Cochran and I were talking about brain evolution. You have to understand, this is long before we knew about any of these genome scans -- they hadn't come out yet. One of the main mysteries of human brain evolution is why it happened apparently gradually for such a long period of time. It is one of the best cases of evolutionary gradualism. But this is a problem, because directional selection would have to be too weak to take such a long time. Now, we know that brain size is constrained in two directions -- larger brains cost more energy to maintain, but smaller brains come with some functional disadvantages. So this creates a situation where new variants that satisfy both constraints -- costing little energy, or making great improvements in brain function -- must be very rare. It should be mutation-limited.

I remember very well, that at precisely the same moment, we both realized -- "Hey, maybe this great increase in human population size made a difference!" Because as we'll see later, the pattern of change in brain size really changed when populations started to get really big.

You see, this is one of those very rare cases where the theory preceded the data! It is quite simple; the rate of mutations in a population is a linear product of the rate per genome and the population size.

Not all mutations are advantageous, and not all advantageous mutations will be fixed. The vast majority are lost. If a mutation has a selective advantage, then the chance that it will proceed toward fixation (and attain high frequency) is 2s -- "s" here is the fitness advantage. That means that 90 percent of new mutations with a 5 percent fitness advantage are simply lost.
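The 2s rule is easy to check numerically. The Python sketch below computes the 2s approximation and, as a cross-check, the survival probability of a single new mutant in a branching process. The branching-process version assumes Poisson-distributed offspring with mean 1 + s, which is the classical setting for this result and an assumption added here, not something stated in the post.

```python
import math


def haldane_fixation_probability(s):
    """The 2s approximation for a new beneficial mutation, capped at 1."""
    return min(1.0, 2.0 * s)


def branching_survival_probability(s, tol=1e-12):
    """Survival chance of one mutant lineage with Poisson(1 + s) offspring.

    Solves 1 - p = exp(-(1 + s) * p) for the positive root by bisection.
    Intended for modest s, well above 1e-6 (the lower bracket used here).
    """
    def f(p):
        return 1.0 - p - math.exp(-(1.0 + s) * p)

    lo, hi = 1e-6, 1.0  # f(lo) > 0 and f(hi) < 0 for modest s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s = 0.05
print(f"2s approximation:  lost {1 - haldane_fixation_probability(s):.1%}")
print(f"branching process: lost {1 - branching_survival_probability(s):.1%}")
```

Both versions put the loss rate for a 5 percent advantage at roughly nine in ten, with the branching-process figure slightly higher.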

The most beneficial mutations are very rare; it is much more likely that a new mutation will be weakly selected. This is another aspect of selection that has been well-known since Fisher. So the chance of fixation increases with s, but the likelihood of the mutation decreases with s -- in fact, the number decreases exponentially as selection is stronger and stronger.

If you put all these together, you can predict how many selected changes you should see in a population that has been growing in size. This tells us the number of new adaptive mutations that should come into the population each generation. It is still linear with population size -- a larger population should have more mutations in precise proportion to its size.
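Putting the pieces together can be sketched in a few lines of Python. The sketch assumes an exponential distribution of selection coefficients with mean `mean_s` (one way to read "decreases exponentially") and a hypothetical per-genome rate of advantageous mutations `u_adv`. Neither number comes from the paper. It integrates the 2s fixation chance over that distribution and multiplies by the 2N gene copies in the population.

```python
import math


def adaptive_fixations_per_generation(n, u_adv, mean_s, steps=100_000, s_max=0.5):
    """Expected new adaptive mutations per generation that escape early loss.

    Integrates 2s * density(s) over 0..s_max with the midpoint rule, where
    density(s) is exponential with mean mean_s, then scales by 2N * u_adv.
    """
    ds = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        density = math.exp(-s / mean_s) / mean_s
        total += 2.0 * s * density * ds
    return 2.0 * n * u_adv * total


U_ADV = 1e-5   # hypothetical rate of advantageous mutations per genome copy
MEAN_S = 0.01  # hypothetical mean selective advantage

for n in (10_000, 100_000, 1_000_000):
    rate = adaptive_fixations_per_generation(n, U_ADV, MEAN_S)
    print(f"N={n:>9,}: {rate:.4f} per generation")
```

With these assumptions the integral comes out to 2 × `mean_s`, so the total is 2N × `u_adv` × 2 × `mean_s`. Every tenfold increase in N gives a tenfold increase in the output, which is the linearity described above.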

Still, a very small fraction of the mutations in any given population will be advantageous. And the longer a population has existed, the more likely it will be close to its adaptive optimum -- the point at which positively selected mutations don't happen because there is no possible improvement. This is the most likely explanation for why very large species in nature don't always evolve rapidly.

Instead, it is when a new environment is imposed that natural populations respond. And when the environment changes, larger populations have an intrinsic advantage, as Fisher showed, because they have a faster potential response by new mutations.

From that standpoint, the ecological changes documented in human history and the archaeological record create an exceptional situation. Humans faced new selective pressures during the last 40,000 years, related to disease, agricultural diets, sedentism, city life, greater lifespan, and many other ecological changes. This created a need for selection.

Larger population sizes allowed the rapid response to selection -- more new adaptive mutations. Together, the two patterns of historical change have placed humans far from an equilibrium. In that case, we expect that the pace of genetic change due to positive selection should recently have been radically higher than at other times in human evolution.

Finding selection in the genome

Now, it comes to a problem of how we can see recent mutations that have been selected. A genome scan is based on things that vary, not things that are fixed. So we are looking at some window of frequencies. In our study, that was a window from around 22 to 78 percent.

Before we go too far, it is important to point out that an adaptive gene will be in a window where we can detect it for only a short time -- it spends a long time getting up to an appreciable frequency (here 22 percent, which is our lower ascertainment bound) and a long time going from a high frequency (here 78 percent) to fixation -- this is for a dominant. But it spends only a very short time in the window where we can see it.

And strongly selected genes go through this window quite a lot faster than weakly selected ones.
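A rough feel for these timescales comes from the simplest textbook model of a sweep, where the log-odds of the allele frequency rise by s each generation. The Python sketch below uses that logistic (genic selection) model, which is a simplification: a dominant allele, as described above, would take much longer in the final approach to fixation than this model suggests. The starting frequency assumes a hypothetical population of 100,000 diploid individuals.

```python
import math


def logit(p):
    """Log-odds of an allele frequency."""
    return math.log(p / (1.0 - p))


def generations_between(p_start, p_end, s):
    """Generations to go from p_start to p_end when logit(p) grows by s per generation."""
    return (logit(p_end) - logit(p_start)) / s


N = 100_000             # hypothetical population size
P0 = 1.0 / (2 * N)      # a single new copy
LOW, HIGH = 0.22, 0.78  # detection window from the text

for s in (0.01, 0.05):
    rise = generations_between(P0, LOW, s)
    window = generations_between(LOW, HIGH, s)
    print(f"s={s}: {rise:.0f} generations to reach the window, {window:.0f} inside it")
```

In this model, time spent in the window scales as 1/s, so an allele with a 5 percent advantage crosses it five times faster than one with a 1 percent advantage. The climb up to 22 percent takes several times longer than the crossing itself.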

The importance of this is that we will see genes with different strengths of selection at different ages. Our constraint is that right now all the things we can see are variable -- but some are variable because they originated a short time ago and were very strongly selected, and others are variable because they originated a long time ago, but were very weakly selected.

You can guess, that we expect to see more of the weak ones than the strong ones, because there should be more of them! So the window should give us a view of the strength of selection as well as the number of mutations. If we can estimate the ages of our mutations, then we can predict how many there should be at different strengths of selection, and try to quantify the effect of population size.

Here, we've drawn a graph showing the number of genes in the window, compared with the number that are still variable in the population -- they are on their way to fixation -- but they are outside the window. This is for a growing population, so you see that the number of these genes increases as you get closer to the present.

Tip of the iceberg

There are many more that we can't see than the ones we can see -- this is like the tip of the iceberg. That is one aspect of recent selection; these genes are in this intermediate frequency range for a short time, and there will be many more genes that are too rare for us to see with our current methods, but might be very important regionally or locally in some populations.

Based on a model of population growth, we expect to see a big peak corresponding to the period when humans were growing rapidly during the Neolithic. The distribution should plunge down toward the present, because selection would have to be so strong on such a recent mutation for us to see it -- we're talking about 20 percent or more. Those just almost never happen. The true number, remember, is the iceberg under the water -- but we must make predictions about the part we can see.

Linkage disequilibrium and selection

Now, I need to say a few words about how we find these genes when we scan the genome. The International HapMap consists of a list of over 3 million genetic polymorphisms -- SNPs -- taken from a sample of people with ancestry in Northern Europe, West Africa, and East Asia. When we look at a sample of a long stretch of DNA from several people, we will be considering the frequency of many different polymorphisms.

But more important, we have studied whether each polymorphism is linked to the others. As a new positively selected allele increases in frequency in a population, it is initially linked to a wide region including many nearby polymorphisms. This induces a long-distance association among SNPs, which is called linkage disequilibrium.

When we are looking at a stretch of chromosome, what we can observe is that there are areas where recombination seems to be very rare around one SNP -- and in particular where one of the two SNP alleles has almost no recombinant chromosomes, but the other allele appears to have been recombining normally. That kind of mismatch is a strong indication of selection.

I'm not going into the details of that process right now; I'll be posting some real examples of such LD decay analyses later in the week. After applying the analysis, we found more than 3000 selected variants in the Yoruba sample, more than 2800 in Europeans, and more than 2300 in Asians.

These numbers are very large -- they make it look like this aspect of evolution, positive selection on new adaptive alleles, has been going very fast. But how long a time period are we looking at? Based on the local rate of crossing-over, we can say how quickly LD ought to be broken by new recombinations, and that allows us to derive age estimates. The ages represent the time that has elapsed since the initial mutation that established each adaptive allele.
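The logic of dating by recombination can be illustrated with a generic decay calculation. This is a textbook approximation and not the LDD procedure used in the paper. It assumes that a chromosome carrying the selected allele escapes recombination with a marker at recombination fraction `c` with probability (1 - c) per generation, so the share of carriers still on the original haplotype after t generations is (1 - c)^t. All numbers in the example are hypothetical.

```python
import math


def prob_unrecombined(c, generations):
    """Share of carrier chromosomes with no crossover to the marker after t generations."""
    return (1.0 - c) ** generations


def age_in_generations(fraction_unrecombined, c):
    """Invert (1 - c)**t = fraction to estimate the age t in generations."""
    return math.log(fraction_unrecombined) / math.log1p(-c)


C = 0.001              # hypothetical recombination fraction to the marker
GENERATION_YEARS = 25  # assumed generation length

observed = prob_unrecombined(C, 400)  # what a 400-generation-old allele would show
age = age_in_generations(observed, C)
print(f"unrecombined share {observed:.3f} -> about {age:.0f} generations "
      f"(~{age * GENERATION_YEARS:,.0f} years)")
```

The round trip shows the idea: the faster the local crossover rate, or the more the surrounding haplotype has been broken up, the older the inferred allele.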

Here is a comparison between the ages of selected variants in the African HapMap and in the European HapMap. Let's look at this graph a little bit.

Selected variants

Each of these dots represents a number of different genes -- the y-axis is number; this is a histogram. The x-axis is the age. So you see, there are many of these selected genes that started around 10,000 years ago; there are many fewer that started around 40,000 years ago, and even fewer starting 80,000 years ago.

These fitted lines are what you get if you fit a one-parameter model with very strong selection to these curves. You can fit these without considering the effects of population growth.

But you notice some differences here between the African and European distributions. Africa has a few more total variants, but it especially has more older variants, before 10,000 years ago. You can see that during that time period, Europe has very few. And Europe has this later peak, where we see an earlier peak in Africa.

These details are a very good match to demographic growth -- Africa had much larger population size during the Late Pleistocene than Europe, but West Asia, and then Europe had earlier Neolithic expansion than Africa -- so we see these early times have a lot more selected variants within Africa, and later on there is a pulse of adaptive variants in Europe.

Testing acceleration

At this point, we have a theory that predicts acceleration of new adaptive variants, and we have data that appear to show a very fast recent rate. But we haven't yet directly tested the hypothesis of acceleration.

We chose a null hypothesis approach. After all, the rate of change looks like it has been very high recently, but what if it were always very high? A constant rate of change is a null hypothesis -- the hypothesis of no change, or in our case, no acceleration. So we worked out the predictions of this hypothesis: a constant, high rate of selection. If we could show that those predictions aren't true, then we could disprove the null hypothesis and show that adaptive human evolution accelerated.

We took several different approaches, testing predictions on different kinds of data. For one thing, if the null hypothesis were true, then there should be a whole lot more selected mutations that have already reached or approached fixation, than the relatively small number that we see still varying in human populations. So to test the null hypothesis, we should look for evidence of these fixed selected substitutions.

That's exactly what we did -- we looked at other means of assessing the number of recently fixed and near-fixed variants.

Fixed variants

On the bottom of this graph, we have the European age distribution of variants in our window. This should represent a small fraction of the total number that have happened across this time period. But you can see from this graph, that if the rate was constant, the total number should be very, very large -- since we are looking at 10-generation bins, here we have around 150 predicted substitutions every 10 generations, or around 1/2 per year. Most of these should be way above our window, in fact, as we go back toward 40,000 years ago, almost all should be close to or at fixation.

This large number of completed sweeps should have vastly reduced human genetic variation, because polymorphisms tend to hitchhike along with nearby selected alleles. Hitchhiking up to fixation tends to eliminate variation. When we look at the effect of hitchhiking under this constant selection hypothesis, the genome-wide average diversity should be less than a tenth of what we actually observe. So that also disproves the null hypothesis.

How much acceleration?

Down at the bottom of the graph, you see the predicted number of selected variants over our window, under the hypothesis of population growth -- exactly the demographic growth that really happened to humans. And here you see, that there are many, many fewer of these predicted, and in fact over the long course of human evolution, the rate would have been very low.

We can put a number on just how low, and when we do that, we can see how much human evolution has sped up. For example, if we have 1/2 of a substitution per year, well, there are around 12,000,000 years separating humans and chimpanzees (6 million since the common ancestor, in both these lineages). So if adaptive substitutions had happened at a constant rate as high as the last few thousand years, we should be looking at around 6 million fixed adaptive substitutions between humans and chimpanzees.

But in reality there have been nowhere near that number. There are only 40,000 total amino acid substitutions between humans and chimps. Not all those were selected -- maybe only a third. We can add in some additional selected sites outside of coding regions, but still we are looking at an increase in the rate of new adaptive mutations in humans that is 100 times faster than could possibly have been true during most of human evolution.
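The arithmetic in this comparison is easy to check. Here is a minimal sketch in Python that uses only the round figures quoted above; the variable names are mine, and the last line simply asks how many adaptive substitutions a 100-fold ratio would correspond to. It is a back-of-the-envelope illustration, not the calculation from the paper.

```python
# Back-of-the-envelope check of the constant-rate prediction.
# All inputs are the round figures quoted in the text.

rate_per_year = 0.5                # adaptive substitutions per year under the null
years_of_divergence = 12_000_000   # 6 million years on each of the two lineages

predicted = rate_per_year * years_of_divergence

amino_acid_subs = 40_000           # total human-chimp amino acid substitutions
fraction_selected = 1 / 3          # "maybe only a third" were selected
observed_coding = amino_acid_subs * fraction_selected

# Ratio before adding any selected sites outside coding regions
ratio_coding_only = predicted / observed_coding

# Number of adaptive substitutions that a 100-fold ratio would imply
implied_total_at_100x = predicted / 100

print(f"predicted under a constant rate: {predicted:,.0f}")
print(f"selected coding substitutions:   {observed_coding:,.0f}")
print(f"ratio, coding sites only:        {ratio_coding_only:,.0f}x")
print(f"total implied by a 100x ratio:   {implied_total_at_100x:,.0f}")
```

Counting coding sites alone gives a ratio of several hundred, so the 100-fold figure already leaves generous room for selected sites outside coding regions.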

Our evolution has recently accelerated by around 100-fold. And that's exactly what we would expect from the enormous growth of our population.

What is all this selection for?

We know something about the functional categories of genes inferred to be under selection; we are studying this now. We expect it will keep us busy for some time.

In a general view, they illustrate the idea that changing cultures and ecologies have been important in changing the pattern of selection. For example, many of the selected genes are involved with pathogen defense -- for new pathogens that didn't always exist. Some are apparently related to metabolism or even directly to diet, in terms of processing new food sources. Of course, lactase is an excellent example in this category.

These are not the kinds of phenotypes that have a lot of visibility in skeletal remains. But we have a skeletal record of these populations during the last 40,000 years. We know a lot about what they looked like and how they changed. So we may try to relate the pattern of genetic, skeletal, archaeological, and other kinds of changes over time.

One obvious way to test hypotheses about these changes would be to sample ancient DNA from skeletons. In this way, we could see if the new selected alleles are in them or not. This spring, a paper by Burger and colleagues (PNAS) sampled ancient European skeletons, Neolithic skeletons, for the lactase persistence allele. They didn't find any who had that allele -- not a single one, and this is in Neolithic populations where today the allele is up over 90 percent in frequency. What is going on there?

Lactase allele over time

In this case, the answer is quite obvious once we consider population genetics. We have a very good date for this lactase persistence allele, from many sources -- it is around 6,000-10,000 years old. And you can see in the figure, a new selected allele will remain at a very low frequency for a long, long time after its origin. Here, these skeletons were sampled at a time when the selection pressure favoring the allele was present, but the allele had not yet increased to a substantial frequency. In fact, this allele would have been rapidly increasing through these intermediate frequencies much more recently -- we're talking here about Roman times. And today it is over 90 percent in Scandinavia, but considerably lower in Italy and Southern Europe.
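To see why a new selected allele lingers at low frequency, here is a rough sketch using the standard deterministic logistic approximation for an advantageous allele. The selection coefficient, starting frequency, and generation length below are illustrative assumptions of mine, not estimates from the paper or from the ancient DNA study, and the model ignores drift entirely.

```python
import math

def generations_to_reach(p_target, s, p0):
    """Generations for a selected allele to rise from frequency p0 to
    p_target under a deterministic logistic approximation."""
    def logit(p):
        return math.log(p / (1 - p))
    return (logit(p_target) - logit(p0)) / s

# Illustrative assumptions, not values from the paper:
s = 0.04           # selection coefficient
p0 = 1 / 20_000    # one new copy among 10,000 diploid individuals
gen_years = 25     # years per generation

for target in (0.01, 0.10, 0.50, 0.90):
    t = generations_to_reach(target, s, p0)
    print(f"{target:>4.0%}: {t:5.0f} generations (~{t * gen_years:,.0f} years)")
```

Under these assumptions the allele spends well over half of its history below 10 percent frequency, and then crosses the intermediate frequencies comparatively quickly. A sample of skeletons taken during the long early phase would be expected to carry few or no copies.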

In the future, we will be able to sample for genes more widely in ancient skeletons. At the same time, we will be able to sample skeletal changes to try to correlate them with allele origins. That is research I have applied for a number of grants to support, and I think it will be very promising.

Conclusion

I hope that this essay gives an introduction to the work we have done. This was based on a presentation about the research I gave earlier this year. There are many loose ends, and I'll be adding more information over the next several days about ways of testing for selection, as well as some of the more surprising implications of our research. I've written it without a bibliography; the paper has a full set of references.

Things I didn't expect to happen today

I am absolutely overwhelmed by the interest and press given to our paper about acceleration of human evolution. So far, the paper hasn't yet shown up online at PNAS, but you can get a copy here and there, or by asking me for a preprint.

Google News is tracking 240 stories worldwide on the research as of today, and that link also points to comments by two of my coauthors, Eric Wang and Henry Harpending.

I just wanted to dash down a list of things that have surprised me:

1. I didn't expect to be on the Drudge Report.

2. I didn't expect to be on Rush Limbaugh's stack of stuff.

3. It feels very strange to be on drive-time morning radio shows in Australia. It's like traveling to the world of tomorrow! Because, well, it is.

4. A talkative 2-year-old may function as an effective prop when talking to a reporter about the relationship of selection and fertility.

5. I didn't expect to get six inches of snow.

Acceleration rarely-asked questions

Usually an FAQ starts with the easiest-to-answer questions. Those are, after all, the ones that are asked frequently!

But today I wanted to handle some of the hardest-to-answer questions: questions about the paper coming from people who are extremely knowledgeable about selection in the genome. We are working on three years of papers describing local and genome-wide scans of positive selection. At this point, the "best" methods each have weaknesses, and our method (the LDD, or "linkage disequilibrium decay" test) is no exception. People who know the weaknesses should be wondering, how have we taken them into account?

In a tight six-page limit, it is impossible to answer every valid question. We accentuated the most obvious ones, but we considered a wide array of others (and dealt with many during peer review). Still, it would be good to have a resource where these issues are hashed out for anyone to read them.

To that end, I've compiled a list of "rarely asked questions": what I see as some of the most critical problems with a study of recent selection like ours, and how we've addressed these in a way that makes our study conservative.

I will be adding to this list as I come across new critiques. And for those who aren't quite conversant with genomic techniques, I will be putting up a frequently asked question list tomorrow!

Methods to detect recent selection all have biases of one kind or another. How can we be sure that one or more of these biases haven't really exaggerated the number of alleles in your dataset?

We are working with a tremendous advantage that previous studies of recent selection have lacked: Mathematics. Unquestionably, there are biases in the data, and as described below we have minimized these to the extent possible. But unlike every other study, we actually describe the theoretical reasons why selection should have accelerated in the human genome.

Personally, I can't believe that nobody noticed how extreme these estimates of recent selection really are. I guess that folks doing genomics just weren't as primed in evolutionary theory to perceive how weird the human estimates looked compared to what is measured in the wild on other species, or even over the span of human evolution!

In the earliest studies, when people were finding that 3 or 4 percent of a sample of genes had signs of recent selection, those numbers were already extremely high. They got even higher, as more and more powerful methods of detecting selection came online. Our current estimate is the highest yet, but even this very high number is perfectly consistent with theoretical predictions coming from human population numbers.

At one level, the mathematical answer is as simple as "more people means more mutations." But more deeply, we can predict a linear response of new selected alleles to population size, and we can model this response with respect to a particular frequency range. The genome is a complicated place -- with different mutations originating at different times, selected at different strengths, consequently with different fixation probabilities and different current frequencies. For some reason, nobody really tried to describe this mathematically before.
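As a rough illustration of the linear response, here is a sketch built on a textbook approximation rather than the model in the paper: a diploid population of N individuals carries 2N genome copies, and a new advantageous mutation escapes early loss with probability of about 2s. The function names and the parameter values are hypothetical and chosen only to show the scaling.

```python
def new_adaptive_mutations(pop_size, adaptive_mu):
    """Expected number of new adaptive mutations arising per generation
    in a diploid population (2 * pop_size genome copies)."""
    return 2 * pop_size * adaptive_mu

def expected_successes(pop_size, adaptive_mu, s):
    """Of those new mutations, the expected number that escape early loss,
    using the classic approximation that a new advantageous mutation
    fixes with probability of about 2 * s."""
    return new_adaptive_mutations(pop_size, adaptive_mu) * 2 * s

# Hypothetical values, for illustration only
ADAPTIVE_MU = 1e-6   # adaptive mutations per genome copy per generation
S = 0.02             # selection coefficient

for pop_size in (10_000, 1_000_000, 100_000_000):
    rate = expected_successes(pop_size, ADAPTIVE_MU, S)
    print(f"N = {pop_size:>11,}: {rate:8.4f} successful new mutations per generation")
```

Whatever values are chosen for the unknown mutation rate and selection strength, a hundredfold increase in population size gives a hundredfold increase in the supply of successful new mutations.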

Now, our model is extremely simple -- it can be challenged on several specific bases. For instance, population increase was not a simple exponential -- it grew in fits and starts, with some significant crashes. The average strength of selected mutations probably changed over time, and the distribution of the strength of selection may have departed from our assumptions. Even the adaptive mutation rate may have changed over time.

Still, the general prediction is quite clear: the population has grown, its conditions of existence have changed, and as a result selection on new mutations should have accelerated. And the observed data fit our theoretical prediction exceptionally well. Certainly we could do better if we made a more detailed model, and we will be doing some of that in future papers. But mathematical simplicity has a great virtue: we can see precisely why human historical changes should have accelerated this aspect of our evolution, and we can see the magnitude of the response. That magnitude greatly outweighs all potential biases.

I read on Gene Expression that the statistical power to detect selection varies based on allele age. You have the greatest power to detect things in the last 20,000 years. So it's no surprise that you find the most variants in that time period. How can you claim this is evidence of acceleration?

P-ter's post on this problem is well-detailed. This is quite an obvious issue -- if we are trying to detect alleles between 20 and 80 percent frequency, it is clear that we are going to be detecting recent things -- many old things would already have been fixed.

But we won't detect just any recent things -- in fact, we will not be able to detect recent things that are weakly selected. By contrast, we should detect older things that are weakly selected, but we will never detect older things that were strongly selected -- they're the ones that are fixed now.
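The interaction between allele age and strength of selection can be sketched with the same kind of deterministic logistic approximation. The code below asks, for a given selection coefficient, how old an allele must be to sit between 20 and 80 percent frequency today. The starting frequency and generation length are my own illustrative assumptions, and drift is ignored.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def detectable_age_range(s, p0, low=0.20, high=0.80):
    """Ages (in generations) at which an allele with selection coefficient s
    lies inside the [low, high] frequency window, assuming deterministic
    logistic growth from an initial frequency p0."""
    youngest = (logit(low) - logit(p0)) / s
    oldest = (logit(high) - logit(p0)) / s
    return youngest, oldest

P0 = 1 / 20_000    # assumed starting frequency: one copy in 10,000 diploids
GEN_YEARS = 25     # assumed years per generation

for s in (0.005, 0.01, 0.03, 0.05):
    young, old = detectable_age_range(s, P0)
    print(f"s = {s:.3f}: detectable at ages "
          f"{young * GEN_YEARS:>7,.0f} to {old * GEN_YEARS:>7,.0f} years")
```

In this sketch the windows for weak and strong selection do not overlap: strongly selected alleles are visible only when they are young, and weakly selected alleles only when they are old.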

We find a peak in the number of selected variants around 5500 years ago in Europeans, around 8000 years ago in Africans. That corresponds to a strength of selection around 3 percent or more. We find relatively fewer variants -- in fact, many times fewer, with a strength of selection of 1 percent or less.

In theory, strongly selected mutations ought to be vanishingly rare. In fact, they ought to be exponentially rarer than weakly selected mutations. That doesn't mean the theory has to be right, but it does mean we need some kind of explanation if we find that weakly selected things are rare, and strongly selected ones are common -- I mean, R. A. Fisher was wrong sometimes, but I'm not going out on a limb on this one.

Acceleration can explain this reversal -- there simply weren't as many weakly selected mutations 15,000 years ago, because there weren't as many people. The more strongly selected mutations in the last 8000 years actually were very rare per individual, but there were many, many more people to generate them.

But maybe ascertainment bias for recent alleles might explain this reversal of theoretical expectations?

Here's the problem: Suppose we are missing lots of selected alleles older than 10,000 years ago. That means there has been even more selection than we now think in the last 40,000 years.

We were very careful in the paper not to tie the test of acceleration to the age distribution of selection. The age distribution fits the acceleration theory beautifully, but this fit is not enough -- if we are willing to believe that selection was always intensely powerful and rapid in humans, then finding that it has recently been powerful and rapid would be no surprise!

For this reason, we tested the theory of a constant rate of selection against other kinds of data -- data that aren't drawn from the LDD test. We showed that a constant rate makes many false predictions -- it predicts a tenfold lower heterozygosity than we see genome-wide, it predicts a very powerful association of heterozygosity with recombination rate, it predicts an extremely large number of recently fixed alleles, and it predicts 6 million adaptive substitutions between humans and chimpanzees. None of these predictions are close to reality. The rate couldn't always have been as high as it is now.

Now, suppose we missed a large fraction of old events with the LDD test. That means that the total number of recent events would be much larger than we have estimated. Which means that the recent rate of selection would have been much higher. Which means that a constant rate at that much higher level is even further from reality.

The LDD test is better at detecting selection in areas with low recombination rates. Isn't this an obvious bias on the analysis?

Possibly. But it is a conservative bias. Consider: if we have a higher power in some regions of the genome, then we are actually missing events in others. That makes our assessments into underestimates of recent selection.

We have made this even more conservative by simply eliminating areas of the genome with very low recombination rates. We didn't include such areas at all, even though other studies have found selected variants in them.

But more important, the denser dataset has allowed us greater power in finding selection in areas of higher recombination. This has resulted in a broad addition of new variants to our list of selected alleles, particularly in the Yoruba sample, where background LD is lower than in the European or Asian samples.

One recent review (Nielsen et al. 2007) showed that a high proportion of alleles found by the LDD test were in areas of low recombination. But this comparison is very misleading -- that review limited itself to selected alleles shared in all sampled populations -- a tiny subset of 90 genes out of the total 1800 listed by Wang et al. (2006). These are predominantly the oldest alleles -- for which the age-related ascertainment bias is the greatest, and the power is strongest in low recombination regions.

Strikingly, we found that increasing the SNP density in the new HapMap made very little difference to the number of selected variants estimated for the CEU sample -- we believe this is because we are finding basically everything there for the method to find. This leaves significant limits -- for instance, the limited frequency window we used. But we don't think we are missing lots of selection in high-recombination regions.

But wait a minute. What if you are finding more variants in areas of low recombination because they are false positives -- in other words, because you are finding alleles that actually weren't selected?

Ah, the opposite ascertainment bias. This is what we have worked the hardest to avoid -- false positives. We have deliberately excluded areas of the genome to avoid them, and we have used very conservative threshold values for our tests to minimize them.

There are several strong reasons to believe that we are not looking at neutral alleles. Here's what we wrote in the paper:

Recent genetic drift including founder effects would affect all genomic regions equally, but the candidate selected genes occur predominantly in genic regions, and preferentially include genes in functional classes that are plausible targets for recent adaptive changes. Selection is the only explanation consistent with all these features.

Almost certainly, we have some false positives. But they cannot be very common, or they would distort these clear alternative signs of selection.

And more important, we are talking about a number of selected variants 100 times greater than we would expect under a constant rate. If we overestimated by a third (that would be a high error rate, which I think is very improbable), the rate has accelerated by 70 times. If we threw 90 percent of our list away, we would still be looking at an acceleration of 10 times faster! The numbers coming out of every other group looking at selection are within the ballpark of ours, so this is no surprise.
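The sensitivity argument here is simple proportional arithmetic, and a short sketch makes it explicit. It assumes, as the paragraph does, that the estimated acceleration scales directly with the number of true positives on the list; the function name is mine.

```python
def remaining_acceleration(fold, false_positive_fraction):
    """Acceleration left after discarding a given fraction of the
    candidate list as false positives (simple proportional scaling)."""
    return fold * (1 - false_positive_fraction)

for fraction in (0.0, 1 / 3, 0.5, 0.9):
    fold = remaining_acceleration(100, fraction)
    print(f"{fraction:4.0%} false positives -> about {fold:.0f}-fold acceleration")
```

A one-third error rate leaves roughly 67-fold, which rounds to the figure of about 70 given above, and discarding 90 percent of the list still leaves about 10-fold.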

In other words, our tests of acceleration do not depend very finely on the ascertainment of these alleles. I believe our assessments are correct and conservative -- if we made errors, they were by underestimating selection rather than overestimating it.

But every distribution has a tail. If you use any kind of threshold, even a 99.5% threshold, you are going to have false positives, aren't you? And across the whole genome, doesn't that add up to a large number?

Our assessment counts the most extreme 0.5 percent of LD clusters as positively selected. Since the entire genome includes all the selected sites, this is conservative. Simulations showed that this value produced very few false positives. And we have a check against false positives -- if there were many neutral clusters in the data, they would not occur predominantly in genic regions, sort into certain functional categories, or be found across the entire range of frequencies.

Also, false positives are very likely to be placed in the oldest time range where LD decay has proceeded to the greatest extent. If false positives were very common, we would see an elevation in this time range, which we don't see.

Why didn't you just use the phased data?

Well, the HapMap phased data are freely downloadable now and I've been working with them. The advantage of working with phased data is that we can look for lower-frequency variants. I'll be giving an example of that next week.

But the phased data weren't available when we did the analysis. And the LDD test really is an elegant way of dealing with unphased genotypes.

Don't we expect evolution to be faster on shorter time scales? The "acceleration" you are finding could just be the fact you are looking at a short window of time.

Geneticists may not have seen this question, but paleontologists will be intimately familiar with it. When we look at evolutionary changes over very long time scales, there is an averaging effect. Fluctuations over time tend to average out, so that the long-term change is relatively slight when measured per year.

In contrast, when we look at evolutionary changes over a short time, any immediate fluctuation will tend to add to the rate of change. So measuring change per year yields a relatively high rate.

It is quite obvious that a very high rate of change in phenotypes per generation cannot be maintained indefinitely. For instance, a reduction of 0.01 percent per generation shrinks a trait to less than 2 percent of its initial size in only 40,000 generations. Most organisms can't seamlessly shrink down to a fiftieth of their size -- selection ultimately constrains their size. So even very low rates of change cannot be maintained over evolutionary time scales.
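The compounding in that example can be checked directly. This is a minimal sketch, assuming the same constant proportional reduction in every generation; it reports the fraction of the trait remaining after a given number of generations, and how many generations a given total reduction would take.

```python
import math

def remaining_fraction(rate_per_generation, generations):
    """Fraction of the initial trait size left after compounding a constant
    proportional reduction over the given number of generations."""
    return (1 - rate_per_generation) ** generations

def generations_to_shrink_to(fraction, rate_per_generation):
    """Generations needed for the trait to fall to the given fraction."""
    return math.log(fraction) / math.log(1 - rate_per_generation)

rate = 0.0001  # 0.01 percent per generation

print(f"after 40,000 generations: {remaining_fraction(rate, 40_000):.2%} remains")
print(f"generations to reach 1%:  {generations_to_shrink_to(0.01, rate):,.0f}")
```

Either way the point stands: a per-generation change far too small to measure compounds into a reduction on the order of a percent or two of the original size within a few tens of thousands of generations.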

From that perspective, we may view it as basically unsurprising that human skeletal features have been rapidly evolving over the last 10,000 years. Sure, this is a higher measured rate than ever before -- even over equivalent time spans like the Neandertal-modern transition in Europe. But it cannot be sustained indefinitely, and may just be an artifact of looking at a sharp fluctuation in a narrow window of time.

I don't think that argument applies cleanly to the last 10,000 years. For one thing, the measured rate of skeletal change actually is surprisingly high, not only compared to longer timespans in the past, but also compared to equivalent timespans. But more to the point, the direction of change has been consistent across populations even as they grew in size. This is not some momentary fluctuation in our evolution, it is an exceptional transient from one state to another.

But even more important, the distinction between short-term and long-term changes in phenotype is simply not relevant to allele frequencies. Positive selection is always relatively rapid on a geological time scale. Some alleles can reverse themselves over long periods of time -- increasing and then later decreasing in frequency, or even holding themselves in long-term stasis. But when we count an exceptionally large number of recent alleles, we are not looking at a normal situation. Most fluctuations do not involve complete reversal -- it is unlikely for a fixed substitution to subsequently be erased completely from our species. So comparing short and long-term changes to gene sequences is comparing like with like to a much greater extent than is true of phenotypes.

How can you say anything at all about the rate of adaptive mutations? Everybody knows that adaptive mutations (choose one) occur rarely if ever...are almost as common as deleterious ones...depend on the environment, which was constantly changing!

Believe it or not, we actually had a reviewer tell us that positive selection "rarely if ever" happens. Rarely if ever! This was a geneticist!

I think it must have been a slip of the keyboard. In any event, the intrinsic rate of new adaptive mutations per genome (as opposed to per population) is incredibly important in determining how fast selection should have happened in recent populations.

The beauty is that we don't have to know what this rate is. We don't have to make any assumption about this rate. In fact, we have structured our analysis so that this unknown rate is what we estimate using the data from the LDD test.

The incredible strength of our analysis is that we can assess the predictions based on this rate against other sources of data. That is, the LDD test generates a hypothesis about the rate of recent change, and we can show that the observed rate is absolutely impossible as a long-term rate of change. Hence, evolution accelerated.

The obvious weakness is that for simplicity we assume the rate is constant, but it almost certainly changed over time. However, we have structured the analysis to be conservative with respect to the most probable kinds of changes.

For example, what I think is most likely by far is that the intrinsic rate of new adaptive mutations per genome increased greatly during the last 20,000 years, because new environments created new selection pressures. People lived with greater and greater mismatches to their environments, and as Fisher (1930) showed, this means that adaptive mutations should have become more and more likely.

But naturally, if adaptive mutations became more and more likely per genome, that must cause an acceleration of the rate of adaptive mutations per population. So this category of change leads to acceleration. This is precisely what I think happened.

Less likely is that the environmental changes made adaptive mutations less likely per genome. But even in this case, our analysis must be conservative. If such mutations are intrinsically less likely, but we are still seeing a very large number of them, then our data still show acceleration. In fact, the conclusion would be that population number must have been even more important to acceleration than we thought, because on a per population basis it outweighed an intrinsic reduction in the rate of new adaptive mutations per genome. Again, I think this is unlikely for the time period we are talking about. It might be closer to the actual pattern over the past 100 years or so, when mortality selection really has decreased in many populations.

What about the Haldane limit? I thought such rapid evolution was impossible!

J. B. S. Haldane famously estimated that the substitution cost of new alleles in humans limited the rate of adaptive evolution. In his estimate, the slow rate of human reproduction limited substitutions to one every 300 generations. This became known as the "Haldane limit".

Motoo Kimura used Haldane's argument as a reason why selection could not explain the substitution rate, and he asserted that as support for his neutral theory of molecular evolution. We do not challenge neutralism, but it is clear that Haldane's limit is a problem, since every estimate says that humans have been evolving at many times that rate.

Maynard Smith (1968, Nature) showed that Haldane's argument depended on the unrealistic assumption of independence among all selected loci, so that the substitution load depends critically on the fitness of the optimal genotype among all selected loci. If selection on many loci is non-independent, then a very large number of genes may be selected with the same substitution load as a single gene under Haldane's assumptions. Later, Ewens (e.g., 1972, Am Naturalist) made a similar argument. Ewens (2004, Mathematical Population Genetics) reviewed this problem, pointing out an additional weakness of Haldane's argument: it depends solely on mortality selection, while many genes may be under fertility selection.

These considerations show that Haldane's limit does not constrain the adaptive substitution rate in humans to 1/300 generations, and our estimated rate of 13 per generation is not excluded. Moreover, considering the high infant and juvenile mortality evidenced in Neolithic and later populations, much of that death rate resulting directly from disease and dietary deficiencies, the number of selective deaths available to drive substitutions has clearly been high.

No time tonight to add references; I will put these in an update tomorrow.

Human evolution has accelerated

The embargo has now ended on the second, and far more important, paper that I mentioned the other day. It is a product of work I've been doing with Bob Moyzis of UC Irvine, his former graduate student Eric Wang, now at Affymetrix, my friend Greg Cochran and Henry Harpending at the University of Utah.

Some readers may know I've been working on this project -- I've given presentations at meetings and at a number of universities about it. But otherwise I've been silent about it. In particular I have been systematically avoiding the topic of recent selection here on the weblog. It has been a great inconvenience to me, but the unhappy fact is that journals want new results, and blogging about something is at least perceived to reduce its news value. And of course, working with other people across the country entails a lot of respect for keeping discussions and results confidential until we have all signed off on everything.

Anyway, I'm hugely excited about this project, our current results, and what we will be doing next. Which means I have some pent-up writing to do! Over the next few days, this will be acceleration central -- I'll be laying out what these genomic data mean for recent human evolution, what kinds of genes we have been finding under selection, and exactly how these kinds of analyses are done.

I'll also be tracking press articles and blog reactions to the paper. PNAS is, if anything, consistently unpredictable about when they actually make papers available. If you want a preprint, please let me know. I'd also appreciate your links.

Also, if you've come here for the first time, welcome! I may get a lot of traffic for a few days, so I apologize if things are slow.

Here's a start: the abstract.

Recent acceleration of human adaptive evolution
John Hawks, Eric T. Wang, Gregory Cochran, Henry C. Harpending, and Robert K. Moyzis
Genomic surveys in humans identify a large amount of recent positive selection. Using the 3.9-million HapMap SNP dataset, we found that selection has accelerated greatly during the last 40,000 years. We tested the null hypothesis that the observed age distribution of recent positively selected linkage blocks is consistent with a constant rate of adaptive substitution during human evolution. We show that a constant rate high enough to explain the number of recently selected variants would predict (i) site heterozygosity at least 10-fold lower than is observed in humans, (ii) a strong relationship of heterozygosity and local recombination rate, which is not observed in humans, (iii) an implausibly high number of adaptive substitutions between humans and chimpanzees, and (iv) nearly 100 times the observed number of high-frequency linkage disequilibrium blocks. Larger populations generate more new selected mutations, and we show the consistency of the observed data with the historical pattern of human population growth. We consider human demographic growth to be linked with past changes in human cultures and ecologies. Both processes have contributed to the extraordinarily rapid recent genetic evolution of our species.

This is a bold assertion, and I will be putting out an FAQ later today that covers many of the questions I have been fielding from the press. There is a lot of technical detail in it, but we have accomplished essentially two things:

1. An empirical age distribution for alleles under recent selection, which number in the thousands.

2. A theoretical account of why these new alleles should have been increasing rapidly in numbers during the last 40,000 years.

It is a powerful paper because it shows why a rapid acceleration of our evolution is expected in theory, and it matches those expectations to real empirical data. It shows the absolute impossibility of a constant rate of selective change in humans, and that gives reality to our estimate of the amount of acceleration.

The last paragraph of the discussion:

It is sometimes claimed that the pace of human evolution should have slowed as cultural adaptation supplanted genetic adaptation. The high empirical number of recent adaptive variants would seem sufficient to refute this claim. It is important to note that the peak ages of new selected variants in our data do not reflect the highest intensity of selection, but merely our ability to detect selection. Due to the recent acceleration, many more new adaptive mutations should exist than have yet been ascertained, occurring at a faster and faster rate during historic times. Adaptive alleles with frequencies under 22% should then greatly outnumber those at higher frequencies. To the extent that new adaptive alleles continued to reflect demographic growth, the Neolithic and later periods would have experienced a rate of adaptive evolution more than 100 times higher than characterized most of human evolution. Cultural changes have reduced mortality rates, but variance in reproduction has continued to fuel genetic change. In our view, the rapid cultural evolution during the Late Pleistocene created vastly more opportunities for further genetic change, not fewer, as new avenues emerged for communication, social interactions, and creativity.

Over the next few days, I'll fill you in a bit about the course of this research -- how we got started, how it proceeded, and what parts of it remain exciting. Also, I'll try to give a flavor to what genomics means for anthropology -- what exactly is "anthropological genomics?" I think that there is an exciting frontier opening in the way we look at the past, and I hope to be able to show how some of it will work over the next few years.

That acceleration thing

If you've come via a link about my current work, welcome! I'm really not going to write about it here until our publication -- journals can be persnickety that way. But I am giving a talk about some aspects of the work today.
