effective population size

New data on Ashkenazi population history

Bray and colleagues [1] report on genotyping of 471 people of Ashkenazi Jewish descent. This is one of the largest samples of a single human population, and is therefore very interesting for studies of population history and recent natural selection.

There's a lot in the paper. One of the key findings is that the Ashkenazi population doesn't look bottlenecked -- in fact, it looks outbred compared to Europeans generally. The paper also documents a high amount of admixture with non-Ashkenazi Europeans, ranging from 35% to 55%. Figuring out the actual history of the population -- when and where its ancestors lived and how they interacted with other people -- is beyond the scope of this kind of analysis. But I expect that somebody can put together a really compelling historical account using these data.

I turned quickly to the issue of selection. They are able to substantiate evidence of positive selection on several disease-causing alleles in the Ashkenazi population, including the Tay-Sachs allele. The lack of evidence for bottlenecks or founder effects pretty much takes away the alternative explanation. Yet they were unable to show statistical evidence of selection on some other disease-causing alleles in Ashkenazi populations:

To explore whether regions of selection in the AJ population included any loci of known Ashkenazi diseases, we examined 21 disease- and cancer-susceptibility loci with known mutations found at higher frequency in the Ashkenazi population. Only 6 of the 21 genes fell in or near (within 500 kb) the top 5% of the AJ iHS windows (Table 2). Among these is the Tay-Sachs disease gene, HEXA, whose selection has been widely debated (4, 5, 14–16) and was found ~400 kb downstream of a window on chromosome 15 identified in the top 1% of the AJ iHS hits. Although none of the SNPs interrogated immediately adjacent to the HEXA locus showed elevated iHS signals, it is possible that the nearby region may contain regulatory elements under selection that affect HEXA expression. Cochran et al. (14) speculated that selection of many of the AJ-prevalent disease loci, especially the lysosomal diseases, conferred an increase in intelligence that was necessary historically for the AJ economic survival. Our data shows evidence of strong selection at or near only six disease loci, including only one out of the four AJ-prevalent lysosomal storage diseases, thus arguing that most AJ disease loci are not under strong positive selection, but rather rose to their current frequency through genetic drift after a bottleneck. However, we cannot exclude the possibility that selection of some AJ disease loci are outside the limits of detection by the extended haplotype tests, which are known to have less power to detect selection of lower frequency alleles (38, 41).

It seems to me that this passage probably wasn't written by the same author who showed the lack of evidence for founder effects a few pages before. In this case, the confusion probably comes from the fact that the "detection of positive selection" is actually a refutation of the hypothesis of genetic drift. With a larger sample it will be possible to test the hypothesis with greater power.

Disease-causing alleles are at low frequencies currently, making them unlikely to rise to the top percentages of the statistics. It would be interesting to control for current frequency, but I haven't seen a test that uses frequency information in this way.

It's quite remarkable to reflect on the idea that positive selection has now been demonstrated on six disease-causing alleles in the Ashkenazi population. Every one of these is a case of overdominance -- where the heterozygote carrying an allele has some selective advantage, while the homozygote carrying two copies has a disorder. I was having a conversation with a very prominent geneticist a few months ago, who claimed that no case of overdominance in humans had ever been demonstrated except sickle cell. Now, that was obviously false even at the time -- as I pointed out, the many hemoglobinopathies are fairly clear examples. But we've come an awfully long way.

From data like these, we're going to learn a huge amount about low-frequency selected alleles. The Tay-Sachs-causing allele is one of the most common recessive lethal genes in any human population, but like all genes subject to strong selection in homozygotes, it remains rare. Finding selection on these kinds of alleles is very hard unless sample sizes increase to several hundred individuals. Here we are seeing evidence of selection in historic populations -- within the last 2000 years. More will be coming.


References

Return of the Neanderchimps

Back in 2005, I reviewed the first description of fossil chimpanzee teeth, from the Middle Pleistocene of the Kapthurin Formation, Kenya, dating to around 500,000 years ago. At the time, I noted that no chimpanzees have lived in the area in historic times, and that mtDNA evidence then suggested that East African chimpanzees (Pan troglodytes schweinfurthii) may have been recently derived from Central Africa. Together, those observations raised a mystery -- if today's chimps had no ancestors anywhere near Kenya 500,000 years ago, to what group did these fossil chimpanzee teeth belong? I suggested an answer: a cryptic population of chimpanzees partially or completely replaced by the dispersal of Eastern chimpanzees. In other words, Neanderchimps.

Well, now that we know for sure that Neandertals are human, too... it's a good time to revisit the Neanderchimps. What can we say today about the population structure of chimpanzees in the past, and is it still possible that these chimpanzee fossil teeth are out of kilter with the population genetics of today's chimpanzees?

A few weeks ago, we had Jody Hey visiting here on campus, and he gave a talk about his recent work on chimpanzee population genetics. Together with Rasmus Nielsen and others, Hey has been developing Bayesian methods for estimating the times of divergence, migration rates, and effective population sizes of species.

The basic idea is that present-day samples of a species like chimpanzees reflect a branching process from an ancestral population. Each branch may exchange migrants with other branches, each branch has an effective population size, and each may begin with some kind of population bottleneck. That makes for a very complicated model -- for example, with only two populations, there are six parameters, not counting bottlenecks. Each additional population adds effective sizes, a time of splitting, and migration rates to and from every other population that exists at the same time. The number of parameters grows roughly with the square of the number of populations.
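To make the bookkeeping concrete, here is a minimal sketch that counts parameters for a model of this kind. The function name and the counting rules are my own assumptions for illustration: one effective size for every sampled or ancestral population, one time per split, and two migration rates (one in each direction) for every pair of populations that exist at the same time, with bottlenecks left out.

```python
def im_parameter_count(k):
    """Count parameters in a k-population isolation-with-migration model.

    Assumptions (for illustration only): a fixed bifurcating tree, one
    effective size per sampled or ancestral population, one time per split,
    and two migration rates for each pair of populations that coexist.
    Bottleneck parameters are not counted.
    """
    if k < 2:
        raise ValueError("need at least two sampled populations")
    sizes = 2 * k - 1          # k sampled populations plus k - 1 ancestors
    split_times = k - 1        # one per internal node of the tree
    # Coexisting pairs: C(k, 2) among the sampled populations, plus each new
    # ancestor paired with every other lineage alive when it forms.
    coexisting_pairs = (k - 1) ** 2
    migration_rates = 2 * coexisting_pairs
    return {
        "sizes": sizes,
        "split_times": split_times,
        "migration_rates": migration_rates,
        "total": sizes + split_times + migration_rates,
    }


for k in range(2, 7):
    print(k, im_parameter_count(k))
```

With two populations this gives three sizes, one splitting time, and two migration rates, which is the six parameters mentioned above.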

Hey began this work several years ago, initially limited to the two-population case. Together with Yong-Jin Won, he showed that West African chimpanzees (P. troglodytes verus) have a substantially smaller effective size than central African chimpanzees (P. troglodytes troglodytes). These two subspecies appeared to have diverged within the last 300,000-400,000 years. And while there was little evidence for gene flow from central into west African chimpanzees, there was clear evidence for gene flow in the other direction, from west into central Africa.

Sound familiar?

In a series of two-way analyses, Won and Hey showed that bonobos diverged from chimpanzees approximately 400,000-800,000 years ago, that there was no substantial evidence of gene flow into or out of bonobos after their speciation, and that the effective size of bonobos was around the same as that of west African chimpanzees, a bit under 10,000 effective individuals.

Now, in 2010, Hey has extended both the data and method to encompass more than a single divergence between two populations. In the case of Pan, Hey has included three extant subspecies of common chimpanzees (P. t. troglodytes, P. t. verus, and P. t. schweinfurthii), together with bonobos (P. paniscus). Among those, in a bifurcating model of population divergence, there are three splitting times, seven effective sizes (four living populations plus three ancestral ones), and lots of asymmetrical migration rates, all scaled in one way or another to mutation rate. It takes a lot of data to estimate these parameters simultaneously. The study uses 73 loci from an average of 78 individuals split among the populations, which is apparently not quite enough data to get good parameter estimates for the migration rates, as the probability surfaces for these are shallow and relatively unresolved with a few exceptions.

The parameters describing divergence times and effective sizes under the model have tighter posterior probability distributions, so that they are reasonably well estimated using these data. Here are the highlights:

1. Bonobos split from chimpanzees around 930,000 years ago (680,000-1.54 million).

2. The effective sizes of most populations were small (around 10,000 or less). The Pan ancestral population was moderately larger (around 17,000 effective individuals).

3. Only central African chimpanzees were substantially larger in effective size, upward of 25,000-30,000 effective individuals during the last 460,000 years.

4. All common chimpanzees (Pan troglodytes) descend from an ancestral population that existed 460,000 years ago (350,000-650,000).

5. East African chimpanzees split very recently, only around 93,000 years ago (41,000-157,000) from central African chimpanzees.

All these estimates result from a fairly restrictive model. Each population is described by two parameters, and the interactions between populations by an additional two parameters per population pair. The ideas of pulses of population mixture or founder effects are simply not possible in the model. I don't see this as a weakness -- I'd much rather begin with even simpler models. But it does mean that we cannot generalize the results past the model. In particular, we shouldn't compare these times and migration rates directly with those obtained under the model that Green and colleagues (2010) applied to the Neandertal genome.

But after those words of caution, what can we make of this proposed population history for chimpanzees? Here are some possible conclusions relevant to human evolution:

1. Eastern and central chimpanzee subspecies share a more recent history than would have been true of humans and Neandertal populations at the time the latter existed. Western chimpanzees are more distant from other chimps than the Neandertals and humans were from each other.

2. For that matter, population differences between MSA humans within Africa may have been nearly as great as those between eastern and central African chimpanzee subspecies.

3. Bonobos and chimpanzees split roughly a million years ago with little if any subsequent interbreeding. At least in the west (Africa, Europe and West Asia), Pleistocene human populations did not experience this kind of allopatric speciation. At the moment, I enter that as an assertion, which I'll follow up later by some discussion of the pre-Neandertal problem.

4. The effective sizes estimated for ancient human populations are not especially low.

5. Range expansions and partial or complete replacements were part of the population history of chimpanzees. They managed these dynamic events without handaxes, fire, projectile weapons, language, or any of the other proposed trappings of Pleistocene humans.

I want to follow up on a couple of these. First, effective size: You often hear people claiming that humans have much lower genetic diversity than chimpanzees. It is true only in a limited sense. Bonobos, west African and east African chimpanzees are populations with lower genetic variation than humans. The estimate for the effective size of the common chimpanzee ancestral population, 7100, is substantially lower than estimated for the human ancestral population during the same time period, a period stretching from roughly a million to 460,000 years ago. The common ancestral population of chimpanzees and bonobos is inferred to have had an effective size close to that of ancestral humans at the same time, around 17,000 effective individuals prior to a million years ago.

One may object that chimpanzees cover a much smaller area than Pleistocene humans, so we should expect their effective size to be much lower. But genetic variation can be related to population size only by assuming a population model, and Hey's analysis gives us a model quite starkly different from the usual. That doesn't mean it's correct, or that it is a better estimator of the census size of the ancient populations. But it reminds us that comparing the genetic variation of humans and chimpanzees is too simplistic; that the gene trees within each population are very sensitive to the relative contributions of different parts of each species' range during the last 500,000 years. In chimpanzees, the high genetic variation mostly can be attributed to the central African subspecies; in humans, the extant genetic variation can mostly be attributed to Africa.

Let's ponder chimpanzee range expansions for a moment longer. We know that in the early Middle Pleistocene, chimpanzee-like apes lived in western Kenya. The only chimpanzees who live anywhere near that area today seem to have been much more strongly connected to chimpanzees in western Congo prior to 93,000 years ago, and that central African population still has much more variation than the eastern ones. That suggests a recent range expansion, Late Pleistocene in age, into East Africa.

We don't know that the earlier chimpanzees became extinct. They may have contributed genes into later P. t. schweinfurthii, just as Neandertals did into living humans. We can tell stories about climate change and the former East African chimpanzees, just as people have done about human origins, megadroughts and volcanoes. But one thing is clear about the chimpanzees: there was no modern chimpanzee revolution. The other chimpanzee subspecies, P. t. verus, is still here.

UPDATE (2010-05-20): "More on chimpanzee population structure" discusses a subsequent paper on the same topic.

References:

Gagneux P, Gonder MK, Goldberg TL, Morin PA. 2001. Gene flow in wild chimpanzee populations: what genetic data tell us about chimpanzee movement over time and space. Phil Trans R Soc Lond B 356:889-897.

Goldberg TL, Ruvolo M. 1997. Molecular phylogenetics and historical biogeography of east African chimpanzees. Biol J Linn Soc 61:301-324.

Hey J. 2010. The divergence of chimpanzee species and subspecies as revealed in multipopulation isolation-with-migration analyses. Mol Biol Evol 27:921-933. doi:10.1093/molbev/msp298

McBrearty S, Jablonski NG. 2005. First fossil chimpanzee. Nature 437:105-108. doi:10.1038/nature04008

Won Y-J, Hey J. 2005. Divergence population genetics of chimpanzees. Mol Biol Evol 22:297-307. doi:10.1093/molbev/msi017

Passing on your fertility to your kids

From the NY Times earlier this spring, a profile of a New York woman with an exceptional legacy:

WHEN Yitta Schwartz died last month at 93, she left behind 15 children, more than 200 grandchildren and so many great- and great-great-grandchildren that, by her family’s count, she could claim perhaps 2,000 living descendants.

The story talks about her history and how she came to have such a large family. By itself, having 15 children would be unremarkable except that the children and grandchildren themselves all went on to have large families ("Like many Hasidim, Mrs. Schwartz considered bearing children as her tribute to God."). After a couple of generations, it adds up to a lot of descendants.

I don't think the story is all that unique. Within the United States there are many communities, like the Hutterites, Old Order Amish, and Hasidic Jews, where large family sizes are the norm. Probably hundreds of women on earth can claim more than a thousand living descendants, and thousands more have only to wait until they are old enough, while their children and grandchildren's families continue to grow.

You can get there by having 10 children, each of whom has 10, with each grandchild having 10 in turn -- that adds up to 1110, giving some extra for different generation times and losses. Of course, it's a trick to live long enough to see the 1000 great-grandchildren, but the early ones should already have given you a fraction of your 10000 great-great-grandchildren.
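The arithmetic is easy to check with a short sketch. The function name is made up for illustration, and it assumes that every person in the lineage has the same number of surviving children, which real families of course do not.

```python
def descendants_by_generation(children_per_person, generations):
    """Number of descendants in each generation, assuming every person in
    the lineage has the same number of surviving children."""
    return [children_per_person ** g for g in range(1, generations + 1)]


counts = descendants_by_generation(10, 4)
# counts holds children, grandchildren, great-grandchildren, and
# great-great-grandchildren, in that order.
print(counts)
# Total through the great-grandchildren: 10 + 100 + 1000.
print(sum(counts[:3]))
```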

What's surprising here? Not the family sizes themselves -- big families are common in most human populations. The high offspring numbers are not as apparent in populations that have high juvenile and infant mortality, but many pregnancies were the norm prior to the industrial transition.

No, what's surprising about huge numbers of living descendants is the correlation between generations. In these cases, the correlation is driven by religion and various social proscriptions related to religious observance.

I often talk about models and real human population structures in my classes. One obviously unrealistic aspect of the Wright-Fisher population model is its reproductive variance. In the Wright-Fisher model, reproductive variance is binomial -- every gene in an offspring population is equally likely to descend from each gene in the parental generation. In the model, it is possible -- albeit extraordinarily unlikely -- for a single parent to give rise to the entire offspring generation. That just can't happen in a real population, certainly not in humans. The effect of that unrealistic assumption of the model is not great, however, because even in the model the chances of having more than 10 offspring, while possible in theory, are negligible. If anything, the Wright-Fisher model is too conservative about the variance of offspring number -- real human populations have a non-negligible fraction of women who have 10 or more live births.
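To put a rough number on "negligible", here is a small sketch. It rests on an assumption that is not spelled out above: in a large Wright-Fisher population of constant size, the binomial offspring distribution is well approximated by a Poisson distribution, with a mean of 1 per gene copy or 2 per diploid parent. The function name is hypothetical.

```python
import math


def poisson_tail(mean, k_min, terms=200):
    """P(X >= k_min) for a Poisson variable with the given mean.

    Sums the upper-tail terms directly (on the log scale) rather than
    subtracting the lower tail from 1, to avoid rounding problems.
    """
    total = 0.0
    for k in range(k_min, k_min + terms):
        log_pmf = -mean + k * math.log(mean) - math.lgamma(k + 1)
        total += math.exp(log_pmf)
    return total


# Chance that a diploid parent leaves 10 or more offspring (Poisson mean 2).
print(poisson_tail(2.0, 10))
# Chance that a single gene copy leaves 10 or more descendants (mean 1).
print(poisson_tail(1.0, 10))
```

Under these assumptions the chance for a diploid parent is on the order of a few in a hundred thousand, far below the share of women with 10 or more live births in high-fertility human populations.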

I get more concerned about other deficiencies of simple models, which are sometimes harder to deal with. One of those is the correlation of offspring number between generations. If there is even a slight correlation, with women tending to have more children because they came from larger families, it has a major effect on the amount of inbreeding in the population.

You can think about it genealogically. Suppose you live in a small town with a few big families. The chance that you yourself were born into one of those big families is small. But if today's big families tended to come from yesterday's big families, with each generation we go back in time, it becomes more and more likely that one of your ancestors came from one of those big families. Still looking backward in time, your genealogy becomes captured by those big families, branch by branch. Since there are few big families in the town, once two or more lines of your ancestry trace to them, those lines will rapidly share a common ancestor. That's inbreeding, from the perspective of your genealogy.

In small towns, that process isn't inevitable because people move in from elsewhere. Most of the lines of your genealogy will probably come from other towns within a few generations. But if we consider the human species as a small town, well, there's nowhere else to move in from. If the population structure of our species has included a strong correlation of offspring number between generations, it will have massively reduced our genetic variation.

Since we have low genetic variation as a species, you can see why this is potentially interesting.

Masatoshi Nei and Motoi Murata back in 1966 worked out a relation between intergenerational correlation in offspring number and effective population size. That's before the days of computer models, for you simulation jocks out there. The "effective" size of a population, as I've noted here many times, is the one parameter of a Wright-Fisher model, as estimated from the genetic variation within a population. It's a statement about how inbred the population looks, assuming that its evolution followed a random-mating model throughout its history. Now, that model is wrong in pretty much every interesting case, and so there are various mathematical transformations that attempt to account for the effects of different mating structures.

In the case of intergenerational correlation of offspring number, Nei and Murata derived an expression to predict the reduction of effective size to be expected from this correlation, assuming a model in which the variance in offspring number is distributed in a certain way. The solution isn't general -- if offspring number were distributed in some other way, the effect of the same measured correlation may be quite different. And in their model, they were concerned with the case where the correlation of offspring number is influenced by genes that determine fitness -- in other words, genes under selection in the population. So it's not a complete answer, but it's a start.

Nei and Murata cited empirical data from several earlier studies that showed a correlation of 0.20 to 0.40 between generations of human offspring number. Under the assumption of their model, a correlation of 0.30 would cause a reduction of the effective size by roughly half.

That's a big effect. We already expect a reduction of effective size compared to the census count of a human population, because human populations include many non-reproductive individuals -- kids and postreproductive adults make up half to two-thirds of small-scale foragers. If big families have an additional effect of half, it means that the effective size of the population starts out at a fourth to a sixth the census count. So an effective size of 10,000 really means 40,000 to 60,000 people on the ground.
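The conversion from effective size to census count is just two multiplicative corrections, sketched below. The function and parameter names are made up for illustration, and the two factors are the ones given in the paragraph above: a reproductive fraction of one-half to one-third, and a further halving from correlated family sizes.

```python
def census_from_effective(ne, reproductive_fraction, family_size_factor=0.5):
    """Census count implied by an effective size, assuming the effective
    size equals census * reproductive_fraction * family_size_factor."""
    return ne / (reproductive_fraction * family_size_factor)


# Non-reproductive individuals make up half of the population.
print(census_from_effective(10_000, 1 / 2))
# Non-reproductive individuals make up two-thirds of the population.
print(census_from_effective(10_000, 1 / 3))
```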

Still low, but as one factor among many it may be very important -- and possibly the distribution of variance caused a further decline. It's well worth investigating.

A correlation of offspring number between generations can be caused by many ecological or cultural factors. Nei and Murata (1966) had considered the case where fitness itself is inherited, because of the presence of selected genes. But in humans, a more pervasive force is cultural inheritance. This factor was discussed in 1976 by the demographer Samuel Preston, attending to the importance of cultural preferences in contemporary populations:

Since children of each generation are drawn disproportionately from families of women with high fertility achievements in the past, it may be expected that a pronatalist selective bias operates each generation with respect to the transmission of "tastes" for children. It has also been suggested that personality traits which may affect fertility achievement, such as the ability to defer gratification, may be transferred to some extent between parent and child (Kantner and Potter, 1954). It is also reasonable to suggest that biological fecundability is partially inherited. The positive correlation between the social classes of parent and child implies that economic constraints impinging on the childbearing process tend to be similar for the two generations (Preston 1976:110).

In small-scale societies, these forces are somewhat different. But I wouldn't expect them to be less -- indeed, the social competition between families is probably more intense. The entire "Machiavellian intelligence" model of cognitive evolution implies that these kin-level effects were pervasive throughout human evolution over the past 2 million years or more. A strong cultural inheritance of fitness is really necessary for selection on genes that influence prosocial kin-related behaviors.

How intense? Seems like a good question to investigate, as it may have a lot of importance to understanding genetic variation in our ancestors -- including our common ancestors with the Neandertals, whose genetic variation was limited just as much as our own.

On the subject of effective population size, I'll be posting next week about chimpanzees and bonobos. More genetically variable than us? Well, some of them...

References:

Preston SH. 1976. Family sizes of children and family sizes of women. Demography 13:105-114.

Nei M, Murata M. 1966. Effective population size when fertility is inherited. Genet Res 8:257-260.

An insertion into deep history

A couple of weeks ago I noted a new article by Chad Huff and colleagues in PNAS. It wasn't available yet when I wrote, but I've had the chance to study it now.

The paper presents a tremendously clever way of using contemporary genetics to look at different time slices in Pleistocene human evolution. If you can imagine traveling to different parts of the human genome and looking at different times in the past, that's more or less what they are doing.

We have the genomes of several people now -- the paper focuses on Venter's sequence versus the official HGP draft sequence, but there are others. A whole genome is limited in its utility to look at genetic variation, but it has some very interesting sampling properties. Much of population genetics theory is based on a simple question: what happens if you sample two individuals at random? How similar are they? What will be the distribution of genetic differences between them? How long ago did each of their genes descend from a single common ancestor? Sampling a diploid genome yields precisely the data for which these questions were designed.

Huff and colleagues dredge up a relatively obscure point of theory. Suppose you take a particular kind of rare event -- they consider mobile element insertions, including Alu and LINE insertions. Even though these elements make up a large fraction of the human genome, the events that give rise to them are rare, occurring only once in a whole genome every 20 births or more. Now, look around the genome and partition it into two kinds of regions. One kind of region will include the rare events (insertions in this case) and the area immediately flanking them. The other will include everywhere else in the genome. Now, the partitioning creates a bias. The areas that include these rare events will, on average, represent more diverse parts of the genome, with deeper genealogies. This is because the intrinsically rare event is more likely to have happened in the long time span represented by such areas than in the relatively shorter times represented by the remainder of the genome. In fact, the average depth of these areas including the insertions should be precisely double the average depth of the areas that lack them.
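The doubling can be checked with a small simulation. This is only a sketch under simple assumptions of my own, not the method used in the paper: each locus has a coalescence time drawn from an exponential distribution (as in a constant-size population), insertions land at a constant low rate along that time, and all names and parameter values are invented for illustration.

```python
import math
import random


def simulate_insertion_bias(n_loci=200_000, rate=0.02, seed=1):
    """Compare mean genealogy depth at loci with and without a rare event.

    Each locus gets an exponential coalescence time (mean 1, arbitrary
    units). A rare insertion hits the locus with probability
    1 - exp(-rate * t), so deeper genealogies are more likely to carry one.
    """
    rng = random.Random(seed)
    with_insertion = []
    without_insertion = []
    for _ in range(n_loci):
        t = rng.expovariate(1.0)
        p_hit = 1.0 - math.exp(-rate * t)
        if rng.random() < p_hit:
            with_insertion.append(t)
        else:
            without_insertion.append(t)
    mean_with = sum(with_insertion) / len(with_insertion)
    mean_without = sum(without_insertion) / len(without_insertion)
    return mean_with, mean_without


mean_with, mean_without = simulate_insertion_bias()
print(mean_with, mean_without, mean_with / mean_without)
```

The ratio should come out close to 2 because picking loci in proportion to their depth is size-biased sampling, and for an exponential distribution the size-biased mean is twice the ordinary mean.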

In other words, looking at these rare events is sort of like opening the box on Schroedinger's cat. There's something that we shouldn't be able to find out a priori -- how old is the genealogy of a part of the genome? By sifting through the genome and picking out all the parts that have these insertions, we know something about them: We know that they represent a time interval double that of the rest of the genome. Our looking at these insertions has collapsed the likelihood function that relates genetic location to age. When we look at the variation around insertions, we can then ignore some of the events that changed the population's diversity in the last couple of hundred thousand years. And by comparing these sites with the rest of the genome, we have another way to test hypotheses about whether the population was once a lot bigger or smaller than it has been over the last few hundred thousand years.

The analysis shows that the population in that early part of the genealogy -- corresponding more or less to dates over 1.2 million years ago -- was consistent with an effective population size of 18,000 individuals, give or take. As I pointed out in my earlier post, that value itself isn't surprising -- it's a bit higher than the average genome-wide. The best-fit model, including both areas near insertions and the rest of the genome, was one in which the effective population size actually declined from 18,500 to 8,500 individuals at 1.2 million years ago. They explain that the recent value should be depressed by the separation of present human populations -- Venter and the human reference sequence both being primarily derived from Europe, they undersample human variation.

Now, it's easy to see some of the limitations on the analysis. The authors considered only a two-epoch model of population history. That is to say, once upon a time the population was x individuals, then at some time t, the population becomes y individuals. Two epochs of population size, separated by one time. Clearly the actual history of human populations was more complicated than this, but does it matter? Recent history will not greatly influence nucleotide diversity, and in particular the insertions -- because they are intrinsically rare -- are likely to reflect much more ancient events that have survived any subsequent vicissitudes of population.
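A two-epoch model is simple enough to work through by hand. The sketch below gives the expected time back to the common ancestor of two gene copies, using the standard result that a pair of lineages in a diploid population of size N coalesces at rate 1/(2N) per generation. The function name is hypothetical, and the 25-year generation time used to convert 1.2 million years into generations is my own assumption.

```python
import math


def expected_pairwise_coalescence(n_recent, n_ancient, t_change):
    """Expected generations to the common ancestor of two gene copies.

    Two-epoch model: effective size n_recent from the present back to
    t_change generations ago, and n_ancient before that.
    """
    # Probability that the pair has not yet coalesced by the size change.
    survive = math.exp(-t_change / (2.0 * n_recent))
    return 2.0 * n_recent + 2.0 * (n_ancient - n_recent) * survive


# Values from the best-fit model described above; generation time assumed.
generation_years = 25
t_change = 1_200_000 / generation_years
depth = expected_pairwise_coalescence(8_500, 18_500, t_change)
print(depth)         # expected depth in generations
print(depth / 2.0)   # constant effective size giving the same depth
```

Under these assumptions the genome-wide average sits much closer to the recent size than to the ancient one, because most pairs of lineages coalesce before they ever reach the older epoch.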

But, I suspect that the distribution of insertions with relation to recent selection will make an appreciable difference to the nearby SNP diversity. The geographic distribution of variation will also make some difference, although we won't know how much until we look at non-European genomes.

Meanwhile, if I were looking to the archaeological record to identify times that made a difference to the human population, 1.2 million years ago would really not register. It certainly would not strike me as a time of substantial reduction of the human population.

The lack of any archaeological referent is typical of such studies -- after all, they're not trying to match numbers from archaeology, they're trying to establish internally consistent genetic tests of population history. But if these values are real, they must match what we know from the fossil and archaeological record. There is some text in the paper about the small effective size and its relevance to humans as a sign of repeated bottlenecks or other events. As I pointed out earlier, I think 18,000 is pretty significantly large compared to most other estimates of human effective population size. When we get an estimate of human effective size so near those of other apes, we are looking at a value consistent with habitation of a large, certainly continent-wide range by large populations. So now I have to think what the pertinent comparison from the archaeological record should be.

One archaeological comparison is of special interest to me: a real-life comparison that will be immediately relevant. This study should be giving us information about the population ancestral to Neandertals and humans. In that sense, it duplicates the information that we ought to be able to derive from the comparison of human and Neandertal genomes.

Interestingly, the effective size estimates published so far for the human-Neandertal ancestral population are much lower than the 18,500 estimated in this study. Green and colleagues (2006) made a point estimate of 3000 effective individuals at the time of Neandertal-human divergence. That estimate is likely to be supplanted by the Neandertal genome release, because the Green et al. (2006) estimate was influenced by some fraction of contaminating sequence from humans. And the error bars on that estimate are large. But there's a lot of space between them -- we're talking about at least a sixfold difference.

Something doesn't add up. The human-Neandertal ancestral population must have contained all these polymorphic insertions that supposedly occurred before 800,000 years ago. The effective size of the population may have been lower, but if so we should look for some explanation for that substantial loss of variation.

UPDATE (2010-02-10): A couple of people have asked about effective population size. Here's a helpful post that explains why a small effective size may not mean a small population size, and some of the current hypotheses that try to explain the human value.

References:

Green RE, Krause J, Ptak SE, Briggs AW, Ronan MT, Simons JF, Du L, Egholm M, Rothberg JM, Paunovic M, Pääbo S. 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444:330-336. doi:10.1038/nature05336

Huff CD, Xing J, Rogers AR, Witherspoon D, Jorde LB. 2010. Mobile elements reveal small population size in the ancient ancestors of Homo sapiens. Proc Nat Acad Sci USA (early online) doi:10.1073/pnas.0909000107

High Pleistocene human effective population size

Nicholas Wade is reporting on an upcoming paper by Chad Huff and Lynn Jorde: "Genome Study Provides a Census of Early Humans".

The Utah team based its estimate on the genetic variation present in two complete human genomes, one prepared by the government’s human genome project and the other by J. Craig Venter, the genome sequencing pioneer. The government decoded a single copy of a mosaic genome derived from a medley of people, apparently of European and Asian origin. Dr. Venter decoded both copies of his own genome, the one inherited from his father and the one from his mother.

The Utah team thus had three genomes to work with and looked at ancient elements known as Alu insertions, the youngest class of which appeared in the human genome around a million years ago. The amount of variation seen in the DNA immediately surrounding the Alu insertions gave a measure of the size of human population at that time.

Their estimate agrees almost exactly with an earlier one, also based on Alu insertions but with sparser data. The insertions tag ancient regions of the genome that are unaffected by the recent growth in population, Dr. Huff said.

I'll probably write some more notes on this when I can get a copy.

At the moment I think it's worth pointing out that the lede of Wade's story is exactly backward. The story is all about how the effective size estimate, 18,500 effective people, is very low. But in reality that's a high estimate compared to what most human geneticists have assumed, only 10,000 individuals.

Neither estimate is really news. Observations in the early 1970s established that 10,000 was around the right order of magnitude for human effective population size. Around 10 years ago, some gene systems, including Alu insertions, appeared to support a higher estimate of effective size up around 18,000 individuals. That still seemed pretty small in evolutionary terms, and didn't change anybody's ideas about ancient population bottlenecks.

The differences between these estimates have never really been resolved. As more and more genes got sequenced, human geneticists seem to have just standardized on the small estimate of 10,000 effective individuals -- even as they started to apply more and more complicated computer models to try to derive estimates of expansion and bottleneck times. (I wrote about the problem of effective population size last year, "Cultural impedance, demographic growth, effective population size".)

A few years ago we started to get good effective size estimates for other primates. As Wade's article points out, the genetic variation of chimpanzees and gorillas leads to estimates of effective size on the order of 25,000 or so individuals. Geneticists noted that these species are therefore much more diverse than humans, with our puny effective size of around 10,000 individuals. Only bonobos seem to be close to the low human value.
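For readers wondering how an amount of genetic variation turns into an "effective size" at all, here is a minimal sketch of the textbook neutral-theory relationship, in which expected per-site diversity is roughly 4 times the effective size times the mutation rate. This is not the method used in the Huff paper or in the ape studies. The function name and the input values are illustrative assumptions, not figures from any of those papers.

```python
def effective_size(diversity_per_site, mutation_rate_per_site_per_generation):
    """Estimate diploid effective size from the neutral expectation theta = 4 * Ne * mu.

    Both arguments are assumptions supplied by the caller; the result is only
    as good as the assumed mutation rate and the neutrality of the sites.
    """
    return diversity_per_site / (4.0 * mutation_rate_per_site_per_generation)


# Illustrative (assumed) inputs: about one difference per thousand sites and a
# mutation rate of 2.5e-8 per site per generation give roughly 10,000.
low_estimate = effective_size(0.001, 2.5e-8)

# Holding the mutation rate fixed, 2.5 times the diversity implies
# 2.5 times the effective size.
high_estimate = effective_size(0.0025, 2.5e-8)

print(round(low_estimate), round(high_estimate))
```

The point of the sketch is only that the estimate scales linearly with observed diversity, so any disagreement about effective size is really a disagreement about how much variation there is, or about the mutation rate.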

Well, if Huff and Jorde are right, human variation is a lot like the amount of variation in chimpanzees and gorillas. Those other apes have lived in geographically structured subspecies spanning tropical Africa for several hundred thousand years.

Or have they? Maybe there were massive bottlenecks and population replacements among chimpanzee subspecies. Maybe there was a recent "out of Congo" migration that accounts for the low genetic variation of bonobos. Maybe chimps themselves derive recently from some part of their current range.

Or, maybe the human effective population size isn't so probative.

In any event, the genomes here are all Eurasian. I wonder how much African genomes will increase the diversity? Could it be that we're even more diverse than chimpanzees?

Mutual information between strings of loci

Fourth in a series on mutual information and genetic linkage. If you’re happening upon it for the first time, you can find the entire series or the first post, “Information theory: a short introduction”.

After the last post, you might wonder what the big deal is about these information theoretic measures of linkage. After all, we’ve got lots of other measures of linkage to choose in population genetics, with many years of theory behind them. The basic conclusion about genetic drift was that it adds mutual information to samples over short regions, but that recombination over longer areas washes it out. If the net effect is no linkage, why would we bother to come up with some non-standard linkage measure?

One answer: If the existing linkage measures were so great for testing neutrality, then we might expect some of the recent genome-wide selection scans to have used them. But they didn’t – instead we have several partially incompatible methods, all of which eschew the usual measures of linkage.

When genetic drift reduces entropy

This is the third in a series on information theory and tests for recent selection. The first post, “Information theory: a short introduction”, covered some of the basics of entropy. The second post, “Information theory and mutual information between genetic loci”, showed that mutual information between independent sites will be distributed as a χ².
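As a reminder of what that second result means in practice, here is a minimal sketch that computes the mutual information between two loci from a sample of haplotypes and rescales it into the standard G statistic (2 times the sample size times the mutual information in nats). For independent loci, G is approximately χ²-distributed with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of alleles at each locus. The function names and the toy samples are my own assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information_bits(pairs):
    """Mutual information (in bits) between two loci.

    `pairs` is a list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one tuple per sampled haplotype.
    """
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), count in joint.items():
        # p(a, b) / (p(a) * p(b)) written in terms of raw counts.
        ratio = count * n / (first[a] * second[b])
        total += (count / n) * math.log2(ratio)
    return total


def g_statistic(pairs):
    """G = 2 * N * I, with I converted from bits to nats."""
    return 2.0 * len(pairs) * math.log(2) * mutual_information_bits(pairs)


# Toy samples (hypothetical): one with no association, one fully linked.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
linked = [(0, 0), (1, 1)] * 10

print(mutual_information_bits(independent), g_statistic(independent))
print(mutual_information_bits(linked), g_statistic(linked))
```

With two alleles at each locus there is one degree of freedom, so the G value from the fully linked toy sample would be far out in the tail of the χ² distribution, while the unassociated sample gives zero.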

We tend to think of genetic drift as a random process. Random processes operating repeatedly over time are called “stochastic,” and changes in gene frequency under genetic drift are certainly that.

Since entropy is a measure of uncertainty, it might seem natural to think that stochastic changes in gene frequency would increase the entropy in a population. After all, the gene frequency in a population under genetic drift will be more and more uncertain over time. So, considering the frequency of a single allele as the system, genetic drift appears to increase entropy over time.

But even this simple system isn’t quite so simple as it might appear. Sure, if you start out knowing the allele frequency, then genetic drift will increase your uncertainty over time. You will become less and less able to say that it lies in any given interval. But what if you don’t start out knowing? What if all you know is that the locus has been subjected to t generations of genetic drift?

As t increases, the probability of fixation of the locus also increases. The net effect is to reduce the entropy in the system – going from uncertainty about the allele frequency to more and more certainty that it will be either one or zero. The only thing that will stop this process is some other evolutionary force – mutation, migration from other populations, balancing selection. Each of these will have its own distinctive effects on the entropy of the single-locus system.
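To make the two cases concrete, here is a minimal sketch that follows the exact probability distribution of allele counts under a Wright-Fisher model and reports its Shannon entropy each generation. It assumes a small population of n = 20 gene copies, no mutation, migration or selection, and it reads "entropy of the system" as the entropy of the distribution over possible allele counts. "Not knowing" the starting frequency is modeled as a uniform distribution over counts. All names and parameter values are illustrative choices.

```python
import math


def wright_fisher_matrix(n):
    """matrix[i][j] = Pr(j copies next generation | i copies now), n gene copies."""
    matrix = []
    for i in range(n + 1):
        p = i / n
        row = [math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
        matrix.append(row)
    return matrix


def entropy_bits(dist):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)


def entropy_trajectory(dist, n, generations):
    """Entropy of the allele-count distribution at generations 0..generations."""
    matrix = wright_fisher_matrix(n)
    entropies = [entropy_bits(dist)]
    for _ in range(generations):
        # One generation of drift: push the distribution through the matrix.
        dist = [sum(dist[i] * matrix[i][j] for i in range(n + 1))
                for j in range(n + 1)]
        entropies.append(entropy_bits(dist))
    return entropies


n = 20

# Case 1: the starting count is known exactly (half the copies).
known_start = [0.0] * (n + 1)
known_start[n // 2] = 1.0
known = entropy_trajectory(known_start, n, 200)

# Case 2: the starting count is unknown (uniform over all possible counts).
unknown_start = [1.0 / (n + 1)] * (n + 1)
unknown = entropy_trajectory(unknown_start, n, 200)

for t in (0, 1, 10, 50, 200):
    print(t, round(known[t], 3), round(unknown[t], 3))
```

In the known-start case the entropy begins at zero and rises once drift starts, which is the intuitive picture. In the unknown-start case it begins at its maximum and falls, because probability piles up on the two fixation states. With a symmetric start and nothing to oppose drift, both trajectories end up near one bit: the only remaining uncertainty is which allele fixed.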

Cultural impedance, demographic growth, effective population size

This is a complicated story with many interlocking parts. Telling the whole story may well take me fifty posts. There's a lot of new science hiding in here waiting to get out.

I'm starting now because of the new paper by Luke Premo and Jean-Jacques Hublin, titled "Culture, population structure, and low genetic diversity in Pleistocene hominins." This paper is not the final word on its topic, nor is it the first word. But it is very much worth reading.

It makes an excellent point of departure to explain what we know and don't know about the genetics of prehistoric humans. Premo and Hublin propose an interesting model with interaction between culture and natural selection, as an explanation for a 35-year-old problem in human evolution: Our low level of genetic variation.

Their model may be right. I certainly think there's a kernel of truth in it, shared with a number of other models, as I'll describe below. And it's testable -- a project to which we'll be returning in the next few months.
