Alon Keinan and Andrew Clark have a short report in the current Science examining the effects of recent human population growth on the expected spectrum of human genetic variation.
Why is this? In a constant-sized population, individuals have an average of two offspring who survive to have offspring of their own. Many people have no children at all, or only one, while only a small proportion of people have more than four children. In the constant-sized population, a person born with a new mutation would have a 50% chance of passing it on to each child. In such a population, more than a third (about 37%) of mutations aren't passed on even once. The same fraction are inherited by only one child, and these face the same odds of extinction in the next generation. This isn't natural selection; it is random genetic drift -- and its net result is that most new mutations are lost.
In a growing population, individuals average more than two offspring. Every additional offspring increases the chance that a new mutation will be passed on to the next generation. In other words, more people means less genetic drift. As a population grows, new mutations begin to stack up at low frequencies in the population.
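The arithmetic behind these figures can be sketched in a few lines. This is only an illustration under an assumption the post does not state: that the number of surviving offspring per person follows a Poisson distribution. With each child inheriting the mutation with probability 1/2, the number of children who carry it is then Poisson with mean equal to half the average family size. The function names are made up for this sketch.

```python
import math


def loss_probability(mean_offspring: float) -> float:
    """Chance that a brand-new mutation is passed on to no children.

    Assumption (not from the post): family size is Poisson with the
    given mean, and each child inherits the mutation with probability
    1/2, so the number of carrier children is Poisson(mean / 2).
    """
    return math.exp(-mean_offspring / 2.0)


def single_copy_probability(mean_offspring: float) -> float:
    """Chance that exactly one child inherits the new mutation."""
    lam = mean_offspring / 2.0
    return lam * math.exp(-lam)


if __name__ == "__main__":
    # 2.0 is the constant-sized case; larger means are growing populations.
    for mean in (2.0, 2.5, 3.0):
        print(
            f"mean offspring {mean}: "
            f"lost at once {loss_probability(mean):.3f}, "
            f"one copy {single_copy_probability(mean):.3f}"
        )
```

At a mean of two offspring, both probabilities equal 1/e, a little under 37%. As the mean rises above two, the chance of immediate loss falls, which is the sense in which growth weakens drift on new mutations.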
This is a very basic point in population genetic theory, and it interacts in a troubling way with the current generation of sequencing technology. Short-read shotgun sequencing yields a high number of false positive mutations, which must be aggressively filtered out of whole genome data. If we don't filter these out, we will arrive at incorrect conclusions about many aspects of human biology. The simplest means of filtering require some understanding of how many rare mutations you expect to find, in particular how many should be found in only one person in a sample of people. That expectation is different in a growing population, resulting in a potentially large bias.
Despite an improvement in the accuracy of sequencing technologies, some errors remain unavoidable. For example, with a sequencing error rate of 1 in 10,000 bases, in a sample of 10,000 individuals, each base pair will exhibit two errors on average across the sample and the majority of monomorphic sites will appear polymorphic (most often as a singleton or a doubleton; i.e., with the rare allele present in one or two copies in the sample). On the other hand, strict filtering of the data will lead to missing many rare variants because they are not observed as reliably. Hence, any analysis of large sample sizes must account for the uncertainty inherent in sequencing by considering the variant calls probabilistically, and secondary validation of rare variants by an alternate sequencing procedure is essential.
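The numbers in that passage can be checked with a rough calculation. The sketch below makes several simplifying assumptions that are not in the passage: individuals are diploid, so 10,000 people carry 20,000 copies of each site; every copy is read once; errors are independent; and every error at a site produces the same wrong base. The count of errors at a truly monomorphic site is treated as Poisson.

```python
import math

ERROR_RATE = 1e-4        # per-base error rate, from the passage
INDIVIDUALS = 10_000     # sample size, from the passage
COPIES = 2 * INDIVIDUALS # assumption: diploid, each copy read once

# Expected number of erroneous calls at one site across the sample.
expected_errors = ERROR_RATE * COPIES


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# A monomorphic site looks polymorphic if at least one error occurs.
p_appears_polymorphic = 1.0 - poisson_pmf(0, expected_errors)
p_false_singleton = poisson_pmf(1, expected_errors)
p_false_doubleton = poisson_pmf(2, expected_errors)

if __name__ == "__main__":
    print(f"expected errors per site: {expected_errors:.1f}")
    print(f"appears polymorphic:      {p_appears_polymorphic:.3f}")
    print(f"false singleton:          {p_false_singleton:.3f}")
    print(f"false doubleton:          {p_false_doubleton:.3f}")
```

Under these assumptions the expected count is two errors per site, about 86% of monomorphic sites show at least one error, and false singletons and false doubletons are equally likely at about 27% each. Real pipelines call genotypes from many reads per site, so this is an upper-level illustration of the problem, not a model of any particular sequencing platform.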
Keinan and Clark present some models that show how much it matters to consider a growing population compared to the usual null model of constant population size.
It's so interesting to me to see human geneticists catching up to where anthropologists have been for a long time. Of course, we wrote about the effects of recent population expansions in 2007, noting the apparent acceleration of positive selection in post-agricultural populations ("Why human evolution accelerated").
Large-scale sequencing projects have moved beyond simply categorizing common genetic variation. They are now at a stage where thousands of individuals need to be examined, to find increasingly rare genetic variations and determine their collective effects on phenotypes. That means that the next version of the 1000 Genomes Project really needs to involve many of us who are directly concerned with human population history. The growth and dynamics of actual historic human populations are going to matter to how we understand their genetic variation and its effects on phenotypes. Fortunately, archaeology and written history can help -- if anthropologists are involved in this work from the start!