Looking for local selection via STR diversity

This is one of those old papers I run across sometimes doing research:

A Genome Scan to Detect Candidate Regions Influenced by Local Natural Selection in Human Populations
Manfred Kayser, Silke Brauer and Mark Stoneking
As human populations dispersed throughout the world, they were subjected to new selective forces, which must have led to local adaptation via natural selection and hence altered patterns of genetic variation. Yet, there are very few examples known in which such local selection has clearly influenced human genetic variation. A potential approach for detecting local selection is to screen random loci across the genome; those loci that exhibit unusually large genetic distances between human populations are then potential markers of genomic regions under local selection. We investigated this approach by genotyping 332 short tandem repeat (STR) loci in Africans and Europeans and calculating the genetic differentiation for each locus. Patterns of genetic diversity at these loci were consistent with greater variation in Africa and with local selection operating on populations as they moved out of Africa. For 11 loci exhibiting the largest genetic differences, we genotyped an additional STR locus located nearby; the genetic distances for these nearby loci were significantly larger than average. These genomic regions therefore reproducibly exhibit larger genetic distances between populations than the "average" genomic region, consistent with local selection. Our results demonstrate that genome scans are a promising means of identifying candidate regions that have been subjected to local selection.

I hadn't noticed this wrinkle at the time, but the study seems to show that STR variation is noticeably affected by positive selection at linked sites.

There is of course something fundamentally circular about choosing the "most extreme" differences among loci to define selection. After all, who is to say that 11 is the right number to choose? Why not 50? Why not 100? In fact, the study picked 15 loci and found only 11 with nearby STRs that could be typed. So 11 out of 332 isn't really the proportion; there is an unknown proportion of STRs affected by positive selection, some of which are affected differently by local positive selection in different populations and as a consequence show significantly high RST values.
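To make the arbitrariness concrete, here is a minimal sketch of a "take the top k" outlier scan. The locus names and RST values are invented for illustration, and `top_k_loci` is a hypothetical helper, not anything from the paper.

```python
def top_k_loci(distances, k):
    """Return the names of the k loci with the largest distance values."""
    ranked = sorted(distances, key=distances.get, reverse=True)
    return ranked[:k]


# Hypothetical per-locus RST values (made up, not data from the study).
rst = {
    "locus_a": 0.02,
    "locus_b": 0.31,
    "locus_c": 0.08,
    "locus_d": 0.45,
    "locus_e": 0.12,
    "locus_f": 0.27,
}

# The candidate set is whatever the chosen cutoff says it is.
for k in (1, 2, 3):
    print(k, top_k_loci(rst, k))
```

Nothing in the ranking itself says where to stop: each larger k simply adds the next locus down the list to the "candidates".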

In any event, this certainly casts doubt on the idea that STR loci are unbiased neutral markers for reconstructing population history. Some of them may be, but at present we can't tell the "neutral" ones from the ones linked to positively selected sites. The proportion of these loci linked to sites under local positive selection -- the kind surveyed in this study -- might be relatively small. But how many are linked to sites under global positive selection?

Studying linkage with STR markers is tough. We are already throwing out STR loci with low repeat variance, because they are uninformative. But some of these loci have low repeat variance because they have short genealogies -- either as a consequence of selection (presumably on linked sites) or drift. Now we have to sift through the loci that are variable to assess how much their variability may be affected by linkage to selected sites. And their variability isn't affected in any simple way. Complete linkage to a site currently under selection would increase the frequency of one or a few alleles at an STR site; partial linkage might affect different alleles in different populations. In neither case is there any easy way to tell what's going on, because as the paper explains, there isn't even a theoretical distribution to compare them against. What a mess!

And this is interesting:

We assumed that local selection would primarily influence Europeans, as modern humans originated in Africa, and hence new opportunities for local selection would have occurred as modern human populations spread out of Africa. Some support for this assumption comes from the distribution of ln RV [ratio of allele size variance in Africans vs. Europeans] values (fig. 3), in which there is an excess of extreme positive values (i.e., in the right-hand tail); since the variance in Africans appears in the numerator of the ln RV value, this indicates that there are many more loci showing significantly reduced variation in Europeans (relative to Africans) than in Africans (relative to Europeans). However, this should be interpreted cautiously, as an extreme bottleneck in Europeans, which is suggested by some genetic data (Tishkoff et al. 1996; Yu et al. 2002), could also lead to an excess of loci with significantly reduced variation in Europeans relative to Africans (Schlötterer 2002a).

There's another reason for cautious interpretation: the European sample was all from Leipzig, while the African sample was taken from four groups in different locations in Ethiopia and South Africa!

In fact, that is a generally unrecognized problem in these kinds of studies: Are African and European samples apples and oranges? How much does post-Pleistocene population history affect the genetics of these populations? How much difference does it make sampling people in a large city vs. people in a bunch of villages? How much difference does local selection make within continents, if it already seems to make a large difference between them?

Clearly Africans are more variable on average than Europeans -- that's not at issue. But too many studies treat that observation as the end of the matter -- if you observe that your African sample is more variable than your European sample, then nothing else about them matters, QED. In this study, a lot of loci were more variable in the European sample than in the African sample (i.e., they have negative ln RV). Is the proportion of such loci informative? That depends on how much the sampling scheme might have affected the diversity of the samples.
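For readers who want to see the sign convention worked through, here is a rough sketch of ln RV as glossed above: the log of African over European repeat-count variance, so negative values mean the European sample is the more variable one. The repeat counts are invented, `ln_rv` is a hypothetical helper, and the paper's actual calculation may include steps not shown here.

```python
import math
from statistics import variance


def ln_rv(african_repeats, european_repeats):
    """ln(African repeat-count variance / European repeat-count variance).

    Assumes both samples have nonzero variance; a monomorphic sample
    would make the ratio undefined.
    """
    return math.log(variance(african_repeats) / variance(european_repeats))


# Hypothetical allele sizes (repeat counts): (African sample, European sample).
loci = {
    "locus_a": ([10, 12, 14, 16, 18], [13, 14, 14, 15, 14]),
    "locus_b": ([11, 11, 12, 12, 11], [9, 12, 15, 10, 14]),
    "locus_c": ([8, 10, 12, 9, 11], [9, 10, 11, 10, 10]),
}

values = {name: ln_rv(afr, eur) for name, (afr, eur) in loci.items()}

# Share of loci where the European sample is more variable (negative ln RV).
share_negative = sum(v < 0 for v in values.values()) / len(values)

for name, v in values.items():
    print(f"{name}: ln RV = {v:+.2f}")
print(f"share with negative ln RV: {share_negative:.2f}")
```

The arithmetic is trivial; the hard part is deciding what the share of negative values means when the two samples were collected so differently.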

I don't think there is really any easy solution to this sampling problem -- we aren't going to know how much difference the Neolithic may have made for a long time, for instance. But skepticism seems like a healthy attitude.

References:

Kayser M, Brauer S, Stoneking M. 2003. A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol Biol Evol 20:893-900.