Looking for the balances

A nice paper from last August by K. L. Bubb and colleagues went looking for new balanced polymorphisms in the human genome. They didn't find any.

There's a lot of complexity in the research approach, involved with sifting through SNP data looking for true (i.e., not false) positives. That part is not very interesting, and will probably be superseded by new data. But the plot thickens in the discussion, where the paper reviews patterns of selective balances and the conditions under which they may persist.

Their bottom-line conclusion is that genes under balancing selection are hard to find -- mainly because the current strategy for detecting them requires a long linked haplotype that would have to result from suppressed recombination between two or more linked genes involved in the balance:

This brief analysis suggests that long-term balancing selection may simply be rare in humans and other organisms with similar biology and evolutionary histories. Certainly, this conclusion is compatible with the results of our search for targets of long-term balancing selection in the human genome. Nonetheless, the question still arises as to whether or not we failed to identify such targets simply because we had too little data to analyze. Would we have fared better, for example, if the entire genome were sequenced across 20 human haplotypes? While we cannot exclude that possibility, we suspect that identification of genes under long-term balancing selection will remain a gene-by-gene process, based largely on functional evidence, and not greatly accelerated by genomic analysis because (i) the phenomenon itself is rare and (ii) compatible balancing selection between physically linked loci--a requirement for generating a detectable genomic fingerprint--is also rare. Nonetheless, the fact that balancing selection systems have arisen independently multiple times and involve core functions of multicellular, sexually reproducing organisms (e.g., combating pathogens and avoiding selfing) suggests that, while rare, balancing selection has had major effects on the evolution of metazoan organisms (Bubb et al. 2006:2175-2176).

Such long haplotypes exist for the HLA system, but this seems to be an exceptional case. The paper only discusses two other similar systems, both involving epistasis between two or more physically linked sites: color vision polymorphisms in primates (where functionally different alleles are defined by mutations on multiple exons) and the sex chromosomes. So the conclusion of the study is not really that there aren't any balanced polymorphisms, but instead that there aren't any more clear examples of long balanced multiallele haplotypes. If that's what it takes to find balances, it's no surprise that none were discovered.
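To get a feel for why a long haplotype is such a demanding requirement, here is a rough back-of-the-envelope sketch. It assumes a uniform crossover rate, under which the distance from a selected site to the nearest crossover on one side of a single lineage, traced back t generations, is exponentially distributed with mean 1/(r t). The recombination rate, the generation time, and the ages in the loop are round numbers I picked for illustration, not values from the paper.

```python
def expected_shared_tract_bp(age_generations, recomb_per_bp_per_gen):
    """Mean distance (bp) from a selected site to the nearest crossover on one
    side, for a single lineage traced back `age_generations` generations.
    Assumes a uniform crossover rate along the chromosome."""
    return 1.0 / (recomb_per_bp_per_gen * age_generations)


# Assumed round numbers for illustration only (not taken from Bubb et al.).
RECOMB_RATE = 1e-8        # crossovers per bp per generation
GENERATION_YEARS = 20     # years per generation

for age_years in (1e5, 1e6, 6e6, 3e7):
    generations = age_years / GENERATION_YEARS
    tract = expected_shared_tract_bp(generations, RECOMB_RATE)
    print(f"{age_years:>12,.0f} years old: ~{tract:,.0f} bp per side")
```

Under these assumptions, the linked neutral region around a single old balanced site shrinks to a few hundred base pairs or less, which is why something extra (such as selection on several linked sites at once) seems to be needed to hold a long haplotype together.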

Still, it's easy to say that now; it was not nearly so obvious a couple of years ago. The fundamental question is not really about selective balances, but instead about epistasis between linked sites. For example, a paper by the same research group in 2005 (Raymond et al. 2005) speculated that HLA-like gene clusters might be very common:

We hypothesize that genomic regions of the type described here will occur commonly in biology even if extreme examples are rare in any given genome. The prerequisites are a cluster of genes that are individually under balancing selection and whose products interact. Under these circumstances, theory predicts precisely the type of long-range hitchhiking of neutral alleles on selected sites that we observe in the HLA class II region (Kelly and Wade 2000). Other gene clusters that are likely to exhibit similar effects include those that encode key components of the self-incompatibility systems present in many flowering plants (Charlesworth et al. 2003; Franklin-Tong and Franklin 2003; Hiscock and Tabah 2003).

Why were they wrong? It is clear that physically linked gene families can evolve that are collectively under epistasis and frequency-dependent selection, and the plant self-incompatibility systems are indeed examples of the same process. But this process may be self-limiting in some respects. Here we have frequency-dependent coadapted gene clusters. Once such a system is started, it seems much more likely that it will be modified by successive alterations than augmented through the addition of an entirely new coadapted gene cluster system. And successive alterations will need to be physically linked to be effective.

But why shouldn't there be similar systems for other functions, besides self-incompatibility or immune response? For instance, why shouldn't there be coadapted frequency-dependent brain variants?

Here, I think there may be two different, not mutually exclusive, answers. One is that there just hasn't been all that much time. With the HLA system, we are looking at polymorphisms that are tens of millions of years old, and over that time span there has been a whole lot of evolution of the brain. So frequency-dependent variations that act in the brain may not have nearly the half-life that immune-related variants do. Maybe we could consider frequency-dependent variations in other tissues, like the liver, but here it is not nearly as obvious why we might see frequency dependence as a selective mechanism.

A second answer is that the human genome already includes a great big zone where heterozygotes are suppressed -- the X chromosome. With most X loci, you already have an effective mechanism for the emergence of coselected haplotypes, because males have only one copy, and females have partial inactivation of one copy or the other. Many X-linked genes are already part of the major example of coadapted frequency dependence -- sex. But there is no reason why other genes could not be selected in a similar pattern, without necessarily being sex-related.

The HLA really stands out as unusual in this regard, because it is both frequency-dependent and heterotic. It is good to be a heterozygote, and it is good to have a rare genotype. To get both these advantages, the HLA must be on an autosome. But for other coadapted polymorphisms under frequency dependence, it would probably not be such a good idea to be a heterozygote -- these would emerge more readily on the X, where there is much less possibility of epistatic conflicts.

Also in this context, the paper by Bubb et al. (2006) includes a very nice discussion of the ABO polymorphism:

While there are frequent claims for balancing selection at other loci in the literature, the plausibility of most of these cases depends on scenarios for heterozygote advantage. Thus far, the best case for balancing selection in the human genome solely on the basis of greater-than-expected coalescence time is at the locus controlling ABO blood type, specifically between the A and B alleles. ABO is an interesting example because, although it has been known to be polymorphic for >100 years due to its relevance in blood transfusion, its primary evolutionary function remains elusive. The lack of a strongly deleterious genotype satisfies our first proposed criterion that there should be little genetic load. The initial suggestion of long-term balancing selection came from the fact that the AB antigen-antibody phenotype is present in many primates, including some New World monkeys (BLANCHER et al. 2000). Furthermore, it has been shown biochemically that only two nucleotides, separated by 6 bp, differentiate the A allele from the B allele (YAMAMOTO and HAKOMORI 1990) and that these two nucleotides demonstrate apparent trans-species polymorphism within humans, chimpanzees, and gorillas (MARTINKO et al. 1993). In contrast, the O allele appears to have arisen multiple times in humans but is rare in nonhuman primates. When intronic sequence of humans, gorillas, and chimpanzees is compared, there is no evidence for trans-species polymorphism of linked neutral sites, so it has been argued that the two functional polymorphisms reflect convergent evolution (O'HUIGIN et al. 1997). However, if the balanced haplotype is just 8 bp long, it would behave as a single site and have only modest effects on flanking polymorphism levels (WIUF et al. 2004); the six exonic nucleotides between the functional polymorphisms certainly cannot hold enough neutral mutation to provide an accurate estimate of divergence time.
Indeed, while polymorphism levels are high in the ABO region -- with a MAXDIV of 49, which approaches human-chimpanzee divergence levels -- there is no evidence for trans-species polymorphism outside the 8-bp haplotype [SeattleSNPs, NHLBI Program for Genomic Applications, SeattleSNPs, Seattle (http://pga.gs.washington.edu) (October 2005)]. Thus, while we cannot conclude that ABO is another example of trans-species balancing selection, the possibility exists that it is an "invisible" example that cannot be detected by polymorphism studies.

From the perspective of the genome scans, this point about "invisibility" is relevant, but from a broader perspective these details about ABO are important because so many of us use the system as an example in our classes. The O allele is the null allele, and the observation that it has arisen multiple times in humans is very significant.
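The "invisibility" argument is easy to check with rough arithmetic. The sketch below counts the neutral differences we would expect to accumulate between two lineages across the six nucleotides that sit between the A/B-defining sites. The substitution rate and the human-chimpanzee split time are round numbers I am assuming for illustration, and treating all six exonic sites as neutral is generous, so the result is an upper bound under those assumptions.

```python
def expected_neutral_differences(sites, subs_per_site_per_year, split_years):
    """Expected number of differences between two lineages that have been
    separated for `split_years`. Each lineage accumulates substitutions
    independently, hence the factor of 2."""
    return 2 * sites * subs_per_site_per_year * split_years


# Assumed round numbers for illustration only (not taken from Bubb et al.).
MU_PER_YEAR = 1e-9     # neutral substitutions per site per year
SPLIT_YEARS = 6e6      # rough human-chimpanzee divergence time

inside_haplotype = expected_neutral_differences(6, MU_PER_YEAR, SPLIT_YEARS)
ten_kb_region = expected_neutral_differences(10_000, MU_PER_YEAR, SPLIT_YEARS)

print(f"6 bp between the functional sites: {inside_haplotype:.3f} expected differences")
print(f"10 kb region, for comparison:      {ten_kb_region:.0f} expected differences")
```

With far less than one expected difference inside the haplotype, even alleles as old as the human-chimpanzee split would usually show no neutral divergence there, so there is nothing for a polymorphism scan to detect.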

The paper also includes a good discussion of why different kinds of balanced polymorphisms may persist:

While any type of selection that favors maintenance of more than one allele is, by definition, balancing selection, there are multiple mechanisms through which a balance of alleles can be maintained. The most widely recognized mechanism is heterozygote advantage, as in the textbook example of sickle-cell anemia. Although the sickle-cell allele raises the overall fitness of the population, a significant fraction of individuals have decreased survival and reproductive rates as a consequence of this one allele--a phenomenon that has been described as genetic or segregational load. There are two indications that such systems may not be stable. First, a new allele under balancing selection may rise in frequency more quickly than a new allele under positive selection--even one which, in equilibrium state, confers a greater fitness benefit on the population. This is because when a new allele is at a low frequency, the fitness advantage of the heterozygote is most important, while the lower fitness of homozygotes is not yet very relevant. For example, despite the fact that multiple hemoglobinopathy-related alleles (including the one responsible for sickle-cell anemia) have arisen independently in response to selective pressure by malaria, an allele exists (HbC) that is protective against malaria in the homozygous state and more weakly in the heterozygous state as well. Neither state is associated with hemoglobinopathy. Given enough time under continued selective pressure, it is expected that this allele would sweep through the at-risk region and increase the total population fitness (MODIANO et al. 2001). Second, in general, one can imagine some combination of gene duplication and regulatory modifications that would allow all individuals to have the benefits of both alleles of a gene under balancing selection (SPOFFORD 1969), as is illustrated by the evolution of separate middle-wavelength and long-wavelength color-vision genes in Old World monkeys and Great Apes.
In contrast, frequency-dependent selection does not require a steady-state fitness differential and, therefore, confers less load on a population (KOJIMA 1971). Consequently, this type of balancing selection is probably more stable than instances that depend on heterozygote advantage (Bubb et al. 2006).

That's an important point. Long-distance physical epistasis may be rare among frequency-dependent variants, but the frequency-dependent mechanism itself is in many respects more stable than heterosis. And in fact, under a game theoretic scenario, different alleles under frequency-dependent selection may actually be more stable, the more different they are. This is because certain mixed strategies are more stable when the differences between them are more exaggerated. In some instances, the exaggeration includes highly visible phenotypic signals.
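The contrast in load can be made concrete with two toy models. The first is the standard one-locus overdominance model with genotype fitnesses 1 - s, 1, and 1 - t, which settles at an allele frequency of s/(s + t) and carries a segregational load of st/(s + t). The second is a deliberately simple haploid model of negative frequency dependence that I made up for illustration, in which each allele's fitness falls linearly as it becomes common. The selection coefficients, the strength parameter b, and the starting frequencies are all assumptions, not values from the paper.

```python
def overdominance_step(q, s, t):
    """One generation of selection with fitnesses AA = 1 - s, AS = 1, SS = 1 - t.
    `q` is the frequency of S. Returns (next q, mean fitness before selection)."""
    p = 1.0 - q
    mean_w = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    q_next = (p * q + q * q * (1 - t)) / mean_w
    return q_next, mean_w


def run_overdominance(q0, s, t, generations):
    """Iterate the overdominance model and return the final frequency of S."""
    q = q0
    for _ in range(generations):
        q, _ = overdominance_step(q, s, t)
    return q


def segregational_load(s, t):
    """Equilibrium load relative to the best genotype (the heterozygote)."""
    return s * t / (s + t)


def allele_fitnesses(p, b):
    """Toy haploid frequency dependence (hypothetical model): each allele's
    fitness drops linearly with its own frequency; `p` is the frequency of A."""
    return 1 + b * (0.5 - p), 1 + b * (p - 0.5)


def run_freq_dependent(p0, b, generations):
    """Iterate the toy frequency-dependent model and return the final p."""
    p = p0
    for _ in range(generations):
        w_a, w_b = allele_fitnesses(p, b)
        p = p * w_a / (p * w_a + (1 - p) * w_b)
    return p


# Assumed illustrative parameters, not estimates for any real locus.
S, T, B = 0.1, 0.8, 0.2

q_eq = run_overdominance(0.001, S, T, 2000)
print(f"overdominance: q = {q_eq:.4f}, load = {segregational_load(S, T):.4f}")

p_eq = run_freq_dependent(0.01, B, 2000)
w_a, w_b = allele_fitnesses(p_eq, B)
print(f"frequency dependence: p = {p_eq:.4f}, fitness gap = {abs(w_a - w_b):.2e}")
```

In the overdominance model the homozygotes keep paying a cost every generation even at equilibrium, while in the frequency-dependent toy model the fitness gap between alleles vanishes once the balance is reached, which is the sense in which it "does not require a steady-state fitness differential."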

The paper ends with a suggestion that balancing selection may not be that common after all, since they didn't really find much evidence for it:

We hypothesize that balancing selection most frequently arises in transient situations when the environment changes rapidly. Balancing-selection systems may largely be evolutionary "band-aids" that survive only until a more stable strategy arises, based on gene duplication and divergence, or until the rise of a more evolutionarily successful allele. This view is reminiscent of arguments supporting the less-is-more hypothesis (OLSON 1999); indeed, many suspected examples of recent balancing selection involve maintenance of nonfunctional or subfunctional alleles in the population (e.g., ccr5, ΔF508, HbS).

For null alleles, this may well be true. Breaking something irreparably is easy, but probably not optimal.

For some genes, frequency-dependent variants may have a substantial lifespan. I think this is a gene-by-gene question: sometimes there will be individual genes that create selective balances, but these are a lot more likely to be single polymorphisms than long haplotypes with multiple selected genes.

References:

Bubb KL, Bovee D, Buckley D, Haugen E, Kibukawa M, Paddock M, Palmieri A, Subramanian S, Zhou Y, Kaul R, Green P, Olson MV. 2006. Scan of human genome reveals no new loci under ancient balancing selection. Genetics 173:2165-2177. doi:10.1534/genetics.106.055715

Raymond CK, Kas A, Paddock M, Qiu R, Zhou Y, Subramanian S, Chang J, Palmieri A, Haugen E, Kaul R, Olson MV. 2005. Ancient haplotypes of the HLA Class II region. Genome Res 15:1250-1257. doi:10.1101/gr.3554305