genomics

Earlier this week, the Washington Post printed a nice David Brown story about endogenous retroviruses and evolution.

In sheep, researchers are discovering an especially interesting story.

Sheep today sometimes develop lung or nasal tumors caused by circulating retroviruses. Ancestors of those viruses began creeping into the genome even before sheep and goats diverged from each other more than 5 million years ago.

A team led by Massimo Palmarini of the University of Glasgow Veterinary School studied two of those endogenous retroviruses. They found that wild species (such as bighorn and Dall sheep) had versions of the two retroviruses that differed slightly from the versions carried by domesticated species. The retroviral genes in those animals contained a mutation that impeded infection by the cancer-causing viruses.

In a paper published in November, the researchers argued that when people began rounding up wild sheep 9,000 years ago, the newly confined herds probably suffered epidemics of the cancer-causing viruses. Only those animals whose endogenous viruses had by chance mutated into the protective form survived.

I meant to point out this news article when it came out earlier this month. It's a short description of a Scripps-Venter initiative to sequence 2000 healthy 80+-year-olds:

“We are looking at a cohort that we think is harbouring major secrets. They have disease susceptibility genes, but they don't get the diseases you would have expected. Something has protected them. We hope to find out what that is,” says study leader Eric Topol, who is director of genomic medicine at Scripps.

Topol and his team will compare gene sequences from their subjects with the same genes in tissues from a control group they've dubbed the 'illderly'. This second group covers people who died from common, age-related diseases such as cancer, heart attack and stroke before they made it to 80.

Topol and his colleagues Robert Strausberg and Samuel Levy at the Venter Institute finalized their list of 100 candidate genes last week. It includes genes with an unknown or putative role in healthy ageing, and some that are involved in key jobs such as DNA repair and the handling of insulin. The team plans to expand the list to 500 genes over several years and ultimately to sequence the whole genomes of their elderly recruits. So far, the affiliated Scripps Health System has provided the bulk of the costs of the study

I really don't understand why they wouldn't start out with a SNP-chip survey. Maybe they are, and it's just not reported here. The sequencing will be more effective at finding individual sites that are not yet part of the standard surveys, but a lot of interesting variants are likely linked to long-range haplotypes anyway.

Larry Moran comments on the Gene Wiki. (If you haven't read about it, check out this AP article, or the PLoS Biology paper.) Larry has written before about the errors in sequence databases and how hard it is to fix them; he's one of the people with the most practical experience trying to find ways to remove errors.

His posts are a good way to learn about the limits of these resources. I've seen several cases where incorrect data made it into a database and proliferated through the literature. These cases are extremely hard to root out once they get in. Errors are inevitable -- sometimes things just aren't the way they look. The wiki concept does provide a chance to fix things, or at least a place for annotations of existing errors, as long as credible people are doing the annotations.

I think that the baseline may have potential as a foundation for a wiki about recent selection on human genes.

Genetic Future comments on news from the Wellcome Trust Sanger Institute:

At the current rate (which is rapidly increasing) the Sanger is churning out more DNA sequence every two minutes than was generated by the entire research community from 1982-1987. This obscene rate of data generation has been enabled by the development of next-generation DNA sequencing platforms, which can each churn out one human genome equivalent in less than a week.

Ajit Varki profile

Reporter Bruce Lieberman profiles geneticist Ajit Varki in this week's Nature. It's a good summary of Varki's work on sialic acid evolution, focusing on one particular change involving N-glycolylneuraminic acid (Neu5Gc), work that I touched on here around 3 years ago.

On a molecular level, the difference between Neu5Gc and Neu5Ac is tiny -- a single added oxygen atom perched on one arm distinguishes one from the other (see graphic). But on a biological level, the difference could be enormous. "We thought if monkeys and all of our closest relatives have Neu5Gc and humans don't, then there must be a molecular basis for that," Varki says. He subsequently found it in an enzyme that converts Neu5Ac to Neu5Gc, but which is disabled by mutation in humans.

The article also covers the founding of the Center for Academic Research and Training in Anthropogeny, a research effort of the University of California, San Diego and the Salk Institute. Led by Varki, Margaret Schoeninger, and Pascal Gagneux, the center aims to become an important focus of interdisciplinary work in human origins. I was lucky enough to be invited to one of their research seminars two years ago, and I can say it's a wonderful environment for collaboration, if the project can continue and build on these small meetings:

Between 1998 and 2007, the Project for Explaining the Origin of Humans drew in anthropologists, primate biologists, geneticists, immunologists, neuroscientists, linguists and many others. They discussed topics ranging from the evolution of language to the differences between humans, Neanderthals and Homo erectus, the first hominid to leave Africa. Goodman says the interdisciplinary nature of the series made it extremely important to the field. "You really had the chance to explore an issue as it relates to the evolutionary origins of our species," he says.

...

Varki estimates that he has listened to more than 300 talks on various aspects of this discipline. "The idea is the linguist needs to talk to the molecular biologist who needs to talk to the neuroscientist who needs to talk to the psychologist and philosopher about these issues," he says. "Most areas of human knowledge are somewhere relevant."

I think that's exactly the right attitude -- we need more interdisciplinary efforts. I run up against the blind spots of various specialties all the time, and I'm just one person. On the other hand, it is very challenging to get people to invest the time to learn facts outside their narrow field. If this institute helps those efforts, it will be all to the good.

References:

Lieberman B. 2008. Human evolution: details of being human. Nature 454:21-23. doi:10.1038/454021a

Daniel MacArthur, of Genetic Future, reviews the amount of storage required to hold genomic information. Naturally, you'd probably think it was around 12 billion bits (2 bits per base pair), but sequencing technologies and the availability of references from other people make things a little more complicated.

This interesting quote about the raw image files generated by the Illumina platform presents some of the range of complications:

Almost as soon as these images are generated they are fed into an algorithm that processes them, creating a set of text files containing the sequence of each of the fragments. The image files are then almost always discarded. Why are they discarded? Because, as you will see in a minute, storing the raw image data from each run in even a moderate-scale sequencing facility quickly becomes prohibitively expensive - in fact, several people have suggested to me that it would be cheaper to just repeat the sequencing than to store these data long-term.

An accurate read requires lots of redundant bits, which adds up to lots and lots of data storage. If these are winnowed down to a real "best" sequence, then you're back to 12 billion bits (=1.5 gigabytes), more or less. Of course, most of that sequence is redundant and may be significantly compressed. And if you compare with a reference sequence, only a small amount of information is needed to distinguish your genome from the reference. Anyway, all this is explained at the link.
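The arithmetic behind those figures is easy to check. Here is a rough Python sketch; the genome length is a round number, and the count of 4 million differences from a reference is purely an illustrative assumption of mine, not a figure from MacArthur's post.

```python
import math

BITS_PER_BASE = 2                # four possible bases -> log2(4) = 2 bits
DIPLOID_BASES = 6_000_000_000    # round figure: ~3 billion bp per copy, two copies


def raw_storage_bytes(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Bytes needed to store n_bases with a fixed-width encoding."""
    return n_bases * bits_per_base / 8


def diff_storage_bytes(n_variants: int, genome_length: int) -> float:
    """Rough size of a variant list: one position index plus one 2-bit base per variant."""
    bits_per_position = math.ceil(math.log2(genome_length))
    return n_variants * (bits_per_position + BITS_PER_BASE) / 8


full = raw_storage_bytes(DIPLOID_BASES)
# ASSUMPTION: about 4 million single-base differences from the reference (illustrative only).
diff = diff_storage_bytes(4_000_000, DIPLOID_BASES)

print(f"full 'best' sequence: {full / 1e9:.2f} GB")
print(f"differences from a reference: {diff / 1e6:.1f} MB")
```

Under these assumptions the difference list is about a hundredth the size of the full sequence, before any compression at all.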

Substitution rates and ancestral population sizes

The rate of neutral mutations varies across the genome. When studying a single gene, this variation in rates is not especially important -- it is generally possible to obtain an estimate of the neutral rate for a single locus by comparing just that locus among closely related species.

But some comparisons involve looking at the pattern of variation among different loci. For instance, testing hypotheses about the ancestral populations leading to living species (like the common ancestor of humans and chimpanzees) involves comparing the amount of divergence among many independent loci. The variance in divergence times among loci gives an estimate of inbreeding in the ancestral population.

I discussed this particular example two years ago this week, after the paper that proposed extended hybridization between ancestral hominids and chimpanzees. The conclusion of the paper was that the X chromosome displays much less divergence between humans and chimpanzees than the autosomes, and this might reflect a late introgression of the X chromosome into hominids from another population that (mostly) was ancestral to chimpanzees. The autosomes, by contrast, averaged very old genetic divergences, although there was substantial variance. As I concluded then, the data look consistent with a large population size in the human-chimpanzee ancestor species, coupled with greater selection on the X chromosome. The interpretation of large population size (or alternatively, the interpretation of long-term population structure) comes from the low inferred inbreeding in that ancestral population -- which caused the variance in divergence dates among loci.

But there is another reason for a large variance in divergence dates: variance in mutation rates. Whenever mutation rates vary among loci, this variance adds to the variance among loci in their between-species genetic differences -- that is, in the substitution rate. So even when we exclude selected sites (as we always try to do for these kinds of comparisons), we will overestimate the genetic diversity in ancestral species whenever the mutation rate varies among loci.
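A toy simulation shows the size of the effect. This is my own sketch, not the method of any paper discussed here. It assumes each locus coalesces in the ancestral population after an exponentially distributed time with mean 2N generations, that loci are long enough to ignore sampling noise in counting differences, and that per-locus mutation rates follow a gamma distribution. All parameter values are hypothetical.

```python
import random
import statistics


def simulate_divergence(n_loci, n_anc, t_split, mu_mean, mu_cv, rng):
    """Per-locus expected divergence d = 2 * mu * (t_split + t_coal)."""
    divergences = []
    for _ in range(n_loci):
        # Coalescence time in the ancestral population: mean 2N generations.
        t_coal = rng.expovariate(1.0 / (2 * n_anc))
        if mu_cv > 0:
            # Gamma-distributed rate with mean mu_mean and coefficient of variation mu_cv.
            shape = 1.0 / mu_cv ** 2
            mu = rng.gammavariate(shape, mu_mean / shape)
        else:
            mu = mu_mean
        divergences.append(2 * mu * (t_split + t_coal))
    return divergences


def naive_ancestral_size(divergences, mu_mean):
    """Estimate N by treating ALL among-locus variance as coalescent variance.

    With a uniform rate, sd(d) = 2 * mu * 2N, so N = sd(d) / (4 * mu).
    """
    return statistics.pstdev(divergences) / (4 * mu_mean)


rng = random.Random(1)
N_ANC, T_SPLIT, MU = 50_000, 250_000, 2e-8   # hypothetical values

uniform = simulate_divergence(20_000, N_ANC, T_SPLIT, MU, 0.0, rng)
variable = simulate_divergence(20_000, N_ANC, T_SPLIT, MU, 0.2, rng)

est_uniform = naive_ancestral_size(uniform, MU)
est_variable = naive_ancestral_size(variable, MU)
print(f"true N: {N_ANC}")
print(f"estimate, uniform mutation rate:  {est_uniform:.0f}")
print(f"estimate, variable mutation rate: {est_variable:.0f}")
```

With these made-up numbers, a 20 percent coefficient of variation in mutation rate is enough to push the naive estimate well above the true ancestral size, because the rate variation multiplies the whole divergence time, not just the ancestral part.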

A new paper by Svitlana Tyekucheva and colleagues looks at human and macaque genomes to find patterns underlying the variance in mutation rates among regions of the genome. They find that a number of factors may cause such variations, including chemical factors like the CG content of the genome, functional causes such as male versus female rates of recombination, and large-scale structural causes such as telomeric proximity:

While a complete understanding of all biological mechanisms leading to variation in neutral substitution rates across the genome remains elusive, it is plausible that at least some of these mechanisms are conserved over relatively long evolutionary distances. For instance, both mouse-specific and rat-specific substitution rates are positively correlated with rodent-primate substitution rates [14], suggesting shared mechanisms persisting over ca. 90 million years [15]. Additionally, a positive correlation exists in substitution rates of homologous X- and Y-chromosomal introns that diverged from each other ca. 100 million years ago [16] (Tyekucheva et al. 2008: R76).

Their finding that male recombination is an important contributor to mutation rate heterogeneity puts the focus on the X chromosome -- which has little recombination in males -- as unusual. X versus autosomal position did not explain a large fraction of the variance in this study (only around 2 percent, controlling for other factors) but the deviation was in the right direction to help account for the low X chromosome divergence between humans and chimpanzees.

Altogether in this study, a large fraction of the human-macaque variability in substitution rates could be explained by phenomena that affect the rate of mutations, including the structural and functional factors listed above as well as the corresponding homologous variability between mice and rats, and between dogs and cattle. If these variations were explained by inbreeding in the human-macaque ancestral species, they would be random with respect to the dog-cow or mouse-rat divergences, and with respect to structural causes. So current estimates of the effective sizes of human-chimpanzee and other ancestral populations are almost certainly inflated. The amount of inflation is not clear, but a good estimate will require correcting for a large number of factors -- a complicated analysis.

Since the date of the human-chimpanzee divergence depends on our assessment of the diversity within the human-chimpanzee ancestral population, it may be a while before we can settle the issue of human-chimpanzee divergence time. That may or may not provide hope for Sahelanthropus, Orrorin, and Ardipithecus kadabba -- all supposed hominids that would predate 5 million years ago, the current best genetic estimate of the human-chimpanzee divergence time. To be sure, if the date is simply in error, that error might encompass older dates consistent with a 7-million-year divergence. But I'm not sure we should believe that the error is biased toward an older divergence -- "error" might lean in either direction, and a younger species divergence remains possible.

References:

Tyekucheva S, Makova KD, Karro JE, Hardison RC, Miller W, Chiaromonte F. 2008. Human-macaque comparisons illuminate variation in neutral substitution rates. Genome Biol 9:R76. doi:10.1186/gb-2008-9-4-r76

Evolution of the monkeyflowers

Spring has finally come to us here in the North, and it's time to start thinking about planting. So, when I went to a seminar yesterday by John Willis, it was with dual motives.

Naturally, I was interested in hearing about his work relating the evolutionary ecology of Mimulus species to their genomics. As Willis and his many former and current lab members made clear in a recent review article in Heredity, monkeyflowers have become a really interesting model system for studying the dynamics of natural selection on genomes -- particularly, with relation to local ecological adaptation, and also with relation to speciation.

But I was also thinking about whether I could find a nice flower variety for my garden. I'm not particularly excited about peas, and I tolerate Arabidopsis when it comes up, but let's face it, it's not exactly a show flower. I'd love to get one of the prettier hawkweeds going (these have eponymous appeal as well as botanical interest) but the common ones are pretty boring.

Well, Willis's lab has been a center of development for Mimulus genetics. They have developed a store of SNPs and other markers (available at the Mimulus evolution website) for QTL mapping, and are using them to find genes responsible for ecological adaptations in different wild Mimulus populations. In the talk, Willis featured some of his collaborators' work finding genes involved in wet versus dry habitat adaptations and in early versus late flowering. These traits are connected to each other, as well as to other life history traits, plant size, and flower size.

I left having my prior belief abundantly confirmed: botany is awesome. I mean, think about it. You can go outside, in your own neighborhood, and study biology. You can uproot your subjects and transplant them somewhere else, to watch how well they do. If they die, well, that's a data point, not an ethical emergency! Worried about gene-environment interactions? No problem, just put samples of all your subjects in the same greenhouse and wait. Need to isolate a QTL against a uniform genetic background? Cool, just repeatedly backcross it into an inbred line for a few generations, selecting for the trait each time. Want to study genetic correlations? Well, you can breed a thousand plants and select for any trait you want!

Oh, and if you want to, you can clone them.

Let's look at an example, from the Heredity review:

Recent work on floral evolution demonstrates that fundamental evolutionary questions can be addressed in Mimulus through the combination of field experiments and modern genomic approaches. Bradshaw et al. (1995, 1998) pioneered the application of genome mapping to study of ecologically important traits in Mimulus using RAPD and allozyme markers to map floral QTLs underlying the divergence between red-flowered, hummingbird-pollinated M. cardinalis and pink-flowered, bee-pollinated M. lewisii. The initial mapping experiments, with hybrid phenotypes measured in controlled greenhouse environments, revealed QTLs with major effects on virtually every floral character studied, from coloration and morphology to nectar production. To determine the effect of these QTLs on pollinator visitation and discrimination, Schemske and Bradshaw (1999) moved the genotyped hybrids to a field site near one of the few regions where the species coexist, and observed bee and hummingbird visitation behavior. Amazingly, the M. cardinalis allele at a single QTL, YELLOW UPPER (YUP), was responsible for an 80% loss of visitation by bee pollinators, and the M. cardinalis allele at a QTL responsible for variation in nectar production doubled hummingbird visitation (Schemske and Bradshaw, 1999). Bradshaw and Schemske (2003) subsequently created near-isogenic lines (NILs), where heterospecific alleles at YUP were reciprocally introgressed into the parental genetic backgrounds, and evaluated the response of pollinators to the NILs in the field. They observed an even clearer pattern of pollinator discrimination due to this locus, with a 74-fold increase in bee visitation in M. cardinalis NILs that carried the M. lewisii YUP allele, and a 68-fold increase in hummingbird visitation in M. lewisii NILs with the M. cardinalis YUP allele. 
Although the ecological context, in this case the community of potential pollinators, is certainly important to the evolution of new pollinator associations, these results also demonstrate that single genomic regions can have a large effect on major evolutionary transitions (Wu et al. 2008: 224-225).

The talk was mostly focused on the Mimulus guttatus complex, where some of the most pressing issues are life history, drought tolerance, and tolerance of high mineral concentrations, such as salt or copper. They were able to trace many QTLs of small effect underlying the major differences in life history and moisture requirements among ecogeographic races of M. guttatus, to show that the within-population variation for these traits is caused by high-frequency (likely balanced) alleles rather than mutation-selection balance or rare alleles, and to find the correlated responses to selection of different plant traits based on different QTLs.

With respect to the genetics of speciation and ecogeographic race formation, they are helped by a long history of research on Mimulus. For example:

Macnair and Christie (1983) performed the first direct genetic analysis of hybrid incompatibilities in Mimulus. While studying the genetic basis of copper tolerance in California populations of M. guttatus, they noticed that some crosses between plants from the copper mines and certain other populations resulted in F1s that died as young seedlings. Further crossing studies revealed that the F1 lethality was caused by a deleterious epistatic interaction between the copper tolerance allele from the mine populations (or a gene tightly linked to it) and alleles at an unknown number of different loci from the other populations. Such deleterious interlocus interactions, usually referred to as Dobzhansky–Muller (D-M) incompatibilities, are thought to be the major cause of low hybrid fitness in plants and animals (reviewed in Coyne and Orr, 2004). Remarkably, it appeared that natural selection for copper tolerance had indirectly resulted in the evolutionary origin of the hybrid incompatibility (Wu et al. 2008:226).

So yes, say what you want, botany is awesome. Plus, there's one more thing: I sat through an entire lecture about natural selection and ecological differentiation of species and races, and never once heard the word, "bottleneck." It was like traveling to some kind of bizarro world where biologists still read Darwin!

So we come down to the really difficult question: which variety am I going to plant? Mimulus glabratus is native here in Wisconsin, including Dane County, but it is not very showy, and prefers wet habitat. That makes it a poor fit for my native plant patch, which is dry/mesic, and which I never water unless the black-eyed Susans and bee balms start to wilt. Mimulus ringens is prettier, with bigger, lavender flowers, but also likes it wet.

I guess I'll have to keep looking. M. lewisii is a pretty variant, if I can find a good source for it, and I can keep it in one of the wetter corners of the yard. I would try for M. cardinalis, since we have hummingbirds sometimes, but I'd like to get Lobelia cardinalis going also, and it's a lot easier to find. Besides, it hardly looks like a monkey!

References:

Wu CA, Lowry DB, Cooley AM, Wright KM, Lee YW, Willis JH. 2008. Mimulus is an emerging model system for the integration of ecological and genomic studies. Heredity 100:220-230. doi:10.1038/sj.hdy.6801018

Probing for the alien within

Laura MacConaill and Matthew Meyerson present a cool short review in Nature Genetics of metagenomics applications in pathogen discovery.

The basic principle is to extract DNA from a tumor or sore, do intensive sequencing of all the DNA in it, and use computers to subtract out everything human. What's left after you subtract out the human DNA should include any pathogen that might be in the sample:

The two recent studies combined computational subtraction with microreactor-based pyrosequencing to identify viral signatures associated with human disease. Feng et al. used high-throughput pyrosequencing [15] and comparison to the human transcriptome to identify a viral sequence in a library of cDNAs generated from individuals with Merkel cell carcinoma, a rare but aggressive human skin cancer. The authors sequenced over 395,000 reads of 150-200 bp in length. After digital transcriptome subtraction, 2,395 sequences remained. Among these, conceptual translation of one sequence showed similarity to a polyomavirus. By cloning the complete viral genome and carrying out further analyses, the authors found that the Merkel cell polyomavirus sequence was present in eight of ten Merkel cell carcinomas.
A second group used the same high-throughput DNA sequencing technology to identify a previously undiscovered arenavirus that likely caused the deaths of three transplant recipients who all received organs from a single donor.
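The subtraction step can be sketched in a few lines of Python. This is a toy version of the idea only: the studies quoted above compared reads against the human transcriptome with real alignment tools, whereas here I simply discard any read that shares most of its k-mers with a host reference. The function names, the threshold, and the sequences are all made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads, host_reference, k=8, max_shared_fraction=0.5):
    """Keep only reads that share few k-mers with the host reference."""
    host_kmers = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, so it cannot be classified
        shared = len(read_kmers & host_kmers) / len(read_kmers)
        if shared < max_shared_fraction:
            kept.append(read)
    return kept


# Hypothetical host sequence and reads, invented for this example.
host = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACCGGTTAGCATGCA"
reads = [
    host[3:23],                   # host-derived read
    host[20:40],                  # host-derived read
    "TTTTGGGGCCCCAAAATTTTGGGG",   # read with no counterpart in the host
]

leftover = subtract_host(reads, host)
print(leftover)
```

Whatever survives the filter is only a candidate. As in the Merkel cell study, the leftover reads still have to be compared against known viral sequences and confirmed in more samples.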

I don't know if sequencing will ever get so cheap that this will become a practical diagnostic method, but it really doesn't need to. As soon as you suspect a pathogen, you can probe directly for that pathogen's DNA in a sample -- and there's no barrier to testing for hundreds of pathogens at once. Heck, there ought to be a SNP chip for it.

But this is a potentially important way of identifying new pathogens in unknown samples from scratch. The article mentions that the current cost of this kind of sequencing is around $10,000 per sample, and that is rapidly falling. For that cost, you get the sequence on your computer, even if you can't identify it yet, and who knows -- it might pop up two years later when somebody else finds it in some unexpected place.

References:

MacConaill L, Meyerson M. 2008. Adding pathogens by genomic subtraction. Nat Genet 40:380-382. doi:10.1038/ng0408-380

Heritability review

Peter Visscher and colleagues present a long review paper on the concept and use of heritability in the current Nature Reviews Genetics.

Heritability allows a comparison of the relative importance of genes and environment to the variation of traits within and across populations. The concept of heritability and its definition as an estimable, dimensionless population parameter was introduced by Sewall Wright and Ronald Fisher nearly a century ago. Despite continuous misunderstandings and controversies over its use and application, heritability remains key to the response to selection in evolutionary biology and agriculture, and to the prediction of disease risk in medicine. Recent reports of substantial heritability for gene expression and new estimation methods using marker data highlight the relevance of heritability in the genomics era.

There's nothing particularly new here -- the "genomics" in the title doesn't amount to much beyond a discussion of how to estimate heritability from SNP-inferred relationships instead of pedigrees. But much that is old is worthwhile.

It reads like twelve pages out of Falconer -- if Falconer were in a new edition -- and if you don't have Falconer, well, you might do well to read these twelve pages. They include a box about the "heritability of IQ controversy" as well as a discussion of the basic mystery about heritability in natural populations -- why should additive genetic variance be as high as it is?
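For anyone without Falconer on the shelf, here is a minimal sketch of one textbook estimator: the regression of offspring phenotype on the midparent value, whose slope estimates narrow-sense heritability (additive variance over phenotypic variance). It is my own simulation under a purely additive model with random mating and no shared environment. It is not code from the review, and the variance values are arbitrary.

```python
import random


def simulate_families(n, v_a, v_e, rng):
    """Simulate midparent and offspring phenotypes under an additive model."""
    sd_a, sd_e = v_a ** 0.5, v_e ** 0.5
    sd_segregation = (v_a / 2) ** 0.5   # Mendelian sampling variance is V_A / 2
    midparents, offspring = [], []
    for _ in range(n):
        a_mother, a_father = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_mother = a_mother + rng.gauss(0, sd_e)
        p_father = a_father + rng.gauss(0, sd_e)
        a_child = (a_mother + a_father) / 2 + rng.gauss(0, sd_segregation)
        midparents.append((p_mother + p_father) / 2)
        offspring.append(a_child + rng.gauss(0, sd_e))
    return midparents, offspring


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
V_A, V_E = 0.4, 0.6                      # arbitrary values; true h2 = 0.4
midparents, offspring = simulate_families(20_000, V_A, V_E, rng)
h2_hat = regression_slope(midparents, offspring)
print(f"true h2 = {V_A / (V_A + V_E):.2f}, estimated h2 = {h2_hat:.3f}")
```

The slope recovers heritability because the covariance between offspring and midparent is half the additive variance, while the variance of the midparent value is half the phenotypic variance.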

References:

Visscher PM, Hill WG, Wray NR. 2008. Heritability in the genomics era -- concepts and misconceptions. Nature Rev Genet 9:255-266. doi:10.1038/nrg2322

Why have variants influencing recombination rate been selected in non-Africans?

A complicated story is tangled through this paper by Augustine Kong and colleagues, and I don't see where it may end. But here's the abstract:

The genome-wide recombination rate varies between individuals, but the mechanism controlling this variation in humans has remained elusive. A genome-wide search identified sequence variants in the 4p16.3 region correlated with recombination rate in both males and females. These variants are located in the RNF212 gene, a putative ortholog of the ZHP-3 gene that is essential for recombinations and chiasma formation in Caenorhabditis elegans. It is noteworthy that the haplotype formed by two single-nucleotide polymorphisms (SNPs) associated with the highest recombination rate in males is associated with a low recombination rate in females. Consequently, if the frequency of the haplotype changes, the average recombination rate will increase for one sex and decrease for the other, but the sex-averaged recombination rate of the population can stay relatively constant.

Perhaps it's not so curious that alleles of this gene have opposite effects on recombination in males and females. The mechanisms of gamete production are obviously different in the two sexes, and we might expect some kind of frequency-dependent mechanism to regulate recombination. At least, it's a hypothesis.

What I find mysterious is this:

A phylogenetic analysis of a 55-kb region containing rs3796619 and rs1670533 in the HapMap data (24) revealed three well-differentiated clusters of haplotypes showing notable differences in frequency between the Yoruban Nigerians (YRI) and CEU and East Asians (CHB and JPT) (fig. S6). The [C,T] and [T,C] haplotypes that associate most strongly with recombination rate have a combined frequency of only 17% in the YRI sample, but reach a frequency of 91% and 98% in the CEU and East Asian samples, respectively. Several SNPs in this region show an unusual degree of divergence among the HapMap groups, on the basis of the rank percentile of their FST values (Wright's coefficient, a measure of variance in allele frequencies among populations) among all autosomal SNPs with the same overall frequency in the HapMap. Specifically, we identified eight SNPs whose FST values are in the top 0.5% for differences between the YRI and East Asian HapMap samples and also in the top 5% of differences between the YRI and CEU samples. Each of these SNPs differentiated a subset of [T,T] haplotypes from the rest, perhaps indicating an episode of positive selection (or a severe founder effect) that increased the frequency of [C,T] and [T,C] haplotypes in the ancestors of European and East Asian populations.

The [C,T] and [T,C] haplotypes are the ones associated with increased recombination rate in males and females, respectively. The markers are in strong disequilibrium (no [C,C] haplotypes were observed), and seem to have been selected outside of Africa.

I have no idea why.

The recombination rates were all inferred from a large Icelandic sample, so maybe the rates don't really characterize the haplotypes in other populations. Maybe recombination rate is incidental to the real reason for the selection. Or maybe in populations roaring with positive selection on many genes at once, it is a good thing to break them apart more often.
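To get a feel for how extreme those frequency differences are, here is a back-of-the-envelope FST calculation in Python. It treats the combined [C,T] and [T,C] haplotypes as a single "allele" and uses the simple heterozygosity-based definition with two equally weighted populations. That is my simplification; the paper ranks per-SNP FST values against the rest of the HapMap, which is a different calculation.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity for a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst_two_populations(p1, p2):
    """F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_mean = (p1 + p2) / 2
    h_total = expected_heterozygosity(p_mean)
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2
    return (h_total - h_within) / h_total


# Combined [C,T] + [T,C] haplotype frequencies quoted above.
fst_yri_asian = fst_two_populations(0.17, 0.98)
fst_yri_ceu = fst_two_populations(0.17, 0.91)
print(f"YRI vs East Asian: {fst_yri_asian:.3f}")
print(f"YRI vs CEU:        {fst_yri_ceu:.3f}")
```

Both values come out far above what is typical for human populations, which is the sense in which this region stands out.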

References:

Kong A and 16 others. 2008. Sequence variants in the RNF212 gene associate with genome-wide recombination rate. Science 319:1398-1401. doi:10.1126/science.1152422

The future of genetics is corny

Elizabeth Pennisi's story about maize genomics is a good reminder of why biology will continue to grow in importance to our understanding of human history:

With $9.1 million from the Mexican government, Jean-Philippe Vielle-Calzada of the National Laboratory of Genomics for Biodiversity in Irapuato and his colleagues have decoded a native "popcorn" strain grown at elevations above 2000 meters. Although still in more than 100,000 pieces, the sequence has revealed many new genes, he reported. This variety's genome "will be of tremendous value in terms of understanding the evolution of [maize] domestication," he says.

Oh, and if you're interested in biology, consider the potential experiments from this:

Another resource introduced at the meeting will help ... sort out how genes interact. The agribusiness giant Syngenta announced it was making available 7500 lines of corn, each representing a B73 genome with a single piece of DNA bred into it from one of the 25 strains of the Maize Diversity Project. Taken together, the lines incorporate all the genetic diversity of those strains but make it easier to understand the activity of particular genes. The community has long awaited these tools, says Brutnell: "They are really going to revolutionize the way we do genetics."

I'd say. Imagine 7500 twins, all identical except for a unique piece of DNA spliced in from some other person. Except with corn, it's not 7500 twins, it's 7500 experimental plots full of twins. Now, see what they all do!

References:

Pennisi E. 2008. Corn genomics pop wide open. Science 319:1333. doi:10.1126/science.319.5868.1333

Bees R Us

The PNAS Early Edition this week includes a paper by bee genome researchers Amro Zayed and Charles Whitfield. After a short review of honeybee phylogeny, they demonstrate two things:

1. An ancient dispersal of honeybees from Africa into Europe was accompanied by a pulse of positive selection on coding genes, amounting to selection on approximately 10 percent of bee genes.

2. As Africanized bees have spread across South and into North America, adaptive genes from the existing populations of European bees have introgressed into the Africanized population, increasing under positive selection.

These are remarkable parallels to the worldwide evolution of humans. In bees, the geographic pattern is not the same, and the timescale is different, but the overall genetic impact is quite similar.

Here's the bee history:

In its native range, A. mellifera is classified into approximately two dozen subspecies, which are further organized into four major geographically and genetically distinct groups: African, Western and Central Asian (hereafter referred to as Asian), Eastern European, and Western and Northern European (hereafter referred to as West European) (9-11). European honey bees were introduced by humans to the New World by European settlers as early as the 1600s. In Brazil in 1956, an intentional introduction of African honey bees (A. mellifera scutellata), which hybridized with previously introduced European bees, led to the establishment and spread of the highly invasive and economically devastating Africanized honey bees in North America and South America (12). Subsequent studies have shown that Africanized bees are predominantly African in ancestry with minor but consistent contribution from European genotypes (11, 12). Using recently developed SNP panels, Whitfield et al. (11) demonstrated that the honey bee originated in Africa and subsequently expanded into Eurasia in two or more independent ancient expansions. One expansion gave rise to Western European honey bees, and at least one other independent expansion gave rise to Asian and Eastern European honey bees. Honey bee subspecies vary in a host of phenotypic traits, such as morphology, behavior, physiology, and gene expression (9-11, 13, 14) (Zayed and Whitfield 2008:3421).

I was not aware of the initial dispersals of bees into Europe and Asia. The genetic data show that the Western European strains are the ones with the most adaptive evolution since their dispersal from Africa. The separate ancient bee dispersals were documented by Whitfield et al. (2006), but they were not able to provide date estimates for the ancient dispersals, and none are attempted in this study.

This is the kind of test that ought to fail in most wild populations. Without a shift in the adaptive landscape, the fraction of new mutations with potential adaptive value is bound to be small -- because species are optimized to the environments that they have occupied for a long time. But European bees have faced a number of recent environmental changes, ranging from the simple effect of moving from a tropical to a temperate environment, to the need to use new and different flora, to the effects of domestication. In a very numerous, rapidly dispersing species, these effects led to a rapid adaptive response in a large proportion of genes. These are the basic principles underlying the recent acceleration of positive selection in our lineage also.

The introgression of European genes into the dispersing Africanized bees in the Americas is interesting, because it seems counter-intuitive. The main differences between Africanized bees and European bees involve adaptations to climate. European bees put up lots of honey for the winter, and swarm less frequently, in addition to being more sedate. African bees don't bother with as much honey, which together with their more frequent swarming would seem to be a good fit for the tropical pattern of seasonality. These African traits explain why the African bees have spread at the expense of the European bees across the tropical New World. But Africanized bees have picked up a lot of genes from the European bees in the New World.

The authors propose some possible explanations:

The adaptive value of functional (coding) portions of Western European genomes could be related to positive selection on novel variation in West European bees, to positive selection on novel hybrid gene combinations, and/or to selection for heterozygous genotypes. Our study thus provides direct evidence that invasive populations can exploit hybridization in an adaptive fashion -- a finding of immense relevance to understanding the dynamics of biological invasions (Zayed and Whitfield 2008:3424).

In other words, behavioral correlates of climate may be a target of selection and introgression -- I would speculate because of the intrinsic rarity of adaptive mutations in these functions.

This is a relatively coarse-grained analysis of positive selection, since the study basically averages within SNP categories, determining FST between pairs of populations. For non-coding SNPs, the Africanized bees are very similar to African bees (FST = 0.05), while for coding SNPs they are twice as divergent (FST = 0.10). That's a lot of difference in allele frequencies over a short time; it must have been caused by strong positive selection across a broad sample of loci. They do not attempt the same kind of "10% of genes" estimate for the introgression, but their figures show that it is quite significant across their data.
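
For readers who want to see what an FST comparison between SNP categories involves, here is a minimal sketch. It uses Wright's definition, FST = (HT - HS) / HT, for a single biallelic SNP in two equally weighted populations, and then averages within a category. The function names and the allele frequencies are invented for illustration; they are not data from Zayed and Whitfield, and the paper's own estimator may well differ.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # the SNP is fixed for one allele in both populations
    return (h_total - h_within) / h_total


def mean_fst(freq_pairs):
    """Average per-SNP FST over one category of SNPs."""
    values = [fst_biallelic(p1, p2) for p1, p2 in freq_pairs]
    return sum(values) / len(values)


# Hypothetical (African, Africanized) allele frequencies, for illustration only.
noncoding = [(0.50, 0.60), (0.30, 0.40), (0.70, 0.65)]
coding = [(0.50, 0.70), (0.30, 0.50), (0.70, 0.55)]

print("non-coding mean FST:", round(mean_fst(noncoding), 3))
print("coding mean FST:", round(mean_fst(coding), 3))
```

The point of the sketch is only that a larger shift in allele frequencies within one category of SNPs shows up directly as a higher average FST for that category.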

It may be a while before this initial study can be followed up with recombination-based selection tests, because of this little-known fact: bees have a recombination rate of 19 cM/Mb -- roughly 15 times higher than humans. Still, Whitfield et al. (2006) found an excess of linkage disequilibrium in the West European subspecies of bees. It now seems likely that some of this LD is explained by the widespread selection documented in the current study.

In other words, the genetic structure of global bee populations provides another strong example of the importance of rapid evolution in abundant species, coupled with ecological changes. Bees also now provide a strong example of adaptive introgression -- in this case, within a very tightly timed dispersal with known climatic conditions.

References:

Zayed A, Whitfield CW. 2008. A genome-wide signature of positive selection in ancient and recent invasive expansions of the honey bee Apis mellifera. Proc Nat Acad Sci USA 105:3421-3426. doi:10.1073/pnas.0800107105

Whitfield CW and 9 others. 2006. Thrice out of Africa: Ancient and recent expansions of the honey bee, Apis mellifera. Science 314:642-645. doi:10.1126/science.1132772

The history of junk DNA explored

T. Ryan Gregory (Genomicron) has been writing a long series of posts looking into the history of junk DNA. He's focusing on what research articles were saying about repetitive and noncoding elements like Alu, LINES, SINES, minisatellites and the rest -- both at the time they were discovered and since then.

The series arises from Gregory's irritation about the oft-heard claim that biologists are "discarding the long-held hypothesis that non-coding DNA has no function." For an example, here is the conclusion of a post about functional analysis of non-coding DNA in the 1980s:

In other words, there was no real period in which noncoding DNA was dismissed by the scientific community, though there was a much-needed shift away from strictly adaptive interpretations in the 1980s. Some individual researchers ignored noncoding regions, but there is no gap in the literature other than limits on what could be done in a methodological capacity. The "new" view of noncoding DNA as potentially important has been proclaimed regularly for at least as long as the claimed period of neglect between 1980 and 1994.
One wonders just how long we will be told that we have long been neglecting noncoding DNA.

The contrary-to-evolutionists'-claims-junk-DNA-has-function idea is also a staple of intelligent design creationists. As Gregory points out in one of his comments, biologists seem to be "getting their information from textbooks rather from the primary literature." As long as they remain ignorant of the history, they will be susceptible to junk claims.

Too many scientists fail to realize that good literature review is just as important as good research design.

The series is called "Quotes of Interest." I really like the idea -- many posts, grouped together, presenting a shotgun view of the literature on a single question. I have a couple of topics that would benefit from this kind of treatment -- and it's a very bloggy way to write!

Non-identical identical twins

Identical twins may be genetically different due to somatic variations, and a new study by Bruder and colleagues finds that large deletions contribute to some of that difference:

The exploration of copy-number variation (CNV), notably of somatic cells, is an understudied aspect of genome biology. Any differences in the genetic makeup between twins derived from the same zygote represent an irrefutable example of somatic mosaicism. We studied 19 pairs of monozygotic twins with either concordant or discordant phenotype by using two platforms for genome-wide CNV analyses and showed that CNVs exist within pairs in both groups. These findings have an impact on our views of genotypic and phenotypic diversity in monozygotic twins and suggest that CNV analysis in phenotypically discordant monozygotic twins may provide a powerful tool for identifying disease-predisposition loci. Our results also imply that caution should be exercised when interpreting disease causality of de novo CNVs found in patients based on analysis of a single tissue in routine disease-related DNA diagnostics (Bruder et al. 2008:1).

If this is a large source of phenotypic discordance between twins -- that is, one twin gets a disease and the other doesn't because of a non-shared somatic CNV -- then our estimates of the heritability of phenotypes based on MZ-DZ twin comparisons will all be too low. This research group is involved in finding genetic risk factors for Parkinson's disease, and they think somatic CNVs are a promising avenue to explain phenotypic discordance where one twin has Parkinson's and the other does not.
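
Here is a rough sketch of why the twin estimate would come out too low. It assumes the classical Falconer approximation, h2 = 2(rMZ - rDZ), and a simple variance-components model in which somatic genetic variance is shared by neither kind of twin. The component values and function names are hypothetical, chosen only to show the direction of the bias.

```python
def twin_correlations(a, c, e, s=0.0):
    """Expected MZ and DZ twin correlations from variance components.

    a: additive germline genetic variance
    c: shared environmental variance
    e: non-shared environmental variance
    s: somatic genetic variance (assumed not shared, even by MZ twins)
    """
    total = a + c + e + s
    r_mz = (a + c) / total        # MZ twins share all germline variance
    r_dz = (0.5 * a + c) / total  # DZ twins share half of it on average
    return r_mz, r_dz


def falconer_h2(r_mz, r_dz):
    """Falconer's estimate of heritability from twin correlations."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical variance components, for illustration only.
a, c, e, s = 0.5, 0.1, 0.3, 0.1
r_mz, r_dz = twin_correlations(a, c, e, s)
estimated = falconer_h2(r_mz, r_dz)
genetic_share = (a + s) / (a + c + e + s)

print("twin-based estimate:", round(estimated, 3))
print("germline plus somatic genetic share:", round(genetic_share, 3))
```

Under these assumptions the twin design counts the somatic variance as non-shared environment, so the estimate recovers only the germline share and misses whatever the somatic CNVs contribute.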

But their study cannot say (because of a lack of power) that phenotypically discordant MZ twins have CNVs that explain the discordance. It's possible that most of the CNVs they observe have no phenotypic effect.

MZ twins represent an excellent focus for such studies [of somatic CNVs] because any genotypic difference between twins derived from the same zygote highlights an irrefutable case of somatic variation. It is likely that the confirmed CNVs shown here represent only the "tip of an iceberg" of all CNVs that are actually present in the studied twins. The notion of somatic variation being more far more common than previously assumed agrees well with our other, recent results showing CNVs between normal, fully-differentiated tissues within an individual human subject (Bruder et al. 2008:4).

This does raise an important question. CNVs are a newly-understood component of human genetic variation, for example in the current paper by Jakobsson and colleagues (2008). But if people often exhibit CNV mosaicism, then some of the rare variants in global samples may be somatic mutations that do not occur in the gene pool of their respective populations. And if there are "hotspots" of CNV mutations, then multiple people might show somatic mutations for the same variant. It's probably a rare event, but given how little we know about the evolution of CNVs, it might be nice to know how rare.

References:

Bruder CEG and 21 others. 2008. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am J Hum Genet 82:1-9. doi:10.1016/j.ajhg.2007.12.011

Jakobsson M and 23 others. 2008. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451:998-1003. doi:10.1038/nature06742

Serial founder effects, again

A flush of papers this week (two today in Nature, one tomorrow in Science) describe new analyses of SNPs across the genome. Two of the papers sample SNPs in global samples numbering more than 500 individuals.

This Reuters story by Maggie Fox is typical of the press coverage:

Gene studies confirm 'out of Africa' theories
WASHINGTON - Two big genetic studies confirm theories that modern humans evolved in Africa and then migrated through Europe and Asia to reach the Pacific and Americas.
...
The studies, published in the journal Nature on Wednesday, paint a picture of a population of humans migrating off the African continent, and then shrinking at some point because of unknown adversity.
Later populations grew and spread from this smaller genetic pool of founder ancestors -- a phenomenon known as a bottleneck.

These studies have very, very exciting potential. Here in my lab, we will be immediately using the data from these papers to test hypotheses about recent human evolution.

But it is beyond me to understand why anyone thinks that the "serial founder effect" story is news!

For one thing, the idea is based on 12-year-old research demonstrating that human diversity declines for some genetic loci with distance from Africa. This observation was replicated for genome-wide STR loci in a well-publicized paper three years ago. This paper clearly demonstrated how a model involving a chain of bottlenecks could result in a cline of diversity -- one population leaving Africa, a small group from this population moving to Jordan, another small group moving from Jordan to Mesopotamia, another small group from Mesopotamia to the Zagros, etc.
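
The arithmetic behind a chain of bottlenecks is simple enough to sketch. The version below assumes that each founding event acts like one generation of drift in a small group of diploid founders, so expected heterozygosity shrinks by a factor of 1 - 1/(2N) at each step. The function name, the starting heterozygosity, and the founder number are made up for illustration; this is not the simulation model of the published papers.

```python
def serial_founder_heterozygosity(h0, founders, n_events):
    """Expected heterozygosity after each of n_events founder events.

    Each event is treated as one generation of drift among `founders`
    diploid individuals, which multiplies H by (1 - 1 / (2 * founders)).
    Returns a list starting with h0, one entry per step along the chain.
    """
    values = [h0]
    for _ in range(n_events):
        values.append(values[-1] * (1.0 - 1.0 / (2.0 * founders)))
    return values


# Hypothetical numbers: source heterozygosity 0.8, 50 founders per step.
for step, h in enumerate(serial_founder_heterozygosity(0.8, 50, 5)):
    print(step, round(h, 4))
```

Every step away from the source loses a little more diversity, which is all it takes to produce a cline. Note that the sketch says nothing about when the steps happened or who was already living at the destination.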

In other words, there's nothing new here. It's no surprise that genome-wide SNPs and copy-number variants (CNVs) should replicate the pattern already shown for genome-wide STRs.

What's worse, all these papers from the Stanford school of genetic orthodoxy fail to even test the hypothesis! I pointed out this problem three years ago:

The data that the paper attempts to explain are (1) the correlation of genetic distance and geographic distance among human populations, and (2) the decrease in genetic diversity in populations farther from Africa. We may ask, what other hypotheses would explain the same data? And what kind of evidence could test these hypotheses, instead of just asserting that they "match" the pattern of evidence.
One scenario that matches the evidence is multiregional evolution with a recent African dispersal of some adaptive genes. This is the hypothesis presented by Eswaran (2002). The idea is that human populations interacted for a long time in Africa and Eurasia, and that during the Late Pleistocene, adaptive changes within Africa allowed those populations to spread alleles into existing populations in Eurasia. The strength of the "founder effect" in this scenario depends on the genetic structure and selective advantage of the new African adaptive complex. Ramachandran et al (2005) actually cite Eswaran (2002) as an example of a serial founder effect. So the idea that there was widespread genetic movement out of Africa does not necessarily imply an out-of-Africa population replacement. The data do not require a replacement, and some -- even many -- of the genetic variants outside of Africa may have nothing to do with recent genetic movement out of Africa.
A second hypothesis is presented by Templeton (2002), who proposed that several founder effects happened at different times in the Pleistocene, each carrying one or more genetic variants out of Africa. The pattern of genetic variation appears to indicate that some genes left Africa during the Lower or Middle Pleistocene, while others dispersed later, during the Late Pleistocene. For Templeton (2002), this pattern indicates multiple dispersals, none of which was sufficient to wipe out the genetic contribution of earlier dispersals. This scenario also would lead to a pattern of correlation of genetic and geographic distance (because most genes have been affected by isolation-by-distance for a long time), while the recurrent dispersals would explain the decline in genetic variation outside of Africa.
A third hypothesis is that population size was simply greater within Africa than within Eurasia. The smaller population size (along with isolation-by-distance) would explain the difference in genetic variation; the correlation of genetic and geographic distance would be explained by isolation-by-distance. We may consider a fourth hypothesis also: that natural selection has tended to create slightly more genetic uniformity within Eurasia and slightly more genetic diversification in Africa. Such a scenario might be justified on ecological grounds: African populations cover a wider range of ecologies and have historically had a greater exposure to zoonotic disease, for example.
Except for the serial founder effect with population replacement, none of the other hypotheses are mutually exclusive. In other words, some genes might have been influenced by natural selection, most might have been somewhat influenced by differences in population size, but the largest effect might have been recurrent population dispersals.

Reading over the whole post, I think it did a good job of laying out the situation with serial founder effects in 2005, and there is little reason to change it now. Still nobody has tested the model! Again, this is a case of science by consistency -- the results of simulations generate the same kind of correlations as the observed data, so the authors claim support for their hypothesis.

But the necessary test should be carried out by dating haplotypes, finding the ages of "founder mutations" and eliminating the possibility of introgression from ancestral Eurasian populations. One of the key points in my earlier post is that the model proposed by Eswaran (2002) would generate exactly the distribution expected for serial founder effects -- despite the fact that it describes a wave of genetic change within an already-established pan-Old-World population.

This study doesn't support an out-of-Africa migration; it merely assumes it. Now, I'm one who thinks that there was an important trend of strong gene flow out of Africa in the Late Pleistocene. But data showing a correlation between diversity and distance from Africa just cannot show the critically important facts about the timing and magnitude of such gene flow.

Somebody will eventually straighten all this out. What I wonder is why it never seems to be the reviewers!

References:

Jakobsson M and 23 others. 2008. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451:998-1003. doi:10.1038/nature06742

Eswaran V, Harpending H, Rogers AR. 2005. Genomics refutes an exclusively African origin of humans. J Hum Evol 49:1-18.

Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Nat Acad Sci USA 102:15942-15947.

Templeton AR. 1998. Human races: a genetic and evolutionary perspective. Am Anthropol 100:632-650.

Templeton AR. 2002. Out of Africa again and again. Nature 416:45-51.

Retroviruses, immune responses, and vertebrate evolution

Last year's New Yorker piece on retroviral inserts in the human genome made some of my readers curious -- could such retroviral DNA be involved in recent human evolution? I think it's fair to say that I've been asked about retroviruses almost as much as about blue eyes -- and that's saying a lot!

Matt McIntosh has written a nice short piece describing what we know about retroviral genes and placental mammal evolution:

A significant chunk of our DNA had its origins as retroviral DNA. Most of these are now inactive, but a tiny portion actually appear to still code proteins. It's been found in mice, sheep and humans (and presumably generalizes to all placental mammals) that a particular kind of endogenous retrovirus is highly expressed in the outermost layer of the blastocyst (see e.g. Venables et al. 1995 for the human example). Furthermore, when you inhibit the expression of these genes the result is uniform spontaneous abortion immediately following implantation (Dunlap et al. 2006).
Most retroviruses are immunosuppressive, the most infamous example being HIV. Connecting the dots, it's quite plausible that these particular ancient retroviruses have been recruited into the mammalian genome and serve as local immunosuppressors in the uterus during development. In fact, we already know that syncytin, a protein crucial in placenta formation, is the product of a retroviral gene (Knerr et al. 2004), so there's nothing at all far-fetched about this.

If all this pans out, it stands as one of the most important cases of lateral gene transfer in eukaryotic evolution, with early mammals possibly accreting genes from many different viral lineages. As McIntosh points out, these genes are not acting as viruses now; they are imports into our genome -- just like most ancestral mitochondrial proteins are now nuclear genes. McIntosh ends with a short list of retroviral-origin genes that may be active during human development.

Loneliness regulates immune system gene expression

So reports Reuters on a study of gene expression in lonely versus non-lonely people:

All 22,000 human genes were studied and compared, and 209 stood out in the loneliest people.
"These 200 genes weren't sort of a random mishmash of genes. They were part of a highly suspicious conspiracy of genes. A big fraction of them seemed to be involved in the basic immune response to tissue damage," Cole said.

I love that: "a highly suspicious conspiracy of genes!" I guess if the genes are there responding to loneliness with greater inflammatory responses, you could call that a conspiracy. They seem likely to link to social hierarchy, with the sort of relationships studied by Robert Sapolsky in baboons. What the article doesn't give (and I'll wait to see if the research indicates) is whether there may be keystone genes that up- or down-regulate this network. From Sapolsky's work, it seems likely that both stress hormones (cortisol) and social hormones, like oxytocin and vasopressin, may play central roles.

These gene chip studies are very quickly going to get more interesting. We'll see both expression and allelic associations for a lot of unexpected things.

HIV genetics by the genome

A new whole-genome association study has found more genetic variants protective against HIV. The course of HIV infection is variable, even in the absence of medication, and it has been known for some time that some of the variation in disease progress is attributable to genetic variation among people. One gene variant (CCR5Δ32) is strongly protective against HIV-1; this is because the virus exploits the CCR5 chemokine receptor to infect T cells, and homozygotes for the Δ32 allele do not have this vulnerability.

The new research looked through the entire genome to find single nucleotide polymorphisms (SNPs) associated with variant disease phenotypes:

Understanding why some people establish and maintain effective control of HIV-1 and others do not is a priority in the effort to develop new treatments for HIV/AIDS. Using a whole-genome association strategy we identified polymorphisms that explain nearly 15% of the variation among individuals in viral load during the asymptomatic set point period of infection. One of these is found within an endogenous retroviral element and is associated with major histocompatibility allele HLA-B*5701, while a second is located near the HLA-C gene. An additional analysis of the time to HIV disease progression implicated a third locus encoding a RNA polymerase subunit. These findings emphasize the importance of studying human genetic variation as a guide to combating infectious agents.

From a very large study population of infected patients, the authors were able to identify a subset for whom recurrent measurements of viral load and other essential data were available. This allowed them to find genes that associate with the temporal progression of the disease, not just its presence or absence. An article on ScienceNOW by Jon Cohen describes the setup:

The team studied 486 patients infected with HIV who had not received treatment and had known dates of infection and accurate set points. Then they checked blood samples against half a million known variations in DNA sequences, or single-nucleotide polymorphisms, which recently were identified by the International HapMap Project that looked for differences in the genomes of people from many populations. "We've approached this as a straight, quantitative genetic problem," explains David Goldstein, a geneticist at Duke University in Durham, North Carolina, who led the study. The researchers say this is the first study to ever do such a genome-wide association analysis for an infectious disease.

The study identifies a number of other candidates besides the three significant ones that receive most of the discussion. It's tricky to test for significance in genome-wide surveys because the genome is so large and there are potentially many genes with small effects on disease phenotype. Still, genes with small effect (unless rare and highly protective) are not particularly good candidates for therapeutic treatments, so the major ones are the main story.
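
To see why significance is tricky at this scale, consider a simple Bonferroni correction applied to the half-million SNPs mentioned in the ScienceNOW piece. This is only a sketch of the general multiple-testing problem; I am assuming a nominal alpha of 0.05 for illustration, and the correction actually used in the paper may differ.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of null tests passing an uncorrected cutoff of alpha."""
    return alpha * n_tests


n_snps = 500_000  # "half a million known variations"
alpha = 0.05      # assumed nominal significance level

print("per-SNP cutoff:", bonferroni_threshold(alpha, n_snps))
print("expected chance hits without correction:",
      expected_false_positives(alpha, n_snps))
```

With a cutoff that strict, a variant of small effect needs a very large sample to stand out, which is part of why only a handful of loci get most of the discussion.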

References:

Fellay J and 26 others. 2007. A whole-genome association study of major determinants for host control of HIV-1. Science (online early) doi:10.1126/science.1143767

Looking for the balances

A nice paper from last August by K. L. Bubb and colleagues went looking for new balanced polymorphisms in the human genome. They didn't find any.

There's a lot of complexity in the research approach, involved with sifting through SNP data looking for true (i.e., not false) positives. That part is not very interesting, and will probably be superseded by new data. But the plot thickens in the discussion, where the paper reviews patterns of selective balances and the conditions under which they may persist.

Their bottom-line conclusion is that genes under balancing selection are hard to find -- mainly because the current strategy for detecting them requires a long linked haplotype that would have to result from suppressed recombination between two or more linked genes involved in the balance:

This brief analysis suggests that long-term balancing selection may simply be rare in humans and other organisms with similar biology and evolutionary histories. Certainly, this conclusion is compatible with the results of our search for targets of long-term balancing selection in the human genome. Nonetheless, the question still arises as to whether or not we failed to identify such targets simply because we had too little data to analyze. Would we have fared better, for example, if the entire genome were sequenced across 20 human haplotypes? While we cannot exclude that possibility, we suspect that identification of genes under long-term balancing selection will remain a gene-by-gene process, based largely on functional evidence, and not greatly accelerated by genomic analysis because (i) the phenomenon itself is rare and (ii) compatible balancing selection between physically linked loci--a requirement for generating a detectable genomic fingerprint--is also rare. Nonetheless, the fact that balancing selection systems have arisen independently multiple times and involve core functions of multicellular, sexually reproducing organisms (e.g., combating pathogens and avoiding selfing) suggests that, while rare, balancing selection has had major effects on the evolution of metazoan organisms (Bubb et al. 2006:2175-2176).

Such long haplotypes exist for the HLA system, but this seems to be an exceptional case. The paper only discusses two other similar systems, both involving epistasis between two or more physically linked sites: color vision polymorphisms in primates (where functionally different alleles are defined by mutations on multiple exons) and the sex chromosomes. So the conclusion of the study is not really that there aren't any balanced polymorphisms, but instead that there aren't any more clear examples of long balanced multiallele haplotypes. If that's what it takes to find balances, it's no surprise that none were discovered.

Still, it's easy to say that now; it was not nearly so obvious a couple of years ago. The fundamental question is not really about selective balances, but instead about epistasis between linked sites. For example, a paper by the same research group in 2005 (Raymond et al. 2005) speculated that HLA-like gene clusters might be very common:

We hypothesize that genomic regions of the type described here will occur commonly in biology even if extreme examples are rare in any given genome. The prerequisites are a cluster of genes that are individually under balancing selection and whose products interact. Under these circumstances, theory predicts precisely the type of long-range hitchhiking of neutral alleles on selected sites that we observe in the HLA class II region (Kelly and Wade 2000). Other gene clusters that are likely to exhibit similar effects include those that encode key components of the self-incompatibility systems present in many flowering plants (Charlesworth et al. 2003; Franklin-Tong and Franklin 2003; Hiscock and Tabah 2003).

Why were they wrong? It is clear that physically linked gene families can evolve that are collectively under epistasis and frequency-dependent selection, and the plant self-incompatibility systems are indeed examples of the same process. But this process may be self-limiting in some respects. Here we have frequency-dependent coadapted gene clusters. Once such a system is started, it seems much more likely that it will be modified by successive alterations than augmented through the addition of an entirely new coadapted gene cluster system. And successive alterations will need to be physically linked to be effective.

But why shouldn't there be similar systems for other functions, besides self-incompatibility or immune response? For instance, why shouldn't there be coadapted frequency-dependent brain variants?

Here, I think there may be two different, not mutually-exclusive, answers. One is that there just hasn't been all that much time. With the HLA system, we are looking at polymorphisms that are tens of millions of years old, and over that time span there has been a whole lot of evolution of the brain. So frequency-dependent variations that act in the brain may not have nearly the half-life that immune-related variants do. Maybe we could consider frequency-dependent variations in other tissues, like the liver, but here it is not nearly as obvious why we might see frequency dependence as a selective mechanism.

A second answer is that the human genome already includes a great big zone where heterozygotes are suppressed -- the X chromosome. With most X loci, you already have an effective mechanism for the emergence of coselected haplotypes, because males have only one copy, and females have partial inactivation of one copy or the other. Many X-linked genes are already part of the major example of coadapted frequency dependence -- sex. But there is no reason why other genes may not be selected in a similar pattern, without necessarily being sex-related.

The HLA really stands out as unusual in this regard, because it is both frequency-dependent and heterotic. It is good to be a heterozygote, and it is good to have a rare genotype. To get both these advantages, the HLA must be on an autosome. But for other coadapted polymorphisms under frequency dependence, it would probably not be such a good idea to be a heterozygote -- these would emerge more readily on the X, where there is much less possibility of epistatic conflicts.

Also in this context, the paper by Bubb et al. (2006) includes a very nice discussion of the ABO polymorphism:

While there are frequent claims for balancing selection at other loci in the literature, the plausibility of most of these cases depends on scenarios for heterozygote advantage. Thus far, the best case for balancing selection in the human genome solely on the basis of greater-than-expected coalescence time is at the locus controlling ABO blood type, specifically between the A and B alleles. ABO is an interesting example because, although it has been known to be polymorphic for >100 years due to its relevance in blood transfusion, its primary evolutionary function remains elusive. The lack of a strongly deleterious genotype satisfies our first proposed criterion that there should be little genetic load. The initial suggestion of long-term balancing selection came from the fact that the AB antigen–antibody phenotype is present in many primates, including some New World monkeys (BLANCHER et al. 2000). Furthermore, it has been shown biochemically that only two nucleotides, separated by 6 bp, differentiate the A allele from the B allele (YAMAMOTO and HAKOMORI 1990) and that these two nucleotides demonstrate apparent trans-species polymorphism within humans, chimpanzees, and gorillas (MARTINKO et al. 1993). In contrast, the O allele appears to have arisen multiple times in humans but is rare in nonhuman primates. When intronic sequence of humans, gorillas, and chimpanzees is compared, there is no evidence for trans-species polymorphism of linked neutral sites, so it has been argued that the two functional polymorphisms reflect convergent evolution (O'HUIGIN et al. 1997). However, if the balanced haplotype is just 8 bp long, it would behave as a single site and have only modest effects on flanking polymorphism levels (WIUF et al. 2004); the six exonic nucleotides between the functional polymorphisms certainly cannot hold enough neutral mutation to provide an accurate estimate of divergence time. 
Indeed, while polymorphism levels are high in the ABO region—with a MAXDIV of 49, which approaches human–chimpanzee divergence levels—there is no evidence for trans-species polymorphism outside the 8-bp haplotype [SeattleSNPs, NHLBI Program for Genomic Applications, SeattleSNPs, Seattle (http://pga.gs.washington.edu) (October 2005)]. Thus, while we cannot conclude that ABO is another example of trans-species balancing selection, the possibility exists that it is an "invisible" example that cannot be detected by polymorphism studies.

From the perspective of the genome scans, this point about "invisibility" is relevant, but from a broader perspective these details about ABO are important because so many of us use the system as an example in our classes. The O allele is the null allele, and the observation that it has arisen multiple times in humans is very significant.

The paper also includes a good discussion of why different kinds of balanced polymorphisms may persist:

While any type of selection that favors maintenance of more than one allele is, by definition, balancing selection, there are multiple mechanisms through which a balance of alleles can be maintained. The most widely recognized mechanism is heterozygote advantage, as in the textbook example of sickle-cell anemia. Although the sickle-cell allele raises the overall fitness of the population, a significant fraction of individuals have decreased survival and reproductive rates as a consequence of this one allele--a phenomenon that has been described as genetic or segregational load. There are two indications that such systems may not be stable. First, a new allele under balancing selection may rise in frequency more quickly than a new allele under positive selection--even one which, in equilibrium state, confers a greater fitness benefit on the population. This is because when a new allele is at a low frequency, the fitness advantage of the heterozygote is most important, while the lower fitness of homozygotes is not yet very relevant. For example, despite the fact that multiple hemoglobinopathy-related alleles (including the one responsible for sickle-cell anemia) have arisen independently in response to selective pressure by malaria, an allele exists (HbC) that is protective against malaria in the homozygous state and more weakly in the heterozygous state as well. Neither state is associated with hemoglobinopathy. Given enough time under continued selective pressure, it is expected that this allele would sweep through the at-risk region and increase the total population fitness (MODIANO et al. 2001). Second, in general, one can imagine some combination of gene duplication and regulatory modifications that would allow all individuals to have the benefits of both alleles of a gene under balancing selection (SPOFFORD 1969), as is illustrated by the evolution of separate middle-wavelength and long-wavelength color-vision genes in Old World monkeys and Great Apes.
In contrast, frequency-dependent selection does not require a steady-state fitness differential and, therefore, confers less load on a population (KOJIMA 1971). Consequently, this type of balancing selection is probably more stable than instances that depend on heterozygote advantage (Bubb et al. 2006).

That's an important point. Long-distance physical epistasis may be rare among frequency-dependent variants, but the frequency-dependent mechanism itself is in many respects more stable than heterosis. In fact, under a game-theoretic scenario, alleles under frequency-dependent selection may be more stable the more they differ, because certain mixed strategies are more stable when the differences between them are exaggerated. In some instances, the exaggeration includes highly visible phenotypic signals.
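To make the "no steady-state fitness differential" idea concrete, here is a minimal sketch of negative frequency-dependent selection. It is my own toy illustration, not anything from the paper: it assumes a haploid population with two alleles, a linear fitness function, and a hypothetical strength parameter s. Each allele is fitter when it is rarer, and at the balance point both alleles have the same fitness, so nobody pays a cost for the polymorphism.

```python
# Toy model (assumption): haploid, two alleles A and B, and the fitness of
# each allele falls linearly as its own frequency rises. The parameter s and
# the 0.5 balance point are hypothetical choices for illustration.

def fitnesses(p, s):
    """Return (w_A, w_B) when allele A is at frequency p."""
    w_a = 1.0 + s * (0.5 - p)   # A is favored when p < 0.5
    w_b = 1.0 + s * (p - 0.5)   # B is favored when p > 0.5
    return w_a, w_b

def next_freq(p, s):
    """Frequency of A after one generation of selection."""
    w_a, w_b = fitnesses(p, s)
    mean_w = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_w

def simulate(p0, s, generations):
    """Iterate the recursion and return the final frequency of A."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s)
    return p

if __name__ == "__main__":
    for start in (0.01, 0.99):
        end = simulate(start, s=0.5, generations=500)
        w_a, w_b = fitnesses(end, 0.5)
        print(f"start={start:.2f} end={end:.4f} w_A-w_B={w_a - w_b:.2e}")
```

In this sketch the population returns to the balance point from either side, and the fitness difference between the alleles vanishes there. That is the sense in which this kind of balance carries less load than heterozygote advantage.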

The paper ends with a suggestion that balancing selection may not be that common after all, since the authors' scan found little evidence for it:

We hypothesize that balancing selection most frequently arises in transient situations when the environment changes rapidly. Balancing-selection systems may largely be evolutionary "band-aids" that survive only until a more stable strategy arises, based on gene duplication and divergence, or until the rise of a more evolutionarily successful allele. This view is reminiscent of arguments supporting the less-is-more hypothesis (OLSON 1999); indeed, many suspected examples of recent balancing selection involve maintenance of nonfunctional or subfunctional alleles in the population (e.g., ccr5, F508, HbS).

For null alleles, this may well be true. Breaking something irreparably is easy, but probably not optimal.
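The "band-aid" argument rests on the standard one-locus, two-allele selection recursion, and a rough sketch shows why the quick fix can get there first. The fitness values below are hypothetical numbers I picked for illustration, not estimates for HbS or HbC: one allele has a large heterozygote advantage but a costly homozygote, the other has a small heterozygote advantage and an even better homozygote. Each allele is run separately against the ancestral allele.

```python
# Deterministic one-locus recursion for a new allele B at frequency p
# against an ancestral allele A. Genotype fitnesses are (w_AA, w_AB, w_BB).
# All numeric values are hypothetical, chosen only to illustrate the argument.

BALANCED = (1.0, 1.2, 0.2)      # strong heterozygote advantage, costly homozygote
DIRECTIONAL = (1.0, 1.05, 1.3)  # weak heterozygote advantage, best homozygote

def mean_fitness(p, w_aa, w_ab, w_bb):
    """Population mean fitness under random mating."""
    q = 1.0 - p
    return q * q * w_aa + 2.0 * p * q * w_ab + p * p * w_bb

def next_freq(p, w_aa, w_ab, w_bb):
    """Frequency of B after one generation of selection."""
    q = 1.0 - p
    marginal_b = q * w_ab + p * w_bb   # average fitness of a B allele
    return p * marginal_b / mean_fitness(p, w_aa, w_ab, w_bb)

def trajectory(p0, w_aa, w_ab, w_bb, generations):
    """List of B frequencies from generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], w_aa, w_ab, w_bb))
    return freqs

if __name__ == "__main__":
    balanced = trajectory(0.001, *BALANCED, generations=2000)
    directional = trajectory(0.001, *DIRECTIONAL, generations=2000)
    for gen in (0, 30, 100, 2000):
        print(gen, round(balanced[gen], 4), round(directional[gen], 4))
    print("final mean fitness:",
          round(mean_fitness(balanced[-1], *BALANCED), 4),
          round(mean_fitness(directional[-1], *DIRECTIONAL), 4))
```

While both alleles are rare, almost every copy sits in a heterozygote, so the allele with the bigger heterozygote advantage spreads faster even though it stalls at an intermediate frequency with a lower mean fitness. The slower allele eventually goes to fixation with a higher mean fitness, which is the sense in which the balanced polymorphism looks like a stopgap.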

For some genes, frequency-dependent variants may have a substantial lifespan. I think this is a gene-by-gene question: sometimes there will be individual genes that create selective balances, but these are a lot more likely to be single polymorphisms than long haplotypes with multiple selected genes.

References:

Bubb KL, Bovee D, Buckley D, Haugen E, Kibukawa M, Paddock M, Palmieri A, Subramanian S, Zhou Y, Kaul R, Green P, Olson MV. 2006. Scan of human genome reveals no new loci under ancient balancing selection. Genetics 173:2165-2177. doi:10.1534/genetics.106.055715

Raymond CK, Kas A, Paddock M, Qiu R, Zhou Y, Subramanian S, Chang J, Palmieri A, Haugen E, Kaul R, Olson MV. 2005. Ancient haplotypes of the HLA Class II region. Genome Res 15:1250-1257. doi:10.1101/gr.3554305
