john hawks weblog

:: paleoanthropology, genetics, and evolution

HapMap "dumpster diving"

home :: reviews :: genomics

Jennifer Couzin of Science has a conference report about the HapMap, and many of its uses.

It's a good summary. I especially liked this:

Another group at the Broad Institute is examining data that were sequenced and publicly released by HapMappers but didn't make it into the final HapMap because they were deemed erroneous. "This project is kind of a dumpster dive," says the Broad Institute's Steven McCarroll. He and his colleagues found that thousands of the flaws are actually inherited DNA deletions. They've identified 10 commonly deleted genes, including two for sex steroid hormone metabolism and three for drug metabolism. They're now studying whether those deletions might contribute to disease.

I think this is an important message that isn't often articulated:

Despite such enthusiasm, some researchers say they're not certain just how the HapMap will aid their own genetic studies. The map's central goal is to help identify genes behind common diseases such as cancer, but it's not always clear how to apply it. When it comes to evolution studies, for example, the map may be biased because it prefers common SNPs to rare ones. "The HapMap project was not about studying population history," says NHGRI's James Mullikin. But it's being used often by researchers in that area.

Of course, there's nothing magical about an association study -- just genotype a lot of people who have a disease. And it's surprisingly easy lately with gene chips to survey all these SNPs at once. But the key is that these SNPs are usually not the mutations we're looking for. The hope is that they are markers for the mutations, and that the key mutations will be on long enough linkage blocks to be found. Sometimes it will work, but the question is, how often?

But in terms of population history -- now there we might see some interesting results....
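The worry about genotyped SNPs being markers rather than the causal mutations can be made concrete with the standard r² measure of linkage disequilibrium. The sketch below is only an illustration: the function names and the example frequencies are made up, and the "effective sample size" line uses the common approximation that a marker with correlation r² to the causal variant behaves like the causal variant typed in a sample r² times as large.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between a marker SNP and a causal variant.

    p_ab: frequency of haplotypes carrying allele A at the marker
          and allele B at the causal site
    p_a:  frequency of allele A at the marker
    p_b:  frequency of allele B at the causal site
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def effective_sample_size(n_people: int, r2: float) -> float:
    """Approximate sample size 'seen' through the marker (n * r^2)."""
    return n_people * r2


if __name__ == "__main__":
    # Hypothetical frequencies, chosen only for illustration.
    r2 = ld_r_squared(p_ab=0.20, p_a=0.30, p_b=0.25)
    print(f"r^2 = {r2:.3f}")
    print(f"effective n from 1000 cases: {effective_sample_size(1000, r2):.0f}")
```

When r² is near 1 the marker is nearly as good as the mutation itself; when it is low, most of the sample's power is lost, which is one way to frame the "how often will it work?" question.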

References:

Couzin J. 2006. The HapMap gold rush: researchers mine a rich deposit. Science 312:1131.

Posted at 21:39 on 05/25/2006 | permanent link

Read other posts in /reviews/genomics


Epistasis and evolution

Razib at Gene Expression has a very informative post referring to the edited volume Epistasis and the Evolutionary Process (Wolf et al. 2000). I'm posting a reference myself because I want to remember and return to the topic later.

It's full of Sewall Wright goodness.

References:

Wolf JB, Brodie ED III, Wade MJ. 2000. Epistasis and the evolutionary process. Oxford University Press, New York.

Posted at 00:04 on 07/10/2005 | permanent link


Chromosomal inversions in human evolution

A new paper by Lars Feuk et al. in PLoS Genetics reports on widespread inversions in humans. A press release at ScienceDaily announces the paper and summarizes the results concisely:

According to [Stephen W.] Scherer, prior to this research, only nine inversions between humans and chimps had been identified. Using a computational approach, Scherer's group identified 1,576 presumed inversions between the two species, 33 of which span regions larger than 100,000 base pairs--a sizeable chunk of DNA. The average human gene is smaller, only about 60,000 bases in length.
Scherer's team experimentally confirmed 23 out of 27 inversions tested so far. Moreover, by comparing the chimp genome with its ancestor, the gorilla genome, they determined that more than half of the validated inversions flipped sometime during human evolution.

And of course if several hundred inversions occurred during human evolution, you can bet that 15 percent or so of them will vary among humans:

Perhaps even more interesting than the abundance of inversions that Scherer's group unveiled was their discovery that a subset of the inversions are polymorphic--taking different forms--within humans, meaning that the human genome is still evolving. When the 23 experimentally confirmed inversions were tested against a panel of human samples, the scientists found three inversions with two alleles or pairs of genes displaying the human inversion in some people, whereas others had one allele of the human inverted sequence and one allele of the normal sequence in chimps.

Three out of 23 is 13 percent, and the paper has this to say:

It would be expected that a certain fraction of the differences found between the human and chimpanzee assemblies are polymorphic in one of the two species, but perhaps not to the extent (13%) observed in this study (Feuk et al. 2005:e56).

Not surprising at all. Considering that most human loci have genealogical coalescents within the last million years, purely neutral alleles within any given genealogical lineage have around a 15 percent chance (one million out of the six-million-year divergence) of occurring after the coalescent, and therefore being polymorphic. In reality, the odds of polymorphism are even higher, depending on the number and distribution of individuals used to ascertain variants.
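The back-of-the-envelope comparison can be written out as a few lines of arithmetic. This is only a sketch of the reasoning in the paragraph above: it treats the one-million-year coalescence and six-million-year divergence as round numbers, and the binomial calculation (an addition for illustration, not something from the paper) simply asks how ordinary it would be to see 3 polymorphic inversions out of 23 if each had that chance of being polymorphic.

```python
from math import comb


def fraction_after_coalescence(coalescence_years: float,
                               divergence_years: float) -> float:
    """Share of a lineage's history that falls after the coalescent."""
    return coalescence_years / divergence_years


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# One million years out of a six-million-year divergence: one sixth,
# which the text rounds to "around 15 percent".
expected = fraction_after_coalescence(1_000_000, 6_000_000)

# Observed: 3 of the 23 experimentally confirmed inversions are polymorphic.
observed = 3 / 23

# Chance of seeing exactly 3 polymorphic inversions out of 23 at that rate.
prob_three = binomial_pmf(3, 23, expected)

print(f"expected fraction: {expected:.3f}")
print(f"observed fraction: {observed:.3f}")
print(f"P(exactly 3 of 23): {prob_three:.3f}")
```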

But are these inversions neutral? According to the paper, the polymorphic inversions occur at 5, 30, and 48 percent frequencies (no indication whether the minor allele is ancestral or derived compared to primates). Few variants that are very much rarer than 5 percent will be ascertained by HapMap. At the same time, alleles over 5 percent frequency are pretty unlikely to be deleterious. Their high frequencies might suggest that these inversion variants have themselves been positively selected.

References:

Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW. 2005. Discovery of Human Inversion Polymorphisms by Comparative Analysis of Human and Chimpanzee DNA Sequence Assemblies. PLoS Genet 1:e56.

Posted at 00:18 on 11/01/2005 | permanent link


Variation in gene expression evolves rapidly

A study in Nature (11/10/05, full text by subscription) by Scott Rifkin and colleagues performs an interesting experiment on gene expression and mutation.

Beginning with 12 identical lines of Drosophila melanogaster, they allowed each to reproduce for 200 generations. During this time, they expected the lines to accumulate some mutations that would affect gene expression. But how many?

On the basis of studies in D. melanogaster, we estimate that each of our 12 mutation accumulation lines contains around 360 mutations. We measured gene expression levels during the third larval instar (before puparium formation; BPF) and at puparium formation (PF), before and after the peak of a large pulse of the hormone 20-hydroxyecdysone that triggers the start of metamorphosis (see Methods and Supplementary Fig. 1). This stage is one of substantial transcriptional activity and turnover, with broad intra- and interspecific variation in gene expression. Of 11,798 genes measured, we detected significant Vm for 3,816 genes at the BPF stage, for 3,475 genes at the PF stage and for 4,658 genes overall, using a false discovery rate (FDR) of 0.05. The expression of 5,729 genes significantly differed between the two stages, although only 2,509 of these genes showed significant Vm (FDR = 0.05) (Rifkin et al. 2005, citations omitted).

The study is about differences in mRNA transcription in these lines, so it is not necessarily generalizable to everyday traits. But their preferred model to explain the level of mutational variance is interesting:

Third, network output, namely the production of a particular product at a specific place and time, may be the target of selection rather than gene expression itself. As in enzyme flux models, the selective effect of any particular change in gene expression may be negligible over a range of values but become substantial when the abundance of mRNA becomes rate limiting or when the variation becomes otherwise functionally relevant. Stabilizing selection, by canalizing network output against perturbations, may facilitate neutrality among members of the underlying network. Gene expression would be able to tolerate a moderate number of mild mutations but would trigger strong selection if the network output were substantially affected. Such a model could also account for moderate correlations between mutational and interspecific variation even when the total level of between-species divergence is far less than expected under neutrality (ibid.).

It's an argument of developmental robusticity via canalization. Individual genes are free to vary somewhat, as long as they don't surpass some threshold. But if the network output exceeds its tolerances, selection kicks in. One possible adaptive response is for certain genes to reduce the phenotypic variability, either by reducing the effects of environmental variability (canalization) or by reducing the disruptive effects of mutations internal to the network (developmental robusticity).

References:

Rifkin SA, Houle D, Kim J, White KP. 2005. A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature 438:220-223.

Posted at 22:53 on 11/09/2005 | permanent link


HapMap

This week's Nature has over twenty pages of HapMap coverage. Here's the abstract of the main paper:

Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

I'm sort of scanning the stuff for interesting quotes now. Here's one, on the concern about false positives in phenotypic associations with SNPs:

Given the potential for confusion if associations of uncertain validity are widely reported (and a persistent tendency towards genetic determinism in public discourse), we urge conservatism and restraint in the public dissemination and interpretation of such studies, especially if non-medical phenotypes are explored. It is time to create mechanisms by which all results of association studies, positive and negative, are reported and discussed without bias.

Following its own advice, the study discusses evidence for local and global selection in remarkably sterile language. The HapMap is not ideal for finding evidence of selection -- the focus on only common variants tends to exclude several of the usual tests of selection, as well as low-frequency selected variants. But some interesting details are in the data, like this:

First we consider population differentiation, generally accepted as a clue to past selection in one of the populations. The HapMap data reveal 926 SNPs with allele frequencies that differ across the analysis panels in a manner as extreme as the well-accepted example of selection at the Duffy (FY) locus (Supplementary Fig. 8c). Of these 926 SNPs, 32 are non-synonymous coding SNPs and many others occur in transcribed regions, making them strong candidates for functional polymorphisms that have experienced geographically restricted selection pressures.

I've decided I admire statisticians who make it their work to figure out how to wring multiple-comparisons tests out of 3 billion base pairs of sequence.

The HapMap is an incredible step forward in characterizing human genetic variation. It's a challenging dataset to work with, though. It's like an old map showing continent margins and little else -- we can see many of the common SNPs, but for most we have no idea which ones are functional or what they might do.

But there's some much more interesting stuff coming before too long.

References:

The International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437:1299-1320.

Posted at 22:22 on 10/26/2005 | permanent link


Human genetic variation in a (very large) nutshell

Hinds and colleagues (2005) report in Science on a study that genotyped 1,586,383 single nucleotide polymorphisms (SNPs) in a sample of 71 people. The sample is drawn from Americans in three subsamples representing African, Asian, and European ancestry. The goal of the study was to add to knowledge about the frequencies of SNP variation in different medically relevant populations, while assessing the linkage among SNPs. These data would help formulate better strategies for tracing the genetic correlates of disease and other phenotypic traits.

The data were acquired with these medical goals in mind, which limits to some extent their ability to address interesting issues about human evolution. For example, the authors selected known SNPs that were judged likely to be at high frequency in multiple populations. This process, called ascertainment, was complicated enough to make it difficult to use the data in models of genetic evolution. In particular, a large set of the candidate SNPs were selected from public databases, which are not random representatives of the three subpopulations considered here, making it likely that the three would differ in allele frequencies in ways characteristic of this bias. Because of the ascertainment complexity, it is unlikely that geneticists would be able to use these data to accurately reconstruct ancient evolutionary events (although it may not stop them from trying).

The most interesting part is a brief consideration of the role of natural selection in differentiating populations from each other. As the authors note, one suggestion concerning the distribution of genetic differentiation (as measured by FST) is that different genes have undergone very different patterns of global or local selection. The suggestion from this hypothesis would be that candidate genes to examine local selection could be identified from relatively large FST values. (Such genes would have high FST in any event; the distinction is that if genetic drift were largely responsible for human differentiation, then many non-locally-adapted genes might also have high FST values.) As they put their findings:

If this is true, then larger FST values should be found near functional genetic elements. We looked at the distribution of FST for SNPs that were genic or nongenic, coding or noncoding, and synonymous or nonsynonymous. We performed the analysis within subsets of SNPs grouped by MAF [minor allele frequency], so that effectively, we looked at the fraction of between-population variance for SNPs with the same total genetic variance. Common SNPs in genetic regions do have slightly but significantly higher FST values than nongenic SNPs with the same MAF . . . and common coding SNPs have slightly higher FST values than noncoding SNPs in genic regions. . . . These results are consistent with local selection changing the distribution of FST near functional sequences. However, because the distributions of FST among genic and nongenic SNPs are very similar, large FST values by themselves appear to be very weak evidence of selection (1074).

Of course there is another reason that genic and coding SNPs might not be much more differentiated than the average: if global selection has constrained them to similar frequencies. Given the huge range of genes in the scope of this analysis, it is hard to say which force of selection should be predominant, or if they should be nearly balanced in the way they would appear to be to explain the data. Certainly genes like the MHC genes would be expected to be held at broadly similar frequencies across populations. But then some of those are precisely the genes that should be very different among populations, as a result of different microbial histories. The authors also examined the private (confined to one sample) SNPs to see if they were more likely to be genic, finding that they were not. This is not surprising, since these alleles are by definition rare, and therefore unlikely to underlie strong selected differences between populations. The few that might be locally selected are surely lost in the volume of rare alleles that are either deleterious or subject entirely to drift.

It seems to me that the way to address the FST issue is to examine the distribution of FST estimates for the SNPs. Given the observed sample frequencies of the SNPs and some assumptions about population histories, it should be possible to derive an expected distribution of FST. Comparing that expected distribution to the observed distribution would give some information about whether the genes had been subject to drift alone, or whether they had been significantly perturbed in some way.
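As a rough illustration of that idea, here is a minimal sketch that computes FST for a biallelic SNP and builds a null distribution by simulation. Everything about the population model is an assumption made for the example: three populations split from one ancestor at the same starting frequency, each drifts independently at constant size (Wright-Fisher sampling) with no migration, and sampling error in the 71 genotyped people is ignored. The function names and parameter values are hypothetical, and this is not the method used by Hinds et al.

```python
import random


def fst(freqs):
    """Wright's FST for one biallelic SNP from subpopulation allele
    frequencies, weighting the subpopulations equally."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # allele fixed or lost everywhere: no variance to split
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # within-pop mean
    return (h_t - h_s) / h_t


def drift(p, n_chrom, generations, rng):
    """Allele frequency after binomial resampling each generation."""
    for _ in range(generations):
        count = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = count / n_chrom
    return p


def simulate_null_fst(p0, n_pops=3, n_chrom=100, generations=20,
                      n_snps=200, seed=1):
    """FST values expected under drift alone for SNPs starting at p0."""
    rng = random.Random(seed)
    return [
        fst([drift(p0, n_chrom, generations, rng) for _ in range(n_pops)])
        for _ in range(n_snps)
    ]


def empirical_p_value(observed, null):
    """Fraction of simulated FST values at least as large as observed."""
    return sum(1 for x in null if x >= observed) / len(null)


if __name__ == "__main__":
    null = simulate_null_fst(p0=0.3)
    obs = fst([0.10, 0.35, 0.60])  # made-up sample frequencies
    print(f"observed FST = {obs:.3f}")
    print(f"share of null at least this large = {empirical_p_value(obs, null):.3f}")
```

A real analysis would need a demographic model fitted to the data and a correction for SNP ascertainment, which is exactly the complication described earlier in this post.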

The data are publicly available; if you can think of a good use for them, have at it!

References:

Hinds DA, Stuve LL, Nilson GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307:1072-1079.

Posted at 00:43 on 02/20/2005 | permanent link


How much are deletions like SNPs?

Hinds et al. (2006) examine the pattern of common deletion polymorphisms in the human genome. These are genetic variants in which the alleles have different lengths, the shorter resulting from the deletion of genetic material that was originally present (and is today found both in some humans and in other primates). They focused on intermediate-length deletions: those lying between a few base pairs and thousands.

Here's some background:

SNPs are the result of errors in DNA replication or repair that occurred once in human history and are shared among individuals by descent. Very small common deletions and insertions, in the range of 1-5 bp, show strong linkage disequilibrium with common SNPs, which suggests that, although the mechanisms giving rise to them may differ, these polymorphisms share a similar evolutionary history. It is well documented that diseases classified as genomic disorders, such as DiGeorge or velocardiofacial syndrome, alpha-thalassemia, Williams-Beuren syndrome and Charcot-Marie-Tooth disease type 1A, result from recurring mutations involving large deletions, insertions and other genomic alterations. These recurring mutations are the result of non-allelic homologous recombination events that occur between blocks of duplicated sequences (>95% sequence identity, >10 kb in length, and separated by 50 kb to 10 Mb). Here we and other concurrent reports in this issue show that intermediate-length deletion polymorphisms contribute to common genetic variation in healthy individuals. Our report focuses on whether these common deletion polymorphisms are the result of single mutation events such as SNPs or are due to recurring mutational events such as those resulting in genomic disorders (Hinds et al. 2006:82, references redacted).

They find that the intermediate-length deletions (like the short deletions) pretty much behave like SNPs.

The set of common intermediate-length deletions identified here has linkage disequilibrium patterns similar to SNPs, indicating that these polymorphisms share a similar evolutionary history and suggesting that most intermediate-length deletions, like SNPs, arose once in human history. High linkage disequilibrium with nearby SNPs suggests that most of these deletions are effectively assayed by proxy in SNP-based association studies, consistent with previous results for short insertion/deletion polymorphisms. On the basis of the fraction of the genome examined and the technical limits of our study, we estimate there are several thousand intermediate insertion/deletion polymorphisms in the human genome, suggesting that they represent an important component of common genetic variation and are likely to contribute to phenotypic variation in complex traits (ibid:85).

This means that intermediate-length deletions that recur again and again at the same genomic location are probably rare (via evolgen).

References:

Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. 2006. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet 38:82-85.

Posted at 22:21 on 01/08/2006 | permanent link


Patents and human chimeras

This article in the Boston Globe (Feb. 13, 2005) summarizes a recent US patent office ruling on whether an application for a human-animal chimera could be approved. The office rejected the claim, holding that the method specified in the application (creation of an embryo consisting of a mixture of human and animal cells) would result in the creation of a living being too close to a human to be patentable.

The story describes this as a waypoint in a long legal battle over the patenting of life. The applicant in this case, Stuart Newman of New York Medical College, is a collaborator of biotechnology gadfly Jeremy Rifkin. This patent application was part of an effort to get the patent office to set a precedent for future applications--they reasoned that if the patent were approved they could forestall all such research for the patent term, while if it were rejected it would block future patent applications for human-animal chimeras.

It's not entirely obvious that this will be the effect, since there are many labs currently working at creating mice with various levels of human tissue included. The most famous is perhaps the mouse strain with a human immune system, but the article mentions the prospect of mice with brains made entirely of human neurons.

There is no definition of human that would determine how much human tissue or human genetic material would be enough to qualify a genetically engineered organism as human. In a legal sense, there is no real protection against the creation of such organisms, beyond the 13th Amendment, which bans slavery, and would presumably preclude the extension of property rights over genetically engineered people. But how human does an organism have to be to qualify for this protection? Nobody knows.

Ironically, the patent office appears to be pretty friendly to Rifkin's position. As the article points out, the Supreme Court forced the issue of patenting organisms in the 1980 case of Diamond v. Chakrabarty, deciding that any artificially created organism (i.e. not naturally occurring) was eligible for patent protection as a genuine human innovation. Since that time, hundreds of patents have been issued for living organisms, and tens of thousands on genes or gene products.

Now that human cloning has been kicked into high gear in Britain, we can expect that there will be an increase in the potential for mixing genetically engineered sequences into humans, including sequences taken from animals or plants. And there will be increasing attempts to make animals with human genes and cells, as experimental models for human drug and treatment testing as well as for other purposes. We're not quite at the island of Dr. Moreau, but we are separated from it by our motives, not our methods.

Posted at 21:31 on 02/13/2005 | permanent link


Neutrality and selection on gene expression

There is a good case to be made that distinguishing neutrality from selection is now the central problem of molecular evolutionary biology. I don't intend to make the case, but I do want to discuss the problem. It arises for me because of a recent discussion of human-chimpanzee differences in gene expression, in a Science paper by Philipp Khaitovich and collaborators.

Reading this paper and some of its references has made me realize that one of the key aspects of the problem is that evolutionary biologists and molecular biologists often don't speak the same language. Sometimes the two groups may use exactly the same terms to mean different things --- yet, because they are mostly words borrowed from English, the meanings are similar enough to cause immense confusion.

These are my notes on gene expression differences between humans and chimpanzees. My focus is on pointing out things that might confuse, and attempting to determine the importance of the work to the project of uncovering the events and processes of human evolution. In that spirit, this is not a critique in any way, although I do include some critical comments; they are indications of the way this work differs from other kinds of evolutionary biology.

See more ...

Posted at 22:27 on 10/02/2005 | permanent link


Mayr on speciation

OK, that headline looks like the title to a dissertation, which this isn't. But in honor of Mayr's recent death, I was looking through some of the things he has written about hominids, and I came across his book review of Jeffrey Schwartz's book, Sudden Origins. Reading this at once reminded me why Mayr has been such a giant in evolution that he spilled over into anthropology, and saddened me that there are so few representatives of such wisdom left.

Here are some quotes:

[Schwartz] correctly criticizes the strictly linear view of descent held by most anthropologists (p. 43), but by not thinking in terms of populations, Schwartz does not convert hominid history into a dynamic picture of the movement of geographically vicariant populations and subspecies. Such multidimensional thinking, introduced by the founders of the Evolutionary Synthesis, is not yet popular among physical anthropologists (978).
Phenotypic discontinuity does not conflict with Darwinian theory. If, for instance, a phyletic line evolves from the possession of two to the possession of three molars, the change does not occur by mutations giving one tenth, later one fifth, and one half of a new molar, but by one tenth, later one fifth, and then one half of the population having one new molar (978).

And here's rubbing it in:

What is the reason for Schwartz's failure in spite of his extensive reading and his efforts to make use of some of the most recent findings of molecular biology? Perhaps it is due to an insufficient consideration of some of the basic concepts of the synthetic theory. For instance, nowhere does he adequately emphasize that evolution takes place in populations and consists of the replacement of individuals, generation after generation. Furthermore, in numerous discussions of mutation in this volume, it is always implied that the gene (mutation) is the target of selection rather than the phenotype of the individual, and this favors acceptance of a theory of a saltational role of homeobox genes. Nor does Schwartz seem to appreciate that natural selection is a two-step process. Homeobox mutations occur during the first step, the production of variation. The fate of these mutations, after they have become components of new genotypes, however, is decided at the second step, the actual selection. Therefore, no conflict exists between the occurrence of homeobox mutations and the classical Darwinian process (979).

Consider that here, Mayr was in his mid-90s. That some of us forget the lessons of the Synthesis is a discredit to us and our teachers, certainly not to the founders. Yet he patiently explains the way that today's developments in genetics should be incorporated into an evolutionary model, using the understanding that he helped the field to develop some sixty years before.

References:

Mayr E. 1999. Sudden origins (book review). BioEssays 21(11):978-979.

Posted at 14:35 on 02/19/2005 | permanent link


Gene control by microRNA::evolutionary implications

News story at Nature

References:

@article{Lewis:2005,
  author = {Benjamin P. Lewis and Christopher B. Burge and 
     David P. Bartel},
  year = {2005},
  title = {Conserved seed pairing, often flanked by adenosines, indicates 
    that thousands of human genes are {microRNA} targets},
  journal = {Cell},
  volume = {120},
  pages = {15--20}  }

Posted at 12:36 on 01/19/2005 | permanent link


More on positive selection from human-chimp comparisons

Nielsen and colleagues (2005) report on a survey of 13,731 genes in humans and chimpanzees, in an attempt to find genes that show the strongest evidence of positive selection in one or both lineages. This is an exceptionally large proportion of the coding genome, considering that recent estimates peg the total number of genes at around 25,000 or so. From the total number, the analysis reduced its sample to 8,079 genes because the rest exhibited relatively little variation.

Many of the genes that present a signature of positive selection tend to be involved in sensory perception or immune defenses. However, the group of genes that show the strongest evidence for positive selection also includes a surprising number of genes involved in tumor suppression and apoptosis, and of genes involved in spermatogenesis. We hypothesize that positive selection in some of these genes may be driven by genomic conflict due to apoptosis during spermatogenesis (Nielsen et al. 2005:1).

This test for positive selection focuses on the ratio of nonsynonymous to synonymous substitutions. This makes it a tricky test, because genes under strong selective constraint will tend to have a very low number of nonsynonymous substitutions between humans and chimpanzees. A very high number of nonsynonymous substitutions (in comparison to the number of synonymous substitutions) is taken as evidence of positive selection. But for this test to give significant evidence of selection, the number of positively selected substitutions must be fairly high. In other words, this test identifies genes that have been subject to repeated adaptive changes, while being under relatively little adaptive constraint.
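To make the ratio concrete, here is a minimal sketch of a counting-style dN/dS estimate with a Jukes-Cantor correction for multiple hits. It is not the paper's method (Nielsen et al. use a likelihood-based test); the function names are made up, and the counts of sites and differences are assumed to have been tallied already from an alignment.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: int, nonsyn_sites: float,
          syn_diffs: int, syn_sites: float) -> float:
    """Ratio of nonsynonymous to synonymous substitution rates.

    Values well above 1 are read as evidence of positive selection;
    values well below 1 suggest purifying selection (constraint).
    """
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        return math.inf if dn > 0 else math.nan
    return dn / ds


if __name__ == "__main__":
    # Hypothetical gene with about 1,000 codon positions' worth of sites.
    print(f"constrained gene:  {dn_ds(2, 700, 4, 300):.2f}")
    print(f"rapidly changing:  {dn_ds(14, 700, 3, 300):.2f}")
```

With human-chimpanzee divergence on the order of one percent, a typical gene contributes only a handful of differences to either count, so the ratio is noisy. That is why the test only reaches significance for genes with many nonsynonymous changes.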

The paper finds that genes expressed in brain tissue have been relatively constrained by selection during the evolution of humans and chimpanzees. This finding does not really conflict with that of Dorus et al. (2004), who found that nervous system genes have been rapidly evolving in the primate lineage, because the test for positive selection used here is different from the test of evolutionary rate used in the earlier paper. This paper notes that its test for selection is relatively noncontroversial, but may have very little statistical power to detect selection. In simulations, they find that the test is powerful in detecting selection for those genes in which the proportion of nonsynonymous substitutions is higher than synonymous ones, which means genes that are under little or no constraint from purifying selection.

On the topic of immune defenses:

The top 50 genes include many genes that we might a priori expect to be targets of positive selection, including four genes involved in olfaction (OR2W1, OR5I1, OR2B2, and C20orf185) and several genes involved in host-pathogen interactions, such as CMRF35H, CD72 antigen, pre-T-cell antigen receptor alpha (PTCRA), APOBEC3F, and granzyme H (GZMH). Only one of these genes was among the 50 most significant entries in the Clark et al. [10] model 2 analysis. APOBEC3F encodes an antiviral factor that has previously been demonstrated to be under positive selection by Sawyer et al. [3] who note that this gene has been associated with anti-HIV activity.
Presumably, most of these genes have been targeted by positive selection throughout the primate and mammalian phylogeny. The widespread evidence for positive selection in immune-related genes confirms the hypothesis that much positive selection in the human and mammalian genomes may be driven by a coevolutionary arms race between host immune system and pathogens (Nielsen et al. 2005:4).

The most interesting finding is the high rate of positive selection in cancer-associated genes:

While we expected to find genes involved in olfaction, spermatogenesis, and immune defense among the 50 annotated genes showing the strongest evidence for positive selection, we were surprised to find a very large proportion of cancer-related genes, especially genes involved in tumor suppression, apoptosis, and cell cycle control. These genes include four putative tumor suppressors: HYAL3, DFFA, PEPP-2 (note that both HYAL3 and PEPP-2 also appear to be involved in spermatogenesis), and C16orf3, another gene associated with tumor progression (MMP26), and a gene with unknown function but high similarity to melanoma-associated antigens (FLJ32965). In addition, there are several genes involved in apoptosis (PPP1R15A, HSJ001348, TSARG1, and GZMH). Given that many of the genes have very little functional information, it is surprising to find such a large proportion of genes that may be related to tumor development and control. The factors causing positive selection on these genes are unknown, but genes important in tumor development and suppression may be positively selected due to other functional effects of the genes, particularly in immunity and defense or in spermatogenesis. Several of the genes involved in tumor suppression or progression show testis-specific expression, and models of genomic conflict may explain the presence of positive selection in these genes (Nielsen et al. 2005:4-5).

Additionally, the paper examines the human intraspecific variation of the 50 genes with the highest levels of positive selection in the study. In a panel of humans, these genes exhibit an excess of high-frequency nonsynonymous variants, which means that not only have these genes been repeatedly subject to positive selection during human evolution, but many of them may still be under positive selection today, or at least during recent human history. This would not be unexpected, from the rate of positive selection genome-wide.

But it does raise more questions about the interpretation of human variability in terms of ancient demography. In a number of studies, the combination of an excess of rare variants and an excess of high-frequency variants has been interpreted as evidence for a bottleneck in population size. If some or all of the high-frequency variants are the result of continuing positive selection on some genes, then the model of a bottleneck goes out the window.

References:

Dorus, S. et al. 2004. Accelerated evolution of nervous system genes in the origin of Homo sapiens. Cell 119:1027-1040.

Nielsen R, et al. 2005. A Scan for Positively Selected Genes in the Genomes of Humans and Chimpanzees. PLoS Biol 3:e170. PLoS Online

Posted at 12:00 on 05/07/2005 | permanent link


Will anybody pay for genomes?

MSNBC reports this Reuters story about the new microbead technology for gene sequencing. Because it allows such a decrease in cost (down to $2 million per genome) as opposed to earlier sequencing methods, news stories are connecting this with the goal for a $1000 genome test.

I suppose $1000 would be inexpensive enough that anyone with a persuasive medical need could get her genome:

The idea is to produce a technology that could be used to compare one person's genome, for example, to the existing human genome map and find an individual's differences.
"There are needs for personal genomic data already," [method developer George] Church said.
"If you are a cancer patient there are quite a number of therapies which can only be used if you have a specific genetic component."

The live vote on the MSNBC story has 76 percent of respondents saying "yes" to a $1000 genome test, as in "Yes, what a bargain! Just think of all the information that would be revealed."

That frankly surprised me, because I gave my students the same poll last week, with a twist: I asked them how much they would be willing to pay for their full genome sequence.

The results: only two would pay more than the price of a CD, around $16.00. Most didn't want the information at all --- they didn't see what possible use it could have for them.

Now keep in mind, this was after nearly a full semester of lectures on all the interesting things that anthropological genetics can tell you. I'm not turning people off of genetics, at least I hope not. But it does help to have some realism about what kind of useful information your genome can provide.

The fact is, for these college students, a full genome sequence just doesn't have much value right now. There are exceptions, especially if you have a family history of heart disease, cancer, diabetes, or some genetic disorder like cystic fibrosis or Huntington's disease. This story from MedPage Today discusses how home genetic testing for such disorders is already becoming a "booming business."

But most people have nothing at all to gain from knowing their genomes. Right now, we just don't know enough about the genome to direct people to the likely outcomes of their thousands of genes. In the future, lots of interesting things may become possible, but not today.

That wouldn't stop me, of course. I would pay the $1000, easily, just to compare my sequence with all that research I read. I could probably pay for it from my research funds. If you start seeing papers from me in a few years using a sample size of two, you'll know where it comes from.

Most people, of course, are not like me. Even if their genomes could be sequenced cheaply, they wouldn't want the data themselves. How would they assess it? What you really want is for some medical professional to have the data, in a form already compiled and compared to other people. When it arrives, genomics won't be sold like a book; it will be sold like a service, complete with recommendations for what to eat to maximize health.

Probably in the end, there will be computer programs like TurboTax for genomes, where you plug in your information, and they come up with recommendations cheaply. By that time, of course, computers will talk to you out of robot heads, and you can start calling them "Doctor."

There will probably come a time when knowledge of a child's genome at birth will encourage parents to pursue many small interventions as their child grows up. A few more supplements of this mineral might greatly enhance their athletic performance, cutting down on that food might help their scholastics, giving a regular dose of this drug might prevent heart attacks at age 40.

But there will probably come a point in life when this kind of medicine fails. College-age people are never going to care that much what their health at 50 is like. If it can't stop them smoking right now, how in the world is it going to get them to eat Brussels sprouts in the future?

I think most people would rather have a medicine that fixes what's wrong with them, instead of one that will prevent bad things from happening. Preventative medicine works as long as the disease is imminent and the treatment, once the disease happens, is very unpleasant. That is, after all, the reason for Lipitor, Crestor, and all the other cholesterol-cutters. A drug that could prevent some cancer would probably become hugely popular even if the normal odds of that cancer were relatively low.

At least with people over 35. Good luck marketing it to college students.

Posted at 11:27 on 08/07/2005 | permanent link


First we sequence the orangs and macaques, then gorillas

Perhaps after the announcement of the draft chimpanzee genome, you're feeling the need for another DNA rush. How long will you have to wait for a gorilla sequence? How about an orangutan? Will anybody give me a sifaka? Sifaka, anyone?

A Nature article by reporter Carina Dennis gives some details about the future schedule of genome sequencing in primates and beyond. Here's the marching order:

  1. Rhesus macaque: public draft already available, revised version "expected by the end of the year"
  2. Orangutan: sequencing already underway, draft expected "early next year"
  3. Gorilla: sequencing begins in "October this year", draft assembly expected in "a couple of years"

After that, there are more ideas of what to do. They start to run together like an insane bioprospecting auctioneer:

While some researchers are working on the youngest shoots of the primate family tree, others are delving at the roots, to understand what the earliest primate genomes were like. To this end molecular palaeontologists are keen to sequence representatives from each of the major primate lineages. The sequencing of the marmoset, a New World monkey, has just begun. "I would also like the lemur sequence," says Asao Fujiyama of the National Institute of Informatics in Tokyo, Japan, who was part of the team that sequenced the first chimpanzee chromosome last year.

I have to say, if your goal is to reconstruct phylogeny, these sequences aren't going to help much more than what we already have. On the other hand, if you want to reconstruct the evolution of these other lineages, you're going to need a whole lot more -- not just one lemur but many, not just marmosets, but other New World monkeys as well.

But the thing is, within the next five to ten years, these kinds of projects will become more feasible with off-the-shelf technology. Not entirely so -- the commercialization will use shortcuts like gene chips tuned for human-specific sequences. But substantially so, enough that completing a draft genome assembly for a primate may become feasible for a single research lab on NSF-level money.

Personally, I think it's going to generate more data than anybody will have time to analyze. But people interested in evolution in primates -- from sexual dimorphism and mating systems to social hierarchies and affiliation --- are going to have to know a lot more about genomics, or are going to have to enlist population geneticists in a lot of that work.

And then Zoboomafoo can be a genome model just like Craig Venter.

References:

Dennis C. 2005. Chimp genome: branching out. Nature 437:17-19. Full text (subscription required)

Posted at 11:46 on 09/02/2005 | permanent link


Gene expression and life history choices

Usually it's a bad idea for an anthropologist to start talking about fish biology, but this paper struck me as interesting:

Alternative life histories shape brain gene expression profiles in males of the same population
Nadia Aubin-Horth et al.
Atlantic salmon (Salmo salar) undergo spectacular marine migrations before homing to spawn in natal rivers. However, males that grow fastest early in life can adopt an alternative 'sneaker' tactic by maturing earlier at greatly reduced size without leaving freshwater. While the ultimate evolutionary causes have been well studied, virtually nothing is known about the molecular bases of this developmental plasticity. We investigate the nature and extent of coordinated molecular changes that accompany such a fundamental transformation by comparing the brain transcription profiles of wild mature sneaker males to age-matched immature males (future large anadromous males) and immature females. Of the ca. 3000 genes surveyed, 15% are differentially expressed in the brains of the two male types. These genes are involved in a wide range of processes, including growth, reproduction and neural plasticity. Interestingly, despite the potential for wide variation in gene expression profiles among individuals sampled in nature, consistent patterns of gene expression were found for individuals of the same reproductive tactic. Notably, gene expression patterns in immature males were different both from immature females and sneakers, indicating that delayed maturation and sea migration by immature males, the 'default' life cycle, may actually result from an active inhibition of development into a sneaker.

Two things:

Very interesting that the "normal" behavior appears to be maintained by the suppression of genes that result in the "sneaker" phenotype. It suggests that behavioral variants may arise by new duplications or mutations, spread somewhat as stable strategies, and complete a selective sweep when ways are found to down-regulate them in accordance with environmental conditions where the strategy is maladaptive.

The other thing is the vast scope of expression differences. Here, of course, there are fairly radically different phenotypes -- probably more so than in any mammal. But if expression differences are more potentially malleable than, say, coding sequences, we may find pretty wide differences in humans as well. This study was just about the brain, not other tissues. The brain is a hugely complex system in its own right, and there is perhaps as much evolutionary potential there as in the rest of the body combined.

References:

Aubin-Horth N, Landry CR, Letcher BH, Hofmann HA. 2005. Alternative life histories shape brain gene expression profiles in males of the same population. Proc R Soc Lond B 272:1655-1662. Abstract

Posted at 15:19 on 08/25/2005 | permanent link


How much selection does it take?

I was involved in a discussion this weekend that I think reveals much about the current state of evolutionary genomics. The forum was the "Neanderthals Revisited" conference at NYU, although I might have had the same discussion almost anywhere. Earlier, I had given my presentation about the importance of selection when considering Neandertal relationships. My point was that ignoring selection leads to a preordained result, since a small amount of selection can have the same effect as very extreme hypotheses of demographic change. I used both morphology and mitochondrial genetics as examples of the way that small magnitudes of selection can have great influence on the pattern of variation. Needless to say, this line of argument did not go over very well with geneticists who had previously been engaged in mitochondrial research, employing the paradigm that assumes that mtDNA is neutral.

The comment that surprised me quite a bit (and I can say that at least a few others in the crowd were surprised as well) was given the next day by Mark Stoneking, a geneticist at the Max Planck Institute for Evolutionary Anthropology in Leipzig. Stoneking was commenting on research the Leipzig lab has been doing in the area of regulatory control across the genome. His observation--and this was the surprise--was that regulatory changes in the genome appear consistent with the hypothesis of neutral change over time. In other words, his argument was that neutral change was the predominant mode of genomic evolution leading to living humans. He used this as a point of departure to suggest that a neutral evolution of phenotypes during the course of human evolution was likely a productive hypothesis to pursue when considering the pattern of variation in ancient hominids.

I must say that I was fairly stunned by this statement. It really struck me as being inconsistent with my knowledge of genomic variation in humans. So I looked up some of the recent research from the Leipzig lab, and compared it with the papers that I had been aware of that examined the level of positive selection responsible for human evolutionary change.

The promise of evolutionary genomics for deriving information about the pattern of selection leading to human-specific traits lies mostly in the comparison of human genetic sequences with those of other primates. By examining the rate of evolution for different genes and parts of genes, it is possible to address whether the changes responsible for human phenotypic evolution were mainly due to changes in coding regions of genes, regulatory elements, or other regions. And by comparing genes with each other, it is possible to judge whether most of the changes were concentrated at a few important genes, or whether they were more broadly distributed across the genome.

But most important is the need to discover exactly what types of mutations were common in the human lineage, and for that matter in the chimpanzee lineage and other primate lineages as well. For example, a gene that has repeatedly experienced positive selection during the course of human evolution will show a greater difference between humans and chimpanzees in whatever kinds of changes were under selection. This might mean that the gene will exhibit a greater number of amino acid substitutions than expected at random from the total number of mutational changes. It might mean that certain areas of the gene, such as upstream regulatory regions, will exhibit more divergence than others.

Geneticists have developed tests for these different patterns of departure from neutrality. They can test whether the number of amino acid changes is significantly higher than expected given a certain rate of mutations. They can test for equality of rates between different genetic loci. And they can test directly for violation of mutation-drift equilibrium. It is this last test that is violated by human mitochondrial DNA, for example, which leads me along with many other geneticists to believe that the molecule has been under positive selection during human prehistory.
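The post does not say which equilibrium test is meant. One widely used statistic of this kind is Tajima's D, which compares two estimates of diversity that should agree under neutrality and constant population size. The sketch below follows the standard published formula; the function name and the sample numbers are hypothetical.

```python
import math


def tajimas_d(n_sequences, segregating_sites, mean_pairwise_diffs):
    """Tajima's D for a sample of aligned sequences.

    Negative values indicate an excess of rare variants, as expected after
    a selective sweep or a population expansion; positive values indicate
    an excess of intermediate-frequency variants.
    """
    n, s, pi = n_sequences, segregating_sites, mean_pairwise_diffs
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pairwise diversity minus the diversity implied by the
    # number of segregating sites. Denominator: its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample: 10 sequences, 20 variable sites, but very few
# differences between any two sequences (most variants are rare).
print(round(tajimas_d(10, 20, 2.0), 2))
```

A strongly negative value, as in this made-up sample, is the kind of departure from equilibrium described above. The statistic by itself cannot say whether selection or demography produced it.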

But there are problems in applying tests of neutrality to genome-wide questions of the abundance or frequency of positive selection. Tests of neutrality are notoriously conservative, meaning that it is difficult for genetic data to demonstrate that selection actually took place. One of the reasons why tests of neutrality are so conservative is that the effects of positive selection are actually counterbalanced by other forms of selection on the genome. For example, if the coding sequence of a gene has been under positive selection during evolutionary history, then one expected effect on the distribution of variation is that the number of amino acid changes separating two species will be relatively high, especially compared to the number of amino acid differences noted within each of the species. In other words, one of the species or both of them should have been driven further apart by selection on the amino acid sequence than we would expect from their neutral level of variation. But exactly the opposite effect is expected for purifying selection. As purifying selection consistently eliminates new amino acid mutations, it should leave two species relatively close to each other in their amino acid sequences compared to the level of differences within each of those species. What this means is that if both positive selection and purifying selection have affected the gene over the course of its evolutionary history, the two opposing forces will to some extent cancel each other out. The effect of this cancellation causes an underestimation of the rate of positive selection, because purifying selection is usually continuous and positive selection is much more episodic. In essence, for most genes we may infer that positive selection never happened, at the same time that we estimate purifying selection to be substantially weaker than it actually was.

This is the point raised by Justin Fay and colleagues (2001). Their paper was principally concerned with estimating the rate of negative, or purifying, selection across the human genome. Their research was motivated by the question of what the typical effects of mutations are--are mutations usually neutral, or are they usually deleterious? In setting out to answer this question they realized that the rate of negative selection could not be independently estimated without first considering the effect of positive selection on the same genes. To estimate the rate of positive selection, Fay and colleagues devised a test that depended upon dividing polymorphisms into three subsets. One subset consisted of polymorphisms at low frequency (< 5%), which would include most deleterious alleles, since selection prevents these from increasing in frequency beyond a very rare level. A second subset consisted of "moderate" frequency alleles occurring between 5 and 15 percent, while the third subset included "common" alleles with frequencies between 15 percent and 50 percent (a folded frequency spectrum considers only the frequency of the minor allele). The researchers reasoned that common (high-frequency) mutations were very unlikely to be deleterious, and so the ratio of nonsynonymous to synonymous sites for only these common alleles should not reflect the effects of purifying selection. Using this subset, they tested whether the divergence between species was unusually high compared to the proportion of amino acid polymorphisms within species. They found an excess of 35 percent in amino acid changes between species, compared to the expectation based on within-species polymorphism: "a large proportion, 35%, of amino acid substitutions between humans and old world [sic] monkeys are estimated to have been driven by positive selection" (Fay et al. 2001: 1232).
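The frequency partition and the excess calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of the paper's analysis: the handling of the 5 and 15 percent boundaries, the function names, and every SNP and divergence count are hypothetical. The excess is computed as one minus the ratio of expected to observed nonsynonymous divergence, the usual McDonald-Kreitman-style form.

```python
def frequency_class(minor_allele_freq):
    """Assign a folded-spectrum class (boundary handling is an assumption)."""
    if not 0 < minor_allele_freq <= 0.5:
        raise ValueError("minor allele frequency must be in (0, 0.5]")
    if minor_allele_freq < 0.05:
        return "rare"
    if minor_allele_freq < 0.15:
        return "moderate"
    return "common"


def count_polymorphisms(snps, classes):
    """Count (nonsynonymous, synonymous) SNPs falling in the given classes."""
    nonsyn = sum(1 for f, is_ns in snps if is_ns and frequency_class(f) in classes)
    syn = sum(1 for f, is_ns in snps if not is_ns and frequency_class(f) in classes)
    return nonsyn, syn


def adaptive_fraction(poly_nonsyn, poly_syn, div_nonsyn, div_syn):
    """Share of nonsynonymous divergence beyond the neutral expectation."""
    expected_div_nonsyn = div_syn * poly_nonsyn / poly_syn
    return 1 - expected_div_nonsyn / div_nonsyn


# Hypothetical SNPs as (minor allele frequency, is_nonsynonymous).
snps = [
    (0.01, True), (0.02, True), (0.03, True), (0.04, True), (0.02, False),
    (0.08, True), (0.10, False),
    (0.20, True), (0.45, True), (0.25, False), (0.30, False),
    (0.40, False), (0.50, False),
]
# Hypothetical fixed differences between species.
div_nonsyn, div_syn = 40, 52

common = count_polymorphisms(snps, {"common"})
everything = count_polymorphisms(snps, {"rare", "moderate", "common"})

print("common only:", round(adaptive_fraction(*common, div_nonsyn, div_syn), 2))
print("all SNPs:   ", round(adaptive_fraction(*everything, div_nonsyn, div_syn), 2))
```

With these invented counts, the common-allele subset gives an adaptive fraction of 0.35, while pooling in the rare (mostly deleterious) amino acid variants drives the estimate below zero. That is the cancellation between purifying and positive selection that the subset approach is meant to avoid.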

In a review article in Nature, Sean Carroll (2003) applied this estimate of 35 percent adaptive substitutions to gain an understanding of the number of selected changes during the past 5 to 7 million years. Based on our understanding of a subset of human genes, Carroll extrapolated that approximately 200,000 amino acid changes have occurred on the human lineage during human evolution. If 35 percent of these changes were positively selected, then this number implies that some 70,000 adaptive substitutions happened during human evolution. This is a stunningly large number. It works out to over two adaptive changes for every one of our 30,000 genes, or roughly one adaptive change every hundred years over 7 million years. If these were evenly distributed over time, then any time period of 10,000 years during our evolution was likely to have seen 100 adaptive substitutions underway. Nor does the figure of 70,000 include the adaptive changes in regulatory elements and other non-coding portions of the DNA.
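The arithmetic behind these figures is easy to check. The short Python sketch below uses only the numbers quoted above; the function and variable names are made up for the example, and the even spread of substitutions over time is the same simplifying assumption the paragraph makes.

```python
AMINO_ACID_CHANGES = 200_000  # Carroll's extrapolation for the human lineage
ADAPTIVE_PERCENT = 35         # estimate from Fay and colleagues
GENES = 30_000                # gene count used in the text


def adaptive_summary(lineage_years):
    """Spread the adaptive substitutions evenly over the lineage's duration."""
    adaptive = AMINO_ACID_CHANGES * ADAPTIVE_PERCENT // 100
    years_per_change = lineage_years / adaptive
    return {
        "adaptive_substitutions": adaptive,
        "per_gene": adaptive / GENES,
        "years_per_change": years_per_change,
        "per_10000_years": 10_000 / years_per_change,
    }


for years in (5_000_000, 7_000_000):
    summary = adaptive_summary(years)
    print(f"{years:>9,} years: "
          f"{summary['adaptive_substitutions']:,} adaptive substitutions, "
          f"{summary['per_gene']:.2f} per gene, "
          f"one every {summary['years_per_change']:.0f} years, "
          f"{summary['per_10000_years']:.0f} per 10,000 years")
```

The hundred-year interval corresponds to the 7-million-year end of the range; with 5 million years the interval shortens to about 71 years.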

An independent line of inquiry has provided support for the idea that human genes have been regularly under positive selection. Vallender and Lahn (2004) review a list of genes that are now believed to have been under repeated positive selection during the course of hominoid evolution. Some of these display strong evidence of selection during the evolution of ancestral anthropoids or hominoids; others apparently have been under selection during the course of human evolution during the past seven million years or less.

The work of the Leipzig lab that Stoneking was referring to is reflected by the paper by Hellmann and colleagues (2003). The innovation of this research is the ability to compare human genomic regions with the same regions in chimpanzees. In contrast to earlier studies like that of Fay and colleagues (2001), who relied mainly upon macaque comparisons, this study gives a very close analogue as a comparison sample for human variation. But unlike the earlier study, Hellmann and colleagues (2003) basically ignore the issue of positive selection on the genome. They do not examine the ratio of amino acid changing variants for different frequency sets, nor do they attempt to quantify the number of adaptive substitutions in humans as compared to chimpanzees. The information in the data that would address positive selection is therefore hidden by the overall pattern of purifying selection against deleterious mutations that the researchers find.

One interesting addition to our information about positive selection is present in Hellmann and colleagues' data, however. They find that the 5' untranslated region of the genes in their study is significantly more divergent between humans and chimpanzees than other parts of the genes. This observation is consistent with the idea that this region has been under positive selection in many of these genes. Presumably the source of this positive selection is adaptive change in regulatory elements upstream of the coding regions of these genes. If this pattern is widespread, it offers an additional large number of episodes of positive selection beyond those necessary to explain the pattern of amino acid changes in humans.

So the data from the Leipzig lab do not contradict the findings of high levels of positive selection on human genes. They do confirm the idea that tests of neutrality will not pick up evidence of positive selection when purifying selection against deleterious mutations has also been acting on the genes. These data do not suggest that genetic drift has been the predominant force affecting human evolutionary change. Instead, purifying selection is shown to be a very strong force affecting the current variation of most genes, while the estimate of positive selection made by Fay and colleagues (2001), together with the evidence for positive selection on 5' untranslated regions, remains applicable to these data.

What are the stakes in this inquiry? That is, why does it matter what the level of positive selection has been in human evolution or the evolution of any other lineage?

One implication of a very high rate of adaptive evolution is that most of the molecular changes affect cellular metabolism, homeostatic processes, and other small-scale molecular features of the cell and organism. This conclusion stems from the idea that there just have not been that many changes in gross structural and anatomical aspects of organisms to require tens of thousands of selected changes. Carroll (2003) goes so far as to suggest that developmental genes may have been relatively conserved compared to the molecular processes that account for this widespread adaptive evolution. Vallender and Lahn (2004:R245) say, "Many other aspects of human biology not necessarily related to the 'branding' of our species, such as host-pathogen interactions, reproduction, dietary adaptation, and physical appearance, have also been the substrate of varying levels of positive selection."

The most important implication of a high rate of selection, at least to me, is that ancient demography is likely not a major cause of human genetic evolution and differentiation. John Gillespie's work has made clear over the past five years that positive selection can explain the pattern of genetic variation in many species. In essence, he explains a low level of polymorphism in most species as the possible result of widespread positive selection and linkage between selected and neutral sites. As large areas of the genome undergo genetic hitchhiking (reductions in variation caused by linkage to positively selected sites), the variation ultimately becomes greatly limited in ways that resemble the effects of genetic drift in a small population. The difference is that with positive selection and linkage, the level of polymorphism that can persist in a population has little if any relation to the size of the population. This theory is called "genetic draft."

If the variation of most human genes is limited by selection across the genome, then it follows that genes cannot be used to estimate ancient population size or other demographic characteristics.

Likewise, if positive selection on human genes is common enough, it appears very likely that some areas of the genome often used for demographic inquiry are themselves direct targets of selection. The most obvious are the mitochondrial DNA and the Y chromosome, both of which are completely linked over their entire lengths. For example, the Y chromosome contains around 80 genes. If these were selected at the rate typical of nuclear genes in the rest of the genome, then we can estimate that the Y chromosome underwent some 200 adaptive substitutions during the past 7 million years, or one in approximately 35,000 years. In this context, it is interesting that the most recent common ancestor of human Y chromosomes appears to have lived within the past hundred thousand years (and possibly less), and that the genetic sequences thus far studied show clear violations of neutrality. This has previously been considered evidence of a large-scale demographic replacement in recent human evolution, but it is better explained as the consequence of positive selection on the Y chromosome. A similar line of argument could apply to the mtDNA, although estimates of the frequency of adaptive changes are more difficult because the mtDNA has a substantially different rate of mutations compared to nuclear DNA.
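The Y chromosome figure follows from the same back-of-envelope reasoning. This Python sketch assumes the genome-wide numbers quoted earlier in the post (70,000 adaptive substitutions across 30,000 genes); the function name and its defaults are illustrative only.

```python
def y_chromosome_estimate(y_genes=80, adaptive_total=70_000,
                          genome_genes=30_000, lineage_years=7_000_000):
    """Scale the genome-wide adaptive rate down to the Y chromosome's genes."""
    per_gene = adaptive_total / genome_genes
    y_substitutions = y_genes * per_gene
    years_between = lineage_years / y_substitutions
    return y_substitutions, years_between


substitutions, interval = y_chromosome_estimate()
print(f"about {substitutions:.0f} adaptive substitutions, "
      f"one every {interval:,.0f} years")

# Rounding the count up to 200, as the text does, gives the quoted interval.
print(f"with 200 substitutions: one every {7_000_000 / 200:,.0f} years")
```

The unrounded calculation gives slightly fewer than 200 substitutions and a slightly longer interval, so "some 200" and "approximately 35,000 years" are round figures rather than exact ones.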

If positive selection is such a common force in human evolution, then the possibility is clearly opened that many of the genetic differences among human populations are the result of local positive selection. We have tended to examine interpopulation differences under the assumption that genetic drift and migration are the only important parameters. But if selection plays an important role in diversifying populations, then we can expect that the level of genetic differences (for example Fst) between populations has little to do with classical correlates of genetic drift, such as population size. Instead of drift-migration equilibrium, the important factor is migration-selection equilibrium, probably constantly changing in a dynamical sense.

What mysteries remain?

Although it appears that positive selection has been very common, this says little about the distribution of such selection across the genome. Evidence from some genes and sets of genes suggests that they have been subjected to repeated episodes of adaptive evolution during the course of primate evolution. The study by Dorus and colleagues on the adaptive evolution of brain-related genes provides an example of the way that positive selection has been focused on some areas of the genome. Certainly some individual genes have probably undergone dozens of instances of positive selection--probably including many genes involved in host-pathogen interactions.

If positive selection is really so common across the genome, then a greater frequency of selection in the human lineage as opposed to other animal species might explain the level of human genetic variation. But if positive selection has been so common, then why do some animal genes appear to have fairly great variation? For example, why do so many animal species have relatively great mtDNA variation, when the mitochondrial DNA appears to be a very likely target of selective sweeps?

References:

Carroll SB. 2003. Genetics and the making of Homo sapiens. Nature 422:849-857.

Fay JC, Wyckoff GJ, Wu CI. 2001. Positive and negative selection on the human genome. Genetics 158:1227-1234.

Hellmann I, Zollner S, Enard W, Ebersberger I, Nickel B, Paabo S. 2003. Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res 13:831-837.

Vallender EJ, Lahn BT. 2004. Positive selection on the human genome. Hum Mol Genet 13:R245-R254.

Posted at 23:58 on 02/02/2005 | permanent link


A future without men?

H. Allen Orr reviews Bryan Sykes' book, Adam's Curse: A Future Without Men, in the May 12, 2005 New York Review of Books. This is a great review (with short comparisons to Steve Jones' Y: The Descent of Men and David Bainbridge's The X in Sex: How the X Chromosome Controls Our Lives).

From the review:

Sykes's case for the extinction of men hinges on an unusual problem plaguing many genes on the Y chromosome -- they tend to pick up debilitating mutations and to ultimately degenerate into genetic junk. A couple of hundred million years ago or so, the X and Y were a pair of perfectly ordinary chromosomes that each carried a full complement of the same thousand genes. Since then, however, the Y has been slowly degenerating. As a result, while the human X still carries its thousand genes, the Y carries only about a hundred. Sykes believes that the genes that remain on the Y -- including SRY as well as others required for the fertility of men -- will also degenerate. The disastrous consequence, he says, will be the disappearance of fertile males. (Sykes sometimes says that males will become sterile, while at other times he suggests they'll disappear. Genetically, at least, the difference doesn't make a difference: if all males are sterile, they may as well not be there.)
I'm afraid that this is all just silly. ... The critical point is that most of the male fertility genes now residing on the human Y exist only on that chromosome and there's no way that selection will allow their loss.
Sykes's calculation suggests otherwise because it's wrong. He seems to assume that Y chromosomes carrying mutations that partially sterilize men will get passed on to future generations as often as normal, unmutated chromosomes. But they won't -- that's what it means to be partially sterile. This misstep leads Sykes astray. There are simply no sound evolutionary grounds to support his sensational claims of the extinction of men.

In this book, Sykes constructs and defends a fairly extreme model of biological determinism for the Y chromosome, drawing historical and prehistoric human events into the fold of this model. So war, empire, and Genghis Khan himself are drawn into the story. It is good to see Orr skewering this model and its lack of fundamental population genetic logic.
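
A toy calculation makes Orr's objection concrete. The sketch below is neither Sykes's calculation nor Orr's; it assumes a single class of partially sterilizing Y mutations that arise at a hypothetical rate `u` per generation and cut male fertility by a hypothetical fraction `s`. With `s = 0` (the assumption Orr attributes to Sykes) mutant Y chromosomes simply pile up. With any real fertility cost they settle near `u / s` instead.

```python
def mutant_y_frequency(u, s, generations, q0=0.0):
    """Frequency of fertility-reducing Y chromosomes after some generations.

    u: mutation rate from normal to mutant Y per generation (illustrative value)
    s: fractional fertility loss of men carrying a mutant Y (illustrative value)
    """
    q = q0
    for _ in range(generations):
        # Selection: mutant Ys are passed on in proportion to fertility (1 - s).
        q = q * (1 - s) / (1 - s * q)
        # Mutation: a fraction u of the remaining normal Ys become mutant.
        q = q + (1 - q) * u
    return q


# Same mutation rate, with and without a fertility cost (values are made up).
neutral = mutant_y_frequency(u=1e-3, s=0.0, generations=5000)
selected = mutant_y_frequency(u=1e-3, s=0.1, generations=5000)
print(f"no fertility cost:  {neutral:.3f}")
print(f"10% fertility cost: {selected:.3f}")
```

The only difference between the two runs is whether partially sterile males leave fewer sons, which is exactly the step Orr says is missing.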

Personally, I can't see the appeal of reading a book entirely about a single chromosome. Not that most chromosomes don't have interesting stories -- hey, why not chromosome 11? -- or that you can't associate human stories with a chromosome. I regularly assign Matt Ridley's Genome in my intro-level course as a quick overview of how genetics relates to human lives, and that book is essentially a series of 23 essays riffing off each of the human chromosome pairs (X and Y sharing one chapter). But it seems to me that chromosomes are a pretty poor way to organize human experience.

Posted at 22:48 on 04/25/2005 | permanent link

Read other posts in /reviews/genomics


Human proteins are made of transposons?


A new PNAS paper by Roy Britten gives a partial answer:

This is a report of many distant but significant protein sequence relationships between human proteins and transposable elements (TEs). The libraries of human repeated sequences contain the DNA sequences of many TEs. These were translated in all reading frames, ignoring stop codons, and were used as amino acid sequence probes to search with BLASTP for similar sequences in a library of 25,193 human proteins. The probes show regions of significant amino acid sequence similarity to 1,950 different human genes, with an expectation of <10^-3. In comparison with previous REPEATMASKER (Institute for Systems Biology, Seattle) studies, these probes detect many more TE sequences in more human coding sequences with greater length than previous work using DNA sequences. If the criterion is opened, very many matches are found occurring on 4,653 different genes after correction for the number seen with random amino acid sequence probes. The processes that led to these extensive sets of sequence relationships between TEs and coding sequences of human genes have been a major source of variation and novel genes during evolution. This paper lists the number of sequence similarities seen by amino acid sequence comparison, which is surely an underestimate of the actual number of significant relationships. It appears that many of these are the result of past events of duplication of genes or gene regions, rather than a direct result of TE insertion. This report of observable relationships leaves to the future the functional implications as well as the detection of the events of TE insertion.

I find it sort of interesting that sometime during evolution (probably way-back time, considering that these are partial sequence similarities rather than direct insertions) transposable elements may have been plugging modular sequence into genes and having adaptive effects. It gives one reason why some transposable elements may have lasted a long time -- they had adaptive effects once in a great while.

And it reflects back on the modular nature of proteins, if these are functional domains that are similar among many proteins and also might have skipped around the genome once upon a time.

There are a lot of big genome-wide comparative papers to be written like this. The hard part is coming up with evolutionary hypotheses to explain the results.
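
For readers who want to see what "translated in all reading frames, ignoring stop codons" might look like in practice, here is a minimal sketch. It is not Britten's pipeline. It assumes "all reading frames" means the three forward and three reverse-complement frames, and it assumes stop codons are written as a placeholder `X` so that the probe is not cut short; the abstract does not say how either detail was handled. The function names are made up for illustration.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, stop_symbol="X"):
    """Translate codon by codon, reading through stops (assumed behavior)."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous bases also become the placeholder symbol.
        amino_acid = CODON_TABLE.get(dna[i:i + 3], stop_symbol)
        protein.append(stop_symbol if amino_acid == "*" else amino_acid)
    return "".join(protein)


def six_frame_probes(dna):
    """Return six amino acid probes: three forward frames, then three reverse."""
    dna = dna.upper()
    probes = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            probes.append(translate(strand[offset:]))
    return probes


print(six_frame_probes("ATGGCCTAAGGA"))
```

Each of the six strings would then serve as a protein query, which is why a single repeat element can turn up matches that a DNA-level search misses.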

Posted at 11:09 on 01/28/2006 | permanent link

Read other posts in /reviews/genomics


Molecular mechanisms of change in identical twins


I'm fairly unique; not only do I study genetics, but I also have a pair of identical twins. So I notice their differences, both in how they look and how they act. From a genetic standpoint, these differences are neither unusual nor unexpected: most traits are influenced strongly by the environment, even if they are also influenced by heredity. My twins haven't had exactly the same environment -- nobody has. So they are different, for reasons stemming from their different positions in utero, their slightly different diets, their different histories of illness, and so on.

But just saying that "the environment done it" doesn't really enlighten as to the mechanism underlying these differences. How does the environment alter the phenotype? Identical twins are not interesting merely because they are different: everybody's different. Instead, they are interesting because they show how the same genome can unfold differently with slightly different conditions.

This week, a group of Spanish researchers led by Mario Fraga of the Spanish National Cancer Center has examined some of the molecular processes by which the environment can produce alterations in gene expression, thereby influencing the phenotype.

In a description of the research, Rick Weiss of the Washington Post writes:

The new research, led by Mario F. Fraga and Manel Esteller of the Spanish National Cancer Center in Madrid, focused on two biological mechanisms that influence gene activity. In one, called DNA methylation, enzymes inside a cell attach a minuscule molecular decoration to a gene, deactivating that gene. In the other, called histone acetylation, a dormant gene is made active again.
These altered genetic settings can last a lifetime (though they are not passed down to a person's offspring) and can be important if, say, the gene turned off is one that protects against cancer. The extent to which epigenetic changes are preprogrammed from birth or spurred by factors outside the body has been unclear.
In the new work, described in today's issue of Proceedings of the National Academy of Sciences, researchers measured the extent to which twins of various ages, from 3 to 74, differed in the number and variety of genes that had been either turned on or shut down by epigenetic processes. They found that young twins had almost identical epigenetic profiles but that with age their profiles became more and more divergent.

This is not merely the phenotype being partially determined by the environment; it is a trace of one of the mechanisms of that influence. What happens to you can result in your genes being turned on or off -- sometimes for a lifetime. These shifts are not mutations, but controls that cause alterations of gene expression.

And the shifts arise -- at least some of the time -- because of environmental events:

In a finding that scientists said was particularly groundbreaking, the epigenetic profiles of twins who had been raised apart or had especially different life experiences -- including nutritional habits, history of illness, physical activity, and use of tobacco, alcohol and drugs -- differed more than those who had lived together longer or shared similar environments and experiences.

So far, the articles I've seen have focused on the "epigenetic" aspect. That term is just a fancy way of saying that the expression of genes can be altered -- they can be turned on or off -- by events that affect the cell. The accumulation of such small effects across the body as a whole can lead to relatively big changes. Here, people have focused on the increase in cancer or other disease risk that can result from certain epigenetic changes. But more broadly, these changes are ways that conditional adaptations to different environments are implemented. If it is sometimes advantageous to produce a given protein, and sometimes better not to produce it, then a process that could control the production in light of circumstances would be very useful. Methylation and acetylation are two such processes.

For me, I will now know what to tell my twins when they ask why they're different even if they're identical. You're methylated, and she's acetylated. Hmmm... "Methyl" and "Acetyl" would be quite the twin names...

Posted at 01:23 on 07/06/2005 | permanent link

Read other posts in /reviews/genomics


Different recombination hotspots in humans and chimpanzees


Winckler et al. (2005) (Science online) surveyed sequence data from humans and chimpanzees to examine whether recombination was happening at similar rates in both species. They found that even though the human and chimpanzee sequences were 99 percent identical, recombination hotspots were highly different, and rarely occurred in the same places.

At present it is not known what molecular factors result in recombination at particular genomic locations, so it is unclear what accounts for the difference between humans and chimpanzees in hotspot locations. For this reason, the authors interpret their findings in terms of several possible hypotheses:

The lack of correlation in recombination patterns between humans and chimpanzees demonstrates that fine-scale recombination rates evolve rapidly, to an extent disproportionate to the change in nucleotide sequence. Rapid evolution of hotspots has previously been hypothesized on the basis of examples of meiotic drive at hotspots and the mechanism of DSB repair (9, 12). Our observations argue against models in which hotspots are directed solely by short, neutrally evolving DNA motifs, which would almost always be identical between the two species. Epigenetic factors, which are known to play a role in recombination hotspots (7), may vary more substantially across closely related species than does DNA sequence. Alternatively, if the trans-acting molecular machinery that initiates crossover events has nucleotide site preferences, then it is possible that substitutions in these components could dramatically alter site preference across the genome. Although DNA sequence is typically shared across human and chimpanzee, the polymorphisms in each species are not (26). It is intriguing to speculate that polymorphisms could themselves play a role in shaping fine-scale recombination; this could also explain why different alleles of a given locus can have substantially different recombination rates (9). Finally, we note that if recombination rates evolve rapidly, then in some cases, rates from "historical" polymorphism data might truly differ from contemporaneous rates in sperm (Winckler et al. 2005:110).

To me, the research raises an interesting question: if humans and chimpanzees are so divergent in recombination parameters, shouldn't we expect humans to be fairly different from each other also? On average, human alleles are about a tenth as different from each other in sequence as human alleles are from chimpanzee alleles. If the rate of change between humans and chimpanzees has been high, then human polymorphism should include a substantial recombinational component -- perhaps more significant in magnitude than conventional sequence polymorphism. As the study puts it:

By applying these analytical methods to genome-wide polymorphism surveys, an extensive collection of recombination hotspots will soon be available across the human genome. Studying these hotspots should ultimately illuminate the as yet mysterious factors that direct the location and frequency of recombination in our species (Winckler et al. 2005:110).

I wonder whether these results will ultimately affect our interpretation of diversity within and outside of Africa -- especially in light of the suggestion that human populations within Africa have undergone adaptation to several fairly distinct local environments. If there are recombinational differences that may act as either impediments or facilitators to selection on particular genomic regions, that might influence the dispersal of adaptive genes (or genetic elements). Likewise, although microsatellites are not directly related to mutational hotspots, there are substantial differences between humans and chimpanzees in terms of variable microsatellite loci. In both cases, human variability may ultimately be the result not only of the factors affecting human populations globally, but also the evolution of the systems themselves in terms of some loci becoming more mutationally active or less active in some populations over time. It is an interesting genomic world out there, one that we are just beginning to understand.

References:

Winckler W, Myers SR, Richter DJ, Onofrio RC, McDonald GJ, Bontrop RE, McVean GAT, Gabriel SB, Reich D, Donnelly P, Altshuler D. 2005. Comparison of Fine-Scale Recombination Rates in Humans and Chimpanzees. Science 308:107-111.

Posted at 10:54 on 04/19/2005 | permanent link

Read other posts in /reviews/genomics


Manfred's amazing technicolor dreamcoat


The story about blonde mammoths has been making the rounds, based on this paper by Holger Römpler and colleagues in Science. The abstract:

By amplifying the melanocortin type 1 receptor from the woolly mammoth, we can report the complete nucleotide sequence of a nuclear-encoded gene from an extinct species. We found two alleles and show that one allele produces a functional protein whereas the other one encodes a protein with strongly reduced activity. This finding suggests that mammoths may have been polymorphic in coat color, with both dark- and light-haired individuals co-occurring.

Missing from the story is this: it's the first insight about phenotypic variation that anyone has derived from ancient DNA.

It's a very short paper; the essential elements are that they found the polymorphism by trying to reconstruct the Mc1r sequence in an individual that turned out to be a heterozygote for three amino acid-coding sites. They then sequenced the polymorphisms in "additional specimens" including some homozygotes, which allowed them to work out the linkage among the polymorphisms. Sequencing of additional individuals helped to rule out the possibility that the candidate polymorphisms resulted from idiosyncratic DNA damage and not genuine biological variation. One of the three polymorphisms is frequently variable among other mammal species; two of them are highly conserved -- pointing to a possible important functional difference. They then constructed a model allele including these polymorphisms to see how it would be expressed in a cell culture; they found that one of the mammoth alleles showed only partial activity. From this they inferred that the polymorphism probably had phenotypic effects on pigmentation.

Of course it's a bit of a stretch from this conclusion to blonde hair; but they are helped in this by a little-known fact: frozen mammoths show variation in their fur coloration! So the genetic observations help to explain an already-known phenotypic variation.

Would they have had the confidence to make this argument if such pigment variations were not already known? Hard to say, but one consequence of this paper has to be an increase in the confidence in such conclusions for future studies, where phenotypes aren't already known.

I would say that the most important aspect of the paper is that it shows the importance of having sequences from multiple individuals. In this study, the sequencing of other individuals helped substantiate that the polymorphisms were not DNA damage. Since these sequences are broken into short pieces, reconstructing the alleles from a heterozygote is difficult if not impossible. So finding homozygotes among these individuals was necessary to reconstruct the two alleles, which gave the opportunity to assess their functional differences. And although this study didn't include a discussion of the origin of this polymorphism, reconstructing haplotypes is essential to that question also, another enterprise that is not possible without samples from multiple individuals.
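
The phasing problem is easy to see with a small enumeration. The sketch below is only an illustration: the alleles are invented, not the actual mammoth Mc1r variants, and it assumes each variable site is read separately because the fragments are too short to span two sites. An individual heterozygous at three sites is compatible with four different pairs of alleles, and only haplotypes observed directly in homozygotes tell you which pair is real.

```python
from itertools import product


def possible_haplotype_pairs(genotype):
    """List every pair of haplotypes compatible with an unphased genotype.

    genotype: one (base, base) tuple per variable site, order within a site unknown.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[c] for site, c in zip(genotype, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        # frozenset makes (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(frozenset((hap1, hap2)))
    return pairs


def consistent_with_known(pairs, known_haplotypes):
    """Keep only the pairs built entirely from haplotypes seen in homozygotes."""
    return [pair for pair in pairs if pair <= known_haplotypes]


# Invented example: one individual heterozygous at three sites.
heterozygote = [("A", "G"), ("C", "T"), ("G", "A")]
candidates = possible_haplotype_pairs(heterozygote)

# Haplotypes read directly from hypothetical homozygous individuals.
known = {"ACG", "GTA"}
resolved = consistent_with_known(candidates, known)

print(len(candidates), "possible pairs from the heterozygote alone")
print(len(resolved), "pair left once the homozygotes are considered")
```

With n heterozygous sites the number of candidate pairs is 2^(n-1), so the ambiguity grows quickly as more variable sites are found.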

So there is only a limited amount of information that can be obtained from a single individual, especially under circumstances where the DNA is fragmented or possibly damaged. Sampling multiple individuals helps these questions a lot -- even if it may introduce problems that haven't yet been considered.

And of course, from this work (undertaken in part at the Max Planck Institute for Evolutionary Anthropology) you can see some of the strategies that will ultimately be applied to the Neandertal genome. Mc1r is sort of an obvious candidate to look at, and they are certainly going to be trying to find FoxP2 polymorphisms also. But I anticipate that the story will be much deeper -- we may spend much more time figuring out the Neandertal genome than the 150 years since the Feldhofer cave discovery.

References:

Römpler H, and 8 others. 2006. Nuclear gene indicates coat-color polymorphism in mammoths. Science 313:62.

Posted at 11:24 on 07/07/2006 | permanent link

Read other posts in /reviews/genomics/ancient


Gene expression in the mouse brain


Last week's Nature included a report on the final draft of the gene expression atlas of the mouse brain. I wish this had come out last semester -- I could have devoted four lectures to it.

There is some very interesting stuff in this if you read past the jargon. I wrote about the brain atlas last year; this paper is a report on the project's completion.

One main system-level finding is that a surprisingly large proportion of the genome is expressed in brain -- around 80 percent of all identified genes. Most of these are expressed in only a relatively small subset of cells ("70.5% of genes are expressed in less than 20% of total cells", though not all in the same 20%). In other words, the distribution of gene expression is very broad, with a long tail.

My interpretation is that the large number of genes suggests a substantial role for pleiotropy between other tissues and brain. The representation of most of these genes in only a small proportion of the brain suggests that mammalian brain evolution has diversified neural function by utilizing genes already existing in the genome for small, specialized purposes in the brain.

Another system-level finding supports this notion, by noting that "large numbers of genes [have] restricted expression in specific cell populations." The specificity of gene expression in certain brain areas must underlie functional differentiation of brain regions.

More interesting, the gene expression patterns help to identify distinct regions that are not necessarily recognizable anatomically. For example, this section describes some of the gene expression organization of the cerebellum:

The basic structure of the cerebellum is well known and consists of several functionally discrete gross divisions. Additionally, the cerebellar cortex exhibits a bilaterally symmetric series of sagittally oriented bands mirrored by a number of genes, most notably zebrin. Strong (although not complete) correlation between patterns of cerebellar afferent segregation and zebrin expression indicate that molecular markers can delineate functionally discrete regions in the cerebellum.
A number of genes display heterogeneity within cerebellar granule and Purkinje cell populations. For example, Rasgrf1 defines a previously unrecognized large, contiguous domain with sharp boundaries in the granule cell layer of the rostral (Fig. 8a) and dorsal cerebellum (Fig. 8b). More complex regional patterns are observed in the Purkinje cell layer, such as that of Opn3, a non-canonical opsin (Fig. 8c, d) whose expression in the cerebellum has been described as a rostro-caudal gradient with radial stripes. Rather than a gradient, three-dimensional reconstruction of ISH data for Opn3 reveals a more coherent pattern (Fig. 8e) involving a sharply delineated diagonal band lacking expression extending across the entire cerebellum. The overall pattern of Opn3 is both complex and discrete, with regionalized expression in distinct lobules and sagittal banding in the posterior vermis.

Seeing these "hidden boundaries" pop out in the gene expression data is sort of like on CSI where they shine the UV light on a crime scene. Suddenly lots of things are apparent that weren't visible to the naked (or microscope-aided) eye.

On the whole, the paper concludes that gene expression correlations (different genes expressed in similar patterns) tend to occur for structures around the scale of functional nuclei within larger brain regions, and gross brain regions themselves are represented by "single tight clusters" of genes. In this context, the paper discusses the differences in gene expression between different subregions of the hippocampus. Interestingly, a high proportion of genes involved in cell adhesion are among those expressed in hippocampus:

The top over-represented functional category within the regional gene set is cell adhesion (P < 1.79-10). Differential cell adhesion may be important for establishment and maintenance of topographic connectivity, or, as described recently, different forms of synaptic plasticity and remodelling.

An interesting part of the paper to me is its consideration of gene expression within individual neurons.

Subcellular mRNA targeting
Subcellular localization and translation of mRNA transcripts in dendrites is increasingly recognized as a widespread phenomenon, and is thought to be involved in certain forms of synaptic plasticity. Dendritic mRNA targeting is particularly obvious in the hippocampus (Fig. 9a-f) and cerebellum (Fig. 9g, h), where clear distinctions can be made between cell-dense layers, dendritic molecular layers and white matter. Targeting throughout the entire dendritic field is exemplified by the well-characterized patterns of Camk2a and Dnd1 in the hippocampus (Fig. 9b, c). Although often subtle, this distribution is independent of expression level, thereby distinguishing targeting from passive diffusion of mRNA into dendrites. For example, labelling for the highly expressed gene Nptx1 is confined to the soma of CA3 pyramidal cells (Fig. 9d), whereas microtubule associated proteins 1A (Mtap1a) and 2 (Mtap2) also label proximal or proximal and distal dendrites (Fig. 9e, f). Similar forms of targeting are seen in cerebellar Purkinje cells (Fig. 9h, i), as well as in oligodendrocytes (Mbp; Fig. 9k).
Many genes exhibiting dendritic targeting are involved in cytoskeletal organization and biogenesis, as well as in regulating synaptic plasticity. There appear to be multiple cis-acting sequence elements that mediate mRNA targeting. Identification of sets of dendritically targeted mRNAs with shared features (Supplementary Table 4), such as regional specificity and distribution in dendrites, may aid in the identification of conserved sequence elements that correlate with cellular and intracellular transport specificity.

So subcellular function in neurons is identifiable by looking at gene expression in different microanatomical regions. That's pretty cool. It goes to show that the neurons are the most specialized adaptive part of this whole party. The neuron is a finely tuned machine. It remains to be seen to what extent neurons are different machines in different brain regions -- exactly how expression differences may influence intracellular operation. But it is tempting to speculate that the formation of different regions of the brain was a relatively simple operation compared to the

Now this atlas represents expression in the mouse brain, so not everything will apply to primates or humans. I would guess that additional functions have arisen during primate evolution by the proliferation of these nucleus-level structures, and some relative expansion of pre-existing functional regions. That kind of evolution would probably involve increasing the representation of the genome in brain expression, by using additional genes in brain development, by duplication and later functional differentiation of some brain-expressed genes, and by an increase in alternative splicing.

References:

Lein ES and many, many others. 2007. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445:168-176. doi:10.1038/nature05453

Posted at 22:22 on 01/19/2007 | permanent link

Read other posts in /reviews/genomics/brain


HIV genetics by the genome


A new whole-genome association study has found more genetic variants protective against HIV. The course of HIV infection is variable, even in the absence of medication, and it has been known for some time that some of the variation in disease progress is attributable to genetic variation among people. One gene variant (CCR5Δ32) is strongly protective against HIV-1; this is because the virus exploits the CCR5 chemokine receptor to infect T cells, and homozygotes for the Δ32 allele do not have this vulnerability.

The new research looked through the entire genome to find single nucleotide polymorphisms (SNPs) associated with variant disease phenotypes:

Understanding why some people establish and maintain effective control of HIV-1 and others do not is a priority in the effort to develop new treatments for HIV/AIDS. Using a whole-genome association strategy we identified polymorphisms that explain nearly 15% of the variation among individuals in viral load during the asymptomatic set point period of infection. One of these is found within an endogenous retroviral element and is associated with major histocompatibility allele HLA-B*5701, while a second is located near the HLA-C gene. An additional analysis of the time to HIV disease progression implicated a third locus encoding a RNA polymerase subunit. These findings emphasize the importance of studying human genetic variation as a guide to combating infectious agents.

From a very large study population of infected patients, the authors were able to identify a subset for whom recurrent measurements of viral load and other essential data were available. This allowed them to find genes that associate with the temporal progression of the disease, not just its presence or absence. An article on ScienceNOW by Jon Cohen describes the setup:

The team studied 486 patients infected with HIV who had not received treatment and had known dates of infection and accurate set points. Then they checked blood samples against half a million known variations in DNA sequences, or single-nucleotide polymorphisms, which recently were identified by the International HapMap Project that looked for differences in the genomes of people from many populations. "We've approached this as a straight, quantitative genetic problem," explains David Goldstein, a geneticist at Duke University in Durham, North Carolina, who led the study. The researchers say this is the first study to ever do such a genome-wide association analysis for an infectious disease.

The study identifies a number of other candidates besides the three significant ones that receive most of the discussion. It's tricky to test for significance in genome-wide surveys because the genome is so large and there are potentially many genes with small effects on disease phenotype. Still, genes with small effect (unless rare and highly protective) are not particularly good candidates for therapeutic treatments, so the major ones are the main story.
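
The scale of the multiple-testing problem is easy to put in numbers. The sketch below uses the half-million SNP figure quoted above and a plain Bonferroni correction. That choice is an assumption made for illustration; the post does not describe which correction the study actually applied, and the function names are made up.

```python
def expected_false_positives(p_cutoff, n_tests):
    """Expected number of tests passing the cutoff if no SNP has any real effect."""
    return p_cutoff * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 500_000  # "half a million known variations", per the ScienceNOW article

print("hits expected by chance at p < 0.05:", expected_false_positives(0.05, n_snps))
print("Bonferroni cutoff per SNP:", bonferroni_threshold(0.05, n_snps))
```

Bonferroni treats every SNP as an independent test. Neighboring SNPs are often correlated through linkage disequilibrium, so the cutoff is conservative, which is one reason genes of small effect are so hard to confirm in a sample of a few hundred patients.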

References:

Fellay J and 26 others. 2007. A whole-genome association study of major determinants for host control of HIV-1. Science (online early) doi:10.1126/science.1143767

Posted at 18:11 on 07/22/2007 | permanent link

Read other posts in /reviews/genomics/disease


The dawn chumans


Try to wrap your mind around this one:

Humans and chimps diverged from a single ancestral population through a complex process that took 4 million years, according to a new study comparing DNA from the two species.
...
The researchers hypothesize that an ancestral ape species split into two isolated populations about 10 million years ago, then got back together after a few thousand millennia. At that time the two groups, though somewhat genetically different, would have mated to form a third, hybrid population. That population could have interbred with one or both of its parent populations. Then, at some point after 6.3 million years ago, two distinct lines arose.

That's the writeup of a new paper by Nick Patterson and colleagues in Nature. It's an advance publication, so I'm not sure it's widely available, but here's the first paragraph:

The genetic divergence time between two species varies substantially across the genome, conveying important information about the timing and process of speciation. Here we develop a framework for studying this variation and apply it to about 20 million base pairs of aligned sequence from humans, chimpanzees, gorillas and more distantly related primates. Human-chimpanzee genetic divergence varies from less than 84% to more than 147% of the average, a range of more than 4 million years. Our analysis also shows that human-chimpanzee speciation occurred less than 6.3 million years ago and probably more recently, conflicting with some interpretations of ancient fossils. Most strikingly, chromosome X shows an extremely young genetic divergence time, close to the genome minimum along nearly its entire length. These unexpected features would be explained if the human and chimpanzee lineages initially diverged, then later exchanged genes before separating permanently.

I've read the paper, and I have to say it doesn't deliver on its promises. It fails to cite previous work on the topic, it discards without explanation the hypothesis supported by most previous studies, and it promotes a "provocative" hypothesis for which there is no good evidence. It doesn't even show that the speciation of humans and chimpanzees was "complex".

It's just a mess.

Some reactions:

1. It is a breakthrough that Nature has published a figure titled, "Genetic relationships differ from species relationships"! This is akin to pulling out of the dark ages.

2. The dark ages end by baby steps at Nature, at least in terms of failure to demand citation of relevant literature. Consider the two most relevant recent studies of chimp-human divergence times: Wildman et al. (2004) and Yang (2002). Both these studies estimated human-chimpanzee divergence times on the order of 5 to 7 million years. Both studies use a large number of loci in their assessment. Wildman et al. (2004) consider the effects of natural selection on their results. Yang (2002) concludes that the variance among human-chimpanzee genetic divergences for different loci is relatively slight, leading to the conclusion that the effective population size of the human-chimpanzee ancestral population was probably small (on the order of 10,000 effective individuals). The Yang (2002) result is of interest, because it reflects low variance in human-chimpanzee genetic divergence, the opposite of that found in the current study.

Several other studies might have been cited, such as Satta et al. (2004), or the granddaddy of all these kinds of studies, the original Theoretical Population Biology paper by Takahata, Satta, and Klein (1995), which lays out the theory and relation of genetic divergence to pre-speciation effective population size. This original paper by Takahata, Satta and Klein was the first to find a high degree of variance in genetic divergence times between humans and chimpanzees, and its conclusion was that the genetic divergences reflect a large effective population size in the pre-speciation human-chimpanzee ancestor.

Usually I would think it excessively picky to point out papers that should have been cited. And I should note that I don't know who is to blame -- Nature has a limit on the number of references they will accept, so it may be a length constraint. But the thing is that these papers are the fundamental and most recent work in the area. And they are based on large datasets -- not as large as here, but not tiny as the authors imply. And their results are either very much like the result here, or contradict it in ways that should have been explained. There is a long, long literature on human-chimpanzee divergence and the variance of genetic divergences in light of population history. You wouldn't know it from reading this.
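
The link between variance in genetic divergence and ancestral population size can be sketched with the simplest neutral model. This is a textbook approximation, not the method of any paper discussed here: two lineages entering a constant-size, randomly mating ancestral population take an exponentially distributed time to coalesce, with a mean of 2Ne generations. The split time, generation length, and the larger Ne value below are illustrative assumptions.

```python
import random
import statistics


def divergence_time_summary(split_time, ne, generation_time):
    """Expected genetic divergence time and its standard deviation, in years.

    The time spent in the ancestral population is exponential with mean
    2 * ne generations, so its standard deviation equals its mean.
    """
    mean_ancestral = 2 * ne * generation_time
    return split_time + mean_ancestral, mean_ancestral


def simulate_locus_divergences(split_time, ne, generation_time, n_loci, seed=1):
    """Draw one divergence time per independent locus under the same model."""
    rng = random.Random(seed)
    mean_ancestral = 2 * ne * generation_time
    return [
        split_time + rng.expovariate(1 / mean_ancestral) for _ in range(n_loci)
    ]


split = 5_000_000  # assumed species split, in years
gen = 20           # assumed generation length, in years

for ne in (10_000, 100_000):
    mean, sd = divergence_time_summary(split, ne, gen)
    loci = simulate_locus_divergences(split, ne, gen, n_loci=20_000)
    print(f"Ne={ne}: expected mean {mean:.0f}, sd {sd:.0f}, "
          f"simulated sd {statistics.stdev(loci):.0f}")
```

Under these assumptions a small ancestral population gives loci that differ by a few hundred thousand years, while a large one spreads them over millions of years with no hybridization at all. That is why the earlier literature matters for interpreting a wide range of divergence times.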

3. The paper does cite Wall (2003), a study that concludes that the anc