When duplicate genes diverge

One of the most important mechanisms of genetic evolution is gene duplication. There are a few well-known gene families, such as the globin gene family, whose several members have diverged over hundreds of millions of years from a single ancestral gene. Each globin gene is the product of one or more duplications.

While Googling through some papers, I came across this line, which got my attention:

The divergence of gene expression between human duplicate genes is rapid, probably faster than that between yeast duplicates in terms of generations.

That's from Makova and Li (2003). You have to admit, it's attention-getting. Human gene expression differentiating faster than yeast?

It's about genes that have duplicated during recent evolution. When one gene becomes two, the tendency is to think the newly arrived copy will be neutral. But things are not simple -- two copies of a gene complete with associated regulatory sequences may end up making twice as much of the gene product. Occasionally that may be a good thing, but often it will be bad. So selection affects gene duplicates the same way it affects anything else.

But duplicate copies of genes present an interesting possibility: they may come to be regulated differently. This can happen if both copies retain their regulatory machinery, but some part of the regulatory chain of one copy is changed by mutation. Or it can happen if the gene duplication does not duplicate all the regulatory sequence -- for example, if some of that sequence is located a considerable distance upstream of the original copy and falls outside the duplicated segment.

Makova and Li (2003) examined one kind of change in gene expression -- when different copies start to be expressed in different tissues. This kind of change is fairly easy to understand in terms of regulatory sensitivity. Different tissues express different regulatory proteins and RNAs, and regulatory sequences of genes are more or less sensitive to these different parts of the regulatory machinery. Sequence rearrangements may displace the coding sequence farther from inhibitor sites, or may decrease the chance of methylation, or may place the new copy next to a highly transcribed region. Sequence changes can remove an enhancer site, or make a transcript more susceptible to RNA interference, or alter regulation in any number of other ways. There are many, many possibilities for regulatory divergence.

The timescale of expression divergence studied by Makova and Li (2003) is fairly long:

We found that a large proportion of human duplicate genes have diverged rapidly in their spatial expression. Assuming that the average synonymous rate in higher primates is 1.5 x 10^-9 nucleotide substitutions per site per year (Yi et al. 2002), 75.5% of human paralogs diverge in their expression in at least one tissue after only 25 Myr (KS = 0.068).

Clearly, "rapid" is a relative term. Here, we are looking at functional divergence of duplicate genes over the time occupied by the divergence of the hominoids from early Miocene apes to the present. Some proportion of these changes have occurred during the past few million years of human evolution, and may be among the genetic changes that led to the evolution of human-specific characters.
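
To make the timescale concrete, here is a rough sketch of how a synonymous divergence (KS) can be turned into an approximate age for a duplication. It assumes the usual molecular-clock relation in which both copies accumulate synonymous substitutions at the same constant rate, so KS = 2 x rate x time. That relation is my assumption about how such a conversion is typically made, not necessarily the exact calculation in Makova and Li (2003), and the function name is just for illustration. With the quoted rate, KS = 0.068 comes out a little under 23 million years, the same order as the 25 Myr in the quote.

```python
# Illustrative arithmetic only: convert a synonymous divergence (KS) between
# two paralogs into an approximate duplication age.


def duplication_age_years(ks: float, rate_per_site_per_year: float) -> float:
    """Approximate age of a duplication from synonymous divergence.

    Assumes (simplification) that both copies accumulate synonymous
    substitutions independently at the same constant rate, so that
    KS = 2 * rate * time.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# Rate quoted in the passage above (substitutions per site per year).
PRIMATE_SYNONYMOUS_RATE = 1.5e-9

age = duplication_age_years(0.068, PRIMATE_SYNONYMOUS_RATE)
print(f"KS = 0.068 -> about {age / 1e6:.1f} million years")
```

The same function applied to a larger KS gives a proportionally older duplication, which is why KS cutoffs work as rough age cutoffs for gene pairs.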

However, the functional category most clearly overrepresented among genes whose expression diverged after duplication was immune response:

It is interesting to look into the functions of duplicate genes that show rapid divergence in expression. Thus, we investigated the functions of the duplicate gene pairs with KS < 0.3 and with diverged expression (as presence or absence of expression in a tissue) in at least 50% of the tissues studied (we considered only the tissues in which at least one gene of a pair is expressed). There were 38 such gene pairs (Table 1). Also, we examined duplicate gene pairs with KS < 0.3 and a correlation coefficient of gene expression (R) < 0.5. There were 18 gene pairs in this group (Table 1). Interestingly, most of the gene pairs in these two groups overlapped. Thus, the results from the two measures concur. The functions of these genes were retrieved from LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/) manually. The gene pairs in these two groups encode enzymes (oxidoreductases, hydrolases, transferases, and an isomerase), proteins of the immune system (e.g., lymphocyte antigen, cytokine gro-beta, MHC proteins, and immunoglobulins), transcription factors, structural proteins (e.g., amelogenin, keratin, and skeletal muscle protein), and receptors (Table 1). To determine whether any of the functions were overrepresented among genes with rapid divergence in expression, we compared their functions with the functions of the other duplicate genes using the Gene Ontology database (Camon et al. 2003). There was indeed a significantly higher proportion of immune response genes among gene pairs with rapid divergence in expression compared with other gene pairs in our study (P < 0.009 for gene pairs with KS < 0.5 and diverged expression in at least 50% of studied tissues; P < 0.001 for gene pairs with KS < 0.5 and R < 0.5).
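
The two measures in that passage are easy to sketch. The code below is a minimal illustration, not the authors' pipeline: the expression values are made up, the cutoff for calling a gene "expressed" in a tissue is a hypothetical number, and I am assuming R is an ordinary Pearson correlation across tissues. The first function counts the fraction of tissues in which exactly one copy is expressed, ignoring tissues where neither copy is expressed; the second computes the correlation between the two expression profiles.

```python
from math import sqrt
from typing import Sequence

# Hypothetical cutoff for calling a gene "expressed" in a tissue.
EXPRESSION_THRESHOLD = 1.0


def fraction_diverged(a: Sequence[float], b: Sequence[float],
                      threshold: float = EXPRESSION_THRESHOLD) -> float:
    """Fraction of tissues where exactly one copy is expressed.

    Only tissues in which at least one copy is expressed are counted,
    following the description in the quoted passage.
    """
    considered = 0
    diverged = 0
    for x, y in zip(a, b):
        on_a, on_b = x >= threshold, y >= threshold
        if not (on_a or on_b):
            continue  # neither copy expressed: tissue is not informative
        considered += 1
        if on_a != on_b:
            diverged += 1
    return diverged / considered if considered else 0.0


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_a * var_b)


# Made-up expression levels for two paralogs across six tissues.
copy_1 = [5.0, 4.0, 0.0, 0.0, 3.0, 0.0]
copy_2 = [0.0, 4.5, 6.0, 0.0, 0.0, 0.0]

frac = fraction_diverged(copy_1, copy_2)
r = pearson_r(copy_1, copy_2)
print(f"diverged in {frac:.0%} of informative tissues, R = {r:.2f}")
print("rapidly diverged by both measures:", frac >= 0.5 and r < 0.5)
```

In this toy pair, four tissues are informative and the copies differ in three of them, so the pair would pass both the 50% presence/absence cutoff and the R < 0.5 cutoff described in the quote.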

They also found that two-thirds of the duplicate genes that didn't diverge in expression were genes that are normally expressed in nearly all tissues -- in other words "ubiquitously expressed" genes. This finding has been confirmed by later work (for example, Yang, Su and Li 2005; Liao and Zhang 2006). Liao and Zhang (2006) showed that the degree of evolution in gene expression profile is negatively associated with gene expression level -- that is, more highly expressed genes evolve more slowly. This is paralleled by the observation that sequence evolution is slower for more highly expressed genes. Yang et al. (2005) showed that narrowly expressed genes (those expressed in few tissues) evolve faster than those expressed more broadly.

All these observations tend to support the idea that pleiotropic constraints are important limits to adaptive evolution. Gene duplication followed by functional divergence is one of the main ways that genetic correlations among different phenotypic characters can be decoupled. If a gene duplication can allow a single gene with two phenotypic effects to become two genes each with one phenotypic effect, then it can erase pleiotropic constraints on evolutionary change.

It's a way of opening new pathways to the evolution of complex phenotypes.

About the yeast thing: naturally, the possibilities for regulatory divergence are higher in vertebrates than in yeast. Vertebrates have substantial differentiation of tissue types, complex developmental programs, and tissue-specific gene expression. So it's no real surprise that vertebrates should have seen more rapid regulatory evolution than yeast in this respect. But it does show that this kind of duplication and subsequent functional differentiation is a potentially rapid pattern of evolutionary change compared to other patterns of change.


Liao B-Y, Zhang J. 2006. Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol 23:1119-1128. doi:10.1093/molbev/msj119

Makova KD, Li W-H. 2003. Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res 13:1638-1645. doi:10.1101/gr.1133803

Yang J, Su AI, Li W-H. 2005. Gene expression evolves faster in narrowly than in broadly expressed mammalian genes. Mol Biol Evol 22:2113-2118. doi:10.1093/molbev/msi206