You and the fugu

Last year, so-called "ultra-conserved elements" drew quite a bit of attention. These are relatively short stretches of non-coding DNA that are shared by distantly related species with few or no substitutions. The basic principle behind finding them is simple: take a long alignment of the human and mouse genomes, and look for places where almost no mutations divide the two species.
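To make "look for places where almost no mutations divide the two species" concrete, here is a minimal sketch. It assumes the two genomes are already aligned and given as equal-length strings, with `-` marking gaps, and it takes the strictest version of the idea: runs of perfectly identical, ungapped columns. The function name `identical_runs` and the 200-column default are hypothetical choices for illustration, not the thresholds or code of any published study.

```python
def identical_runs(seq_a: str, seq_b: str, min_length: int = 200) -> list[tuple[int, int]]:
    """Return half-open (start, end) column intervals where both aligned
    sequences carry the same base (no gaps, no mismatches) for at least
    min_length consecutive columns. Illustrative sketch only."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None  # start column of the current identical run, if any
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        match = a == b and a in "ACGT"  # gaps and ambiguous bases break a run
        if match and start is None:
            start = i
        elif not match and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(seq_a) - start >= min_length:
        runs.append((start, len(seq_a)))
    return runs


# Toy alignment: one mismatch near the start, one near the end.
human = "ACGTTGCA" + "GATTACA" * 30 + "TTTT"
mouse = "ACCTTGCA" + "GATTACA" * 30 + "TTAT"
print(identical_runs(human, mouse))  # [(3, 220)]
```

Requiring perfect identity is the simplest possible rule; allowing "few" substitutions instead means scoring windows by percent identity rather than demanding unbroken runs.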

Conservation of protein-coding regions has been observed for nearly as long as DNA sequence data have been available. Comparing the DNA sequences of two individuals is not so difficult if you know which small section of the genome you're looking for. For the most part, early sequence data were collected from protein-coding genes, the parts that were known to do something.

The rest of the genome came to be called "junk" DNA. Nobody could really figure out what it was for, or whether it did anything. There certainly was a lot of it -- the vast majority of the genome is noncoding, and there are pretty long stretches even within protein-coding genes that don't themselves code for protein sequence.

Still, all this noncoding DNA has been evolving. Humans and chimpanzees diverged a relatively short time ago in evolutionary terms, yet around 38 million DNA substitutions separate the two lineages. Of that number, only around 40,000 changed an amino acid in a protein (Chimpanzee Sequencing and Analysis Consortium 2005). In other words, only about one-tenth of 1 percent of the substitutions separating humans and chimpanzees have changed protein sequences, and that count doesn't include changes such as gene duplications, inversions, or other large-scale rearrangements.
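The fraction follows directly from the two approximate counts given above (about 40,000 amino-acid-changing substitutions out of about 38 million total):

```latex
\frac{40{,}000}{38{,}000{,}000} \approx 1.05 \times 10^{-3} \approx 0.1\%
```

So roughly one substitution in a thousand has altered a protein.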

One reason why there have been relatively few amino acid changes in our evolution is that many such changes tend to break things. A change in a protein can change its level of activity, the way it is folded into a functional structure, and its ability to interact with other molecules. Too much change usually results in worse performance in some way, and bad changes don't stay around very long. So protein-coding sequences tend to be conserved; they don't change as fast as the rest of the genome.

Generally speaking, sequences that stay the same for a long, long time must have been maintained by selection; otherwise, a substantial number of chance mutations would have become fixed in one or both lineages. Protein-coding genes show this clearly: substitutions that change an amino acid in the final gene product (called nonsynonymous substitutions) are usually much rarer than substitutions that do not (called synonymous substitutions).
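The synonymous/nonsynonymous distinction can be sketched in a few lines using the standard genetic code. This is a minimal illustration, assuming two aligned, in-frame coding sequences containing only A, C, G and T. The function name `count_substitutions` is hypothetical. It counts raw differences only: it classifies codons that differ at exactly one position and sets aside codons with several differences, because the mutational path between them is ambiguous. Real analyses of selection, such as dN/dS ratios, also normalize by the number of synonymous and nonsynonymous sites, which this sketch does not do.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_substitutions(cds_a: str, cds_b: str) -> dict[str, int]:
    """Tally codon differences between two aligned, in-frame coding sequences.
    Codons differing at one position are classified as synonymous or
    nonsynonymous; codons differing at more than one are tallied as 'multiple'."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    counts = {"synonymous": 0, "nonsynonymous": 0, "multiple": 0}
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        diffs = sum(x != y for x, y in zip(codon_a, codon_b))
        if diffs == 0:
            continue
        if diffs > 1:
            counts["multiple"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1  # same amino acid, e.g. CTG -> CTA (Leu)
        else:
            counts["nonsynonymous"] += 1  # amino acid changes, e.g. GAA -> GAT (Glu -> Asp)
    return counts


print(count_substitutions("ATGCTGGAATGG", "ATGCTAGATTAA"))
# {'synonymous': 1, 'nonsynonymous': 1, 'multiple': 1}
```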

Woolfe and colleagues (2005) applied the same logic to noncoding DNA, using a far more distant comparison. As they summarize their results:

"We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes."
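To show what "over 90% identical across more than 500 bases" means for a pairwise alignment, here is a hypothetical sliding-window scan. It is not the pipeline Woolfe and colleagues used. It assumes equal-length aligned strings with `-` for gaps, treats gaps as non-identical, and uses an illustrative function name, `conserved_regions`. Each 500-column window is scored by its fraction of identical bases, and overlapping qualifying windows are merged into regions.

```python
def conserved_regions(seq_a: str, seq_b: str, window: int = 500,
                      min_identity: float = 0.90) -> list[tuple[int, int]]:
    """Return merged half-open (start, end) column intervals covered by windows
    of `window` aligned columns whose fraction of identical, ungapped bases is
    at least `min_identity`. Illustrative sketch only."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(seq_a)
    # prefix[i] = number of identical, ungapped columns among the first i columns
    prefix = [0] * (n + 1)
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        prefix[i + 1] = prefix[i] + (a == b and a in "ACGT")
    regions: list[tuple[int, int]] = []
    for start in range(n - window + 1):
        end = start + window
        if (prefix[end] - prefix[start]) / window >= min_identity:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the overlapping region
            else:
                regions.append((start, end))
    return regions


# Toy alignment: a 600-column identical core flanked by completely divergent sequence.
human = "A" * 100 + "G" * 600 + "A" * 100
mouse = "C" * 100 + "G" * 600 + "C" * 100
print(conserved_regions(human, mouse))  # [(50, 750)]
```

Note that the merged region (columns 50 to 750) reaches 50 columns into the divergent flanks on each side, because a window only has to average 90% identity. A real scan would need to trim region edges if exact boundaries matter.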

References:

Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, et al. (2005) Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development. PLoS Biol 3(1): e7 doi:10.1371/journal.pbio.0030007