Domain of the Duff Man

People are trolling through the human genome looking for lots of things now. It's sort of like explorers trolling the coast of the Americas looking for game and fresh water. You notice the landmarks -- the tallest mountain, the biggest river -- and you annotate them on your map. That's what informatics is doing to the human genome now: annotating the high points.

So, this new paper by Magdalena Popesco and colleagues (2006) is a pretty straightforward description of this kind of hunt. The team went looking for genes with copy number differences between humans and apes. In other words, there might be six copies of a given gene in the chimpanzee genome, but ten in the human genome -- an increase of four copies. These additional copies originated by some kind of duplication.
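To make the arithmetic concrete, here is a minimal Python sketch of that kind of comparison. It is only an illustration, not the method the paper used: the table layout and the function name `rank_by_human_gain` are my own, and every gene in it is an invented placeholder except MGC8902, whose counts (49 in human, 10 in chimpanzee) are the ones reported for the draft genomes.

```python
# Hypothetical copy-number table: gene -> (human copies, chimpanzee copies).
# Only the MGC8902 counts come from the paper; GENE_A..GENE_C are made-up
# placeholders (GENE_A mirrors the "six versus ten" example in the text).
copy_numbers = {
    "MGC8902": (49, 10),
    "GENE_A": (10, 6),
    "GENE_B": (3, 3),
    "GENE_C": (2, 5),
}


def rank_by_human_gain(table):
    """Return (gene, human minus chimp) pairs, largest human gain first."""
    gains = [(gene, human - chimp) for gene, (human, chimp) in table.items()]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


for gene, gain in rank_by_human_gain(copy_numbers):
    # The explicit sign makes losses on the human lineage easy to spot.
    print(f"{gene}: {gain:+d}")
```

Sorting by the difference puts the biggest human-lineage amplification at the top of the list, which is where the interesting genes are.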

It is a natural next step to look at the genes with the most duplications in humans, and figure out what they do. That is the story of this paper:

Extreme gene duplication is a major source of evolutionary novelty. A genome-wide survey of gene copy number variation among human and great ape lineages revealed that the most striking human lineage-specific amplification was due to an unknown gene, MGC8902, which is predicted to encode multiple copies of a protein domain of unknown function (DUF1220). Sequences encoding these domains are virtually all primate-specific, show signs of positive selection, and are increasingly amplified generally as a function of a species' evolutionary proximity to humans, where the greatest number of copies (212) is found. DUF1220 domains are highly expressed in brain regions associated with higher cognitive function, and in brain show neuron-specific expression preferentially in cell bodies and dendrites.

As with those landmarks, a lot of unknowns spring up around this basic result. There seem to be many proteins that include DUF1220 domains. Because it occurs in multiple copies in some genes, and across many different genes, the DUF1220 sequence qualifies as a repeat. And it would seem to work as a modular unit at least sometimes, since it can be repeated multiple times within a single gene (including MGC8902).

It has been estimated that 34 different human genes encode DUF1220 domains (table S3) (www.ncbi.nlm.nih.gov/IEB/Research/Acembly). Pfam (Version 17.0) (9) predicts that 60 human DUF1220-containing proteins exist, containing a total of 271 DUF1220 domains (fig. S1) derived from 11 seed domains (10) (fig. S2A). Estimates based on cDNA sequences indicated that 22 genes exist, including six pseudogenes (8). None of these cDNAs showed perfect identity to human genomic sequences, raising the possibility that this count is an underestimate. Recent additional sequencing of chromosome 1 identified at least 15 gene sequences that encode DUF1220 domains, although several sequence gaps still remain in DUF1220-encoding regions (11).

So there are two stories here. One is about the multiple duplication of MGC8902 on the human lineage. The draft human genome has 49 copies of it; chimpanzees have only 10.

A separate story is about the proliferation of this DUF1220 domain, which occurs in many proteins. This domain increased in copy number on the human lineage compared to chimpanzees, the African ape lineage compared to orangutans, and primates compared to other mammals.

To me the DUF1220 story is the fascinating part. Not just one gene but apparently many genes that contain this domain have been proliferating, and some genes apparently acquired the domain during human evolution. In at least one gene, the DUF1220 domain shows evidence of positive selection, but the rest of the coding sequence doesn't.

Popesco and colleagues designed an antibody to label DUF1220 in expressed proteins, and went looking through tissue samples to find it. They found it expressed in lots of different tissue types, including several brain areas.

Then they looked closer at brains, discovering that DUF1220 lights up the neurons but not other cells.

Nobody knows what it does. It's just a high point on a map. But what a high point:

In light of the strong DUF1220 expression we observed in neurons of the neocortex, it is intriguing that multiple independent evolutionary processes [brain enlargement, neocortex expansion (16), gene duplication, and domain amplification] can be seen as having individually and cumulatively contributed to increasing the DUF1220-coding potential of the human brain, suggesting that such an increase may have conferred strong selective advantages.

What the heck does it do?

References:

Popesco MC, MacLaren EJ, Hopkins J, Dumas L, Cox M, Meltesen L, McGavran L, Wyckoff GJ, Sikela JM. 2006. Human lineage-specific amplification, selection, and neuronal expression of DUF1220 domains. Science 313:1304-1307.