Constraints on the protein universe

I don't write too much about the grand scheme of life, but the recent paper by Choi and Kim in PNAS is worth a comment. They applied informatics to a universal protein structure database, hoping to find out what kinds of convergent constraints may have applied to the evolution of proteins in different species.

They found that protein structures are confined to four relatively small regions of the possible structure space. The number of proteins in each of more than 7,000 families follows a power-law distribution, and the size of new protein families (i.e., their amino acid length) has been increasing over time (time being assessed by the most recent common ancestor, or MRCA, of the species in which the protein homologs are now found).
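
For readers who want a feel for what "distributed as a power law" means, here is a minimal sketch in Python. It is not the authors' method, and the data are made up: toy_sizes is a hypothetical list of family sizes built so that the number of families with k members falls off as k to the power -2. The function name loglog_slope is also my own. The idea is only that a power law shows up as a straight line when the counts are plotted against family size on log-log axes, and the slope of that line is the exponent.

```python
import math

def loglog_slope(sizes):
    """Least-squares slope of log(number of families) against log(family size)."""
    # Tally how many families have each size.
    counts = {}
    for s in sizes:
        counts[s] = counts.get(s, 0) + 1

    xs = [math.log(s) for s in counts]
    ys = [math.log(c) for c in counts.values()]

    # Ordinary least-squares slope, written out by hand.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical toy data (not from the paper): 256 families of size 1,
# 64 of size 2, 16 of size 4, 4 of size 8, and 1 of size 16.
toy_sizes = []
for k in (1, 2, 4, 8, 16):
    toy_sizes.extend([k] * (256 // k ** 2))

slope = loglog_slope(toy_sizes)
print(round(slope, 3))
```

A straight-line fit on log-log axes is a crude diagnostic at best. With real family counts it can be fooled by noise in the sparse tail, so treat it as an illustration of the concept rather than a test of the claim.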

And interestingly, proteins that are phylogenetically the oldest tend not only to be shorter, but also to have a structural arrangement with alternating alpha helices and beta strands; younger proteins are larger and tend to have a preponderance of all-alpha or all-beta structures.

The authors suggest that complexity, in terms of length, may be competing with stability on an evolutionary timescale. And then there is this suggestion at the end:

"What is the implication of Fig. 3 that reveals three evolutionary stages where the relative abundance of the four major protein structure classes changed their relative ranking? One possible implication is that there were three evolutionary periods when the Earth environment changed dramatically."

There are no dates for these possible shifts, nor is it clear to me how much this depends on the species in the database (with possible enrichment for vertebrates and various bacterial and archaeal genomes, but leaving a lot out in terms of diversity). But to the extent that protein stability may be a constraint on the evolution of novel features, this kind of global comparison is revealing.

References:

Choi I-G, Kim S-H. 2006. Evolution of protein structural classes and protein sequence families. Proc Natl Acad Sci USA 103:14056-14061.