Human proteins are made of transposons?

A new PNAS paper by Roy Britten gives a partial answer:

This is a report of many distant but significant protein sequence relationships between human proteins and transposable elements (TEs). The libraries of human repeated sequences contain the DNA sequences of many TEs. These were translated in all reading frames, ignoring stop codons, and were used as amino acid sequence probes to search with BLASTP for similar sequences in a library of 25,193 human proteins. The probes show regions of significant amino acid sequence similarity to 1,950 different human genes, with an expectation of <10⁻³. In comparison with previous REPEATMASKER (Institute for Systems Biology, Seattle) studies, these probes detect many more TE sequences in more human coding sequences with greater length than previous work using DNA sequences. If the criterion is opened, very many matches are found occurring on 4,653 different genes after correction for the number seen with random amino acid sequence probes. The processes that led to these extensive sets of sequence relationships between TEs and coding sequences of human genes have been a major source of variation and novel genes during evolution. This paper lists the number of sequence similarities seen by amino acid sequence comparison, which is surely an underestimate of the actual number of significant relationships. It appears that many of these are the result of past events of duplication of genes or gene regions, rather than a direct result of TE insertion. This report of observable relationships leaves to the future the functional implications as well as the detection of the events of TE insertion.
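To make the probe-building step concrete, here is a minimal Python sketch of translating a DNA sequence in all six reading frames while reading straight through stop codons. It assumes the standard genetic code, and, as a guess, writes each stop codon as X so every probe stays one continuous amino acid string; the abstract doesn't say how stops were actually represented. The function names are hypothetical, and the BLASTP search itself isn't shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (last base varies fastest)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate_through_stops(dna: str, frame: int, stop_symbol: str = "X") -> str:
    """Translate one reading frame; stops become stop_symbol instead of ending translation."""
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # codons with ambiguous bases -> X
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


def six_frame_probes(dna: str, stop_symbol: str = "X") -> list[str]:
    """Hypothetical helper: three forward frames, then three reverse-complement frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    return [translate_through_stops(s, f, stop_symbol) for s in (dna, rc) for f in range(3)]


if __name__ == "__main__":
    print(six_frame_probes("ATGGCCTAA"))  # ['MAX', 'WP', 'GL', 'LGH', 'XA', 'RP']
```

Reading through stops matters here because an old TE copy has accumulated mutations, so a real stop codon in the middle of it would otherwise cut the probe short and hide the rest of a distant similarity.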

I find it sort of interesting that at some point during evolution (probably way back, considering that these are partial sequence similarities rather than direct insertions) transposable elements may have been plugging modular sequences into genes and having adaptive effects. That would give one reason why some transposable elements may have lasted a long time: once in a great while, they had adaptive effects.

And it reflects back on the modular nature of proteins, if these are functional domains that are shared among many proteins and that might also have skipped around the genome once upon a time.

There are a lot of big genome-wide comparative papers like this still to be written. The hard part is coming up with evolutionary hypotheses to explain the results.