From 100,000 to 25,000, a tale

Larry Moran has summarized the long history of changing estimates of human gene number over the last fifty years. The post was prompted by the supposed "surprise" at the current low estimate of human gene number: only around 25,000 genes, genome-wide.

People who learned about human genetics around the time I did often heard that the total human gene number was estimated at 100,000. Of course, there was no real evidence for that number, aside from various limiting assumptions. Moran reviews several of the ways people tried to estimate total gene number, ranging from genetic load arguments to hybridization experiments that attempted to separate "unique" from repetitive DNA fractions.

Here's a sample:

It was about this time that Walter Gilbert made his famous back-of-the-envelope calculation of 100,000 genes in the human genome. This was the estimate that became widely quoted when the human genome project was first proposed. It's interesting to note that Gilbert's estimate was not based on any experimental evidence; indeed, it conflicted with most of the available evidence suggesting far fewer genes. The larger number seemed less threatening to scientists who were worried that we might not have more genes than a fruit fly.

If you ever find yourself needing to tell this story, Moran provides a good starting point.

UPDATE (3/22/2006): Carl Zimmer notes a recent estimate that places the human gene number just above 18,000. This post is also highly recommended, especially for its consideration of just what all those genes do:

Today scientists still don't know the function of 5898 genes in the human genome. In other words, over the past six years about 7,000 genes either have been figured out or have vanished into the land of nevermind. That's progress, of a sort. But unknown genes still represent a major slice of the human genome, because the total number of genes has fallen as well. The blue slice in the pie above represents 32.2% of all our known genes. For all the work that has poured into the genome, for all the grand announcements, we still don't have the faintest idea of what about a third of our genes are for.

That's a bit generous. Working with functional categories, you soon realize that the "function" of most genes is only "known" by observing structural similarities with other known genes. For instance, a human gene might share a similar stretch of amino acid sequence (a "motif") with a Drosophila gene that has known effects when mutated. That's pretty indirect knowledge of function, but something like this is all we have for many inferred human genes.

That's what makes life interesting.