Text-mining science

There are many reasons why we should have an arXiv for human evolution, and this isn’t the most important one…but I really wish I could do this with the literature on modern human origins right now:

They want to break down the full text of the articles into component phrases to see how often a particular word or phrase appears relative to others, a measure of how 'meme-like' a term is. Their goals: to give Arxiv a new tool for identifying original source papers in physics, mathematics and computer science and to enable historians to spot trends from the 20 years that the repository has existed.
“How do you find the moment when a given scientific transformation occurred?” asks Jean-Baptiste Michel, co-director of the Cultural Observatory and a postdoctoral researcher in psychology at Harvard. “You can help the reader figure out where in time the most relevant papers were located, which has always been difficult to do.”

That’s from an article by Eric Hand about the “Cultural Observatory” at Harvard, a group that is going to apply Google’s n-gram approach to the physics preprint service (“Researchers aim to chart intellectual trends in Arxiv”). My manuscript on shrinking human brain size will be in there, but I don’t imagine it’s tightly linked to any trends in physics.
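
For readers who haven't met n-grams, here is a rough sketch of the kind of counting the article describes: split each text into runs of n consecutive words and ask how often each run appears relative to all the others. This is only my toy illustration, not the Cultural Observatory's actual method. The function names `ngrams` and `relative_frequency`, the simple tokenizer, and the two example sentences are all assumptions of mine.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the n-word phrases in text (lowercased, punctuation dropped)."""
    # Assumption: a deliberately crude tokenizer; a real pipeline would do more.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(docs, n):
    """Map each n-gram to its share of all n-grams counted across docs."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical two-"paper" corpus, made up for illustration only.
toy_docs = [
    "regional continuity and complete replacement",
    "complete replacement is one model of modern human origins",
]
freqs = relative_frequency(toy_docs, 2)
top_phrase = max(freqs, key=freqs.get)
print(top_phrase, freqs[top_phrase])
```

Tracking how a phrase's share changes from year to year, rather than over one lump of text, is what would let you look for the moment a term took off.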

My impression of the modern human origins problem over the last 20 years is that it unfolded along parallel lines with some long-term stability of citation and linking. Only a small number of papers were consistently cited across the gamut of researchers. For example, archaeologists writing on modern human origins would consistently cite only a handful of papers from biological anthropologists, and geneticists would usually cite only two or three papers from archaeologists. Those papers formed a highly artificial tradition. Most often papers in Science or Nature, they were highly abbreviated forms of arguments, too brief to illustrate why specialists persisted in disagreements about certain issues. So, many geneticists writing about modern human origins never really understood the morphological argument in favor of some regional continuity, and many paleoanthropologists never understood the limits of genetic models supporting complete replacement.

UPDATE (2012-02-25): A reader writes:

Are you SURE that the shrinking brains aren't related to trends in physics? Like maybe string theory?