For all those fake papers you've written

From the New Scientist technology blog (via Slashdot):

You may remember the story of some cheeky MIT students who wrote a computer programme to generate scientific papers. Well, now some researchers at the Indiana University School of Informatics have come up with an Inauthentic Paper Detector to foil it.
Mehmet Dalkilic, a data mining expert, explains how it works: "We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."

What is interesting is that these "subtle long-range repetitions" are clearly part of how we comprehend a text, yet we wouldn't necessarily have the confidence to call a text fake just because it lacks them. We carry that statistical sense innately; the program simply makes it explicit.
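
The post doesn't give the detector's actual statistic, but the idea is easy to sketch: track how far apart recurrences of each word string fall. Everything below (the function name, the bigram window, the sample sentence) is illustrative, not the real method:

```python
def repetition_profile(text, n=2):
    """Distances between recurrences of each n-word string.

    The hunch behind the detector: human prose shows recurrences at
    both short and long range, while much generated text does not.
    """
    words = text.lower().split()
    last_seen = {}
    gaps = []
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in last_seen:
            gaps.append(i - last_seen[gram])  # how far back it last appeared
        last_seen[gram] = i
    return gaps

sample = ("the detector looks for repeated phrases because human writers "
          "return to repeated phrases when developing an argument, and an "
          "argument that never returns to its own phrases reads as hollow")
print(sorted(repetition_profile(sample)))
```

In natural prose you'd expect a mix of small gaps (a phrase echoed within a sentence) and large ones (a phrase returning paragraphs later); text stitched together at random tends to lack the long-range recurrences.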

This redundancy is one of the many ways we make language more comprehensible: repeated phrases key the mind back to the subject at hand, and a good writer uses them deliberately to keep a text easy to follow.

It is also one of the reasons natural texts carry relatively little information for their length: they consistently follow predictable patterns. For our minds, that's a good thing! It's what lets us understand them.
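
One way to make that low information content concrete is compression: redundant text compresses well. A minimal sketch, with made-up sample text, using Python's standard zlib:

```python
import random
import zlib

def compression_ratio(text):
    """Compressed size over raw size: lower means more redundancy,
    i.e. less information per character."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

natural = ("Natural texts follow patterns, and because they follow "
           "patterns a compressor can squeeze them down: the patterns "
           "are exactly what it exploits. ") * 4

# Same characters in a random order: the patterns are destroyed,
# so the compressor has much less to work with.
chars = list(natural)
random.shuffle(chars)
scrambled = "".join(chars)

print(f"natural:   {compression_ratio(natural):.2f}")
print(f"scrambled: {compression_ratio(scrambled):.2f}")
```

Shuffling keeps every character but destroys the patterns, so the scrambled version compresses noticeably worse; that gap is the redundancy our minds lean on.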