Decay processes of language lexicons

The Telegraph has a short article on Mark Pagel’s research into reconstructing ancient languages:

Dr Pagel has tracked how words have changed by comparing languages from the Indo-European family, which includes most of the past and present languages of Europe, the Middle East and the Indian sub-continent.
He has been able to track the evolutionary history of Indo-European back using a computer and said that some of the oldest words were well over 10,000 years old even though the original Indo-European language is thought to date back no more than 9,000 years.
"I can say with confidence that there are sounds or words that predate Indo-European," he said. "If you look at 'thou', 'I' and 'who', we can now tell they are probably at least 15,000 to 20,000 years old. The sounds used then for these meanings were probably very similar to those used today."

Pagel is far from alone in reconstructing proto-Indo-European, of course, but he is applying evolutionary methods to the problem of dating language divergence (“glottochronology”) in a unique way.

The press release from IBM is actually more informative (the project used an IBM supercomputer for its analysis):

Looking to the future, the less frequently certain words are used, the more likely they are to be replaced. Other simple rules have been uncovered: numerals evolve the slowest, then nouns, then verbs, then adjectives. Conjunctions and prepositions such as "and", "or", "but" and "on", "over", "against" evolve the fastest, some as much as 100 times faster than numerals. "Throw", which is expected to evolve quickly, has a half-life of 900 years; there are 42 unrelated sounds for it across all the languages. In 10,000 years' time, it will likely have been replaced in 10 of them, possibly including English, unless of course we all do our part to keep the word in circulation.
"50% of the words we use today would be unrecognisable to our ancestors living 2,500 years ago. If a time-traveller came to us, and told us he wanted to go back to that period, we could arm him with the appropriate phrase book, and hopefully keep him out of trouble," explained Mark Pagel, Professor of Evolutionary Biology at the University of Reading.

I’ll have to wait to see the paper. I will be interested to get an idea of some of the dates they are proposing for language families and their relations.

There are still questions that would pose problems for this method. If we have to rely on highly-conserved words to do chronologies of language relations deeper than 5000 years, we are limited to a very small subset of words in any given language. How likely are these conserved words to be borrowed – causing similarities without recent ancestry? Are the probabilities of change across many languages over a short time (the source of the statistic) really comparable to the probability of change in a few ancient languages over a long time?
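
To make the half-life figures in the press release concrete, here is a minimal sketch in Python. It assumes each word is replaced at a constant rate, so that survival follows simple exponential decay. That is my simplification for illustration, not necessarily the model used in Pagel's analysis, and the function names are my own.

```python
import math


def fraction_retained(years: float, half_life: float) -> float:
    """Chance that a word is still unreplaced after `years`,
    assuming a constant replacement rate (exponential decay)."""
    return 0.5 ** (years / half_life)


def half_life_from_retention(years: float, retained: float) -> float:
    """Half-life implied when a fraction `retained` survives `years`,
    under the same constant-rate assumption."""
    return years * math.log(0.5) / math.log(retained)


# Figures quoted in the press release:
# "throw" is given a half-life of 900 years.
print(fraction_retained(900, 900))      # one half-life elapsed
print(fraction_retained(10_000, 900))   # roughly eleven half-lives elapsed

# "50% of the words ... unrecognisable ... 2,500 years ago"
print(half_life_from_retention(2_500, 0.5))
```

Under this assumption, a word with a 900-year half-life has well under a 0.1% chance of surviving 10,000 years in any single lineage, and the "50% in 2,500 years" figure corresponds to a 2,500-year half-life for a word with a typical replacement rate. The sketch does not address the questions above: borrowing and changes in rate over time both break the constant-rate assumption.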

UPDATE (2009-02-28): A linguistically-minded reader writes:

Regarding this recent post in your blog, I was surprised to see you actually recommend the press release by IBM and the University of Reading ("The press release from IBM is actually more informative ..."). Even an amateur linguist like myself can see that it's crap. There's been considerable discussion on Language Log, starting from the BBC interview, where Pagel was induced to make a right fool of himself, linguistically speaking (which doesn't say anything one way or the other about his computer expertise, of course).

You can find the Language Log story here, along with comments. I always withhold judgment on a piece of work until I’ve read it myself, but I certainly sympathize with those who think the story was poorly reported in the press.

Particularly the one with the headline “Handy phrasebook for Doctor Who”.