Darwin, languages, and genetics

How are languages and genes related to each other? Anthropology is an interdisciplinary subject, and this is probably the topic that pushes that envelope the furthest, calling on the expertise of many different disciplines in the humanities and sciences.

As an organizing principle, many workers have begun with the hypothesis that languages and genes each form genealogical relationships among populations, and that the coevolution of languages and populations should make these genealogies resemble each other. For example, French, Spanish, Portuguese, and Italian all descend from Latin, and the present-day populations of France, Spain, Portugal, and Italy all descend from the population of the Roman Empire. Hence, the relations of the languages and the relations of the populations are parallel to each other.

This general idea is older than Darwin’s Origin of Species, but Darwin’s words on the subject have been quoted more often than anyone else’s:

If we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world (Darwin 1859, 422).

Like many of Darwin’s words, however, these are generally pulled from the surrounding context without further discussion. The sentence actually serves as an example in Darwin’s defense of the phylogenetic tree as a description of relationships. In the previous paragraph, he points out that the similarities among different species cannot be made to fit any simple series. Instead, a hierarchical, genealogical arrangement can account for many of their similarities and differences. And after this sentence, he describes differences in rate of language change as an analogy for the evolution of organisms:

If we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world; and if all extinct languages, and all intermediate and slowly changing dialects, had to be included, such an arrangement would, I think, be the only possible one. Yet it might be that some very ancient language had altered little, and had given rise to few new languages, whilst others (owing to the spreading and subsequent isolation and states of civilisation of the several races, descended from a common race) had altered much, and had given rise to many new languages and dialects. The various degrees of difference in the languages from the same stock, would have to be expressed by groups subordinate to groups; but the proper or even only possible arrangement would still be genealogical; and this would be strictly natural, as it would connect together all languages, extinct and modern, by the closest affinities, and would give the filiation and origin of each tongue (Darwin 1859, 422–423).

Thus, Darwin’s discussion, in which language relationships serve as an example, raises two separate issues: (1) whether similarities are described by seriation or hierarchy, and (2) whether differences arise at a constant or changing rate. His readers would have been aware of historical linguistics, including the observation that no “Great Chain” of languages could be constructed out of grammatical and phonological changes that are manifestly hierarchical. Likewise, they would be familiar with two ancient languages that had manifested long-term stasis in a small community of speakers.

These points discussed by Darwin remain active elements of debate about the relationships of recent human languages and genes:

  • How much of linguistic diversity is attributable to the genealogical relations among languages, and how much derives from horizontal modes of transmission, such as the borrowing of words and syntactic patterns?
  • How often do populations undergo language shifts?
  • How much of human genetic variation is attributable to ancient population divergences, and how much to recent gene flow?
  • Are genes today the same as those present in ancient populations, or have they been replaced by selection or other demographic processes?

Each point considers a way that the genealogy of languages may come to differ from genetic relationships of populations. Within a single generation, there is a very high concordance between language and genes: People inherit their genes from their parents, and they tend to learn the same language as their parents. But only a slight mismatch in each generation may, over the course of many generations, add up to a huge difference in the histories of the two systems. And if those differences are biased in some direction, instead of random noise, then they may not only obscure the real history; they may strongly point to a false one.
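The compounding described above can be made concrete with a little arithmetic. The sketch below is only an illustration under strong assumptions: every generation has the same chance of a language-gene mismatch, each generation is independent of the others, and the 1% rate is a made-up number, not an estimate from any data. The function name `unbroken_fraction` is likewise hypothetical.

```python
def unbroken_fraction(mismatch_rate: float, generations: int) -> float:
    """Share of lineages in which language and genes were passed on
    together in every single generation.

    Assumes a constant, independent mismatch probability per generation
    (an illustrative simplification, not a demographic model).
    """
    if not 0.0 <= mismatch_rate <= 1.0:
        raise ValueError("mismatch_rate must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Concordance must survive every generation, so the per-generation
    # probabilities multiply.
    return (1.0 - mismatch_rate) ** generations


# Hypothetical 1% mismatch per generation, over increasing time depths.
for generations in (1, 10, 40, 100, 400):
    share = unbroken_fraction(0.01, generations)
    print(f"{generations:>4} generations: {share:.1%} of lineages still concordant")
```

Under these assumptions, a mismatch of only 1% per generation leaves roughly 37% of lineages with an unbroken parallel history after 100 generations. The sketch treats mismatches as unbiased noise; it says nothing about the biased case, in which the errors would point toward a particular false history instead of merely blurring the true one.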

To return to our example: French, Spanish, Portuguese, and Italian are not the only major Romance languages; there is also Romanian. Romanians are genetically most similar to their neighbors in southeastern Europe, such as Serbs and Greeks, neither of whom speak a Romance language. In this case, population movements after the Roman period, such as the migrations of Slavic peoples, transformed the languages spoken in the Balkans without fundamentally altering the genetic similarities. And the persistence of Greek reminds us that the vast expansion of the Roman Empire could not supplant some linguistic communities, and did not itself erase some earlier genetic patterns.

So should we expect the “perfect pedigree” of human populations to resemble the genealogy of languages? It would help if there were factors that tended to reinforce such similarities instead of destroying them. We have to go beyond the simple statistical comparison of language and gene trees, which over enough time really shouldn’t resemble each other very much, at least if the deviations in their evolutionary patterns are just noise. Instead, we have to consider how demography shapes genetic and linguistic transfer.


