Molecular systematics and species trees

I’d like to point readers to a recent essay in Evolution by Scott V. Edwards, titled “Is a new and general theory of molecular systematics emerging?”

Edwards covers some of the recent progress, and some of the problems, encountered when using molecular evidence to test phylogenetic hypotheses. A sampling of the issues: How do we combine information from different sets of molecular data? Can we simply pool sequences from many gene loci into one analysis (“concatenation”), or do we need to allow for genealogical differences among loci? How do prior assumptions affect the outcomes of analyses, such as the presence or absence of polytomies (branching points where three or more lineages diverge simultaneously)?
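To make “concatenation” concrete, here is a minimal Python sketch of building a supermatrix. It assumes each locus is already aligned and stored as a dictionary mapping taxon names to sequences; the function name `concatenate` and the input layout are illustrative assumptions, not taken from any particular program. Taxa missing from a locus are padded with `?` (an assumed missing-data convention), and the function records each locus’s column range so that loci could later be treated as separate partitions.

```python
from typing import Dict, List, Tuple


def concatenate(
    loci: Dict[str, Dict[str, str]]
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-locus alignments into one supermatrix (hypothetical helper).

    loci: locus name -> {taxon name -> aligned sequence}.
    Returns the supermatrix (taxon -> joined sequence) and a list of
    (locus, first column, last column) partitions, 1-based and inclusive.
    """
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for locus, aln in loci.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {locus} is empty or not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this locus get a run of missing-data symbols.
            pieces[taxon].append(aln.get(taxon, "?" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


# Hypothetical toy alignments; gorilla is missing from locus2.
loci = {
    "locus1": {"human": "ACGT", "chimp": "ACGA", "gorilla": "ACTT"},
    "locus2": {"human": "GGC", "chimp": "GGT"},
}
matrix, parts = concatenate(loci)
print(matrix, parts)
```

Once the loci are joined, a standard analysis infers a single tree for all columns. The partition list keeps the locus boundaries, but by itself it does not let different loci have different genealogies.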

I try to keep track of papers that students should read as they get up to speed with evolutionary genetics. Edwards’ essay raises many important points, and as I read through it, I reflected on how paleoanthropologists increasingly need to understand the inner workings of molecular studies of phylogeny.

If we’re interested in the phylogeny of species, we need to know how the “tree” of relationships among species may be reflected in the genealogical relationships among genes. Discordance between genes arises because gene trees are not species trees. Species are genetically variable, and the living descendants of an ancient species may have inherited different parts of that ancestral species’ variation. Depending on the demography of the ancestral population, the gene trees for two distinct genetic loci may have different topologies.
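For the simplest case of three species, the multispecies coalescent gives a closed form for how often a gene tree should disagree with the species tree. Suppose the internal branch of the species tree lasts T coalescent units, that is, generations divided by 2N, where N is the ancestral effective population size. Then the two lineages fail to coalesce along that branch with probability e^(-T). When that happens, all three topologies are equally likely, so each discordant topology has probability e^(-T)/3. The sketch below assumes a neutral, randomly mating ancestral population, no gene flow, and no recombination within a locus. The input values are hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t_generations: float, n_e: float) -> dict:
    """Probabilities of the three rooted gene-tree topologies for three species.

    t_generations: length of the species tree's internal branch, in generations.
    n_e: diploid effective size of the ancestral population along that branch.
    Assumes neutrality, random mating, no gene flow, no intralocus recombination.
    """
    T = t_generations / (2.0 * n_e)            # branch length in coalescent units
    p_no_coalescence = math.exp(-T)             # lineages pass through the branch uncoalesced
    p_each_discordant = p_no_coalescence / 3.0  # then all three topologies are equally likely
    return {
        "concordant": 1.0 - 2.0 * p_each_discordant,
        "discordant_each": p_each_discordant,
    }


# Hypothetical values: a 50,000-generation internal branch, ancestral N_e of 50,000.
probs = three_taxon_gene_tree_probs(50_000, 50_000)
print(probs)
```

A short internal branch relative to the ancestral population size (small T) pushes the concordant probability toward 1/3. This is why closely spaced speciation events, such as the human, chimpanzee and gorilla splits, can leave a substantial fraction of loci whose genealogies disagree with the species tree.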

From Edwards:

John Avise encapsulated the relationship between gene and species trees well in 1994: Gene trees and species trees are equally real phenomena, merely reflecting different aspects of the same phylogenetic process. Thus, occasional discrepancies between the two need not be viewed with consternation as sources of error in phylogeny estimation. When a species tree is of primary interest, gene trees can assist in understanding the population demographies underlying the speciation process (pp. 133 and 138 in Avise 1994). This essay is in part meant to reemphasize Avise’s perspective and to remind readers that species trees are in fact the primary interest of systematics.

Genealogies depend on parameters that are usually unknown, such as ancestral population sizes and divergence times. The fossil and archaeological records may help us constrain those parameters, just as molecular biology and pedigree comparisons may help us constrain the parameters that describe the mutational process.

To my mind, this is where paleoanthropologists need to be most attentive: molecular methods are not in conflict with fossil approaches; they implicitly depend upon them. Yet communication between the two fields rarely involves actual numbers. As a result, a “bottleneck” in paleoanthropology that means a 10 percent reduction in population size often becomes a “bottleneck” in genetics that means a 1000-fold reduction.
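Putting actual numbers on a “bottleneck” shows why the difference matters. Under neutral drift, a diploid population of effective size N loses a fraction 1/(2N) of its expected heterozygosity each generation, so after t generations a fraction (1 - 1/(2N))^t remains. The sketch below compares a 10 percent reduction with a 1000-fold reduction, each lasting 10 generations, from a starting effective size of 10,000. The starting size and duration are hypothetical values chosen for illustration.

```python
def heterozygosity_retained(n_e: float, generations: int) -> float:
    """Expected fraction of heterozygosity left after `generations` of neutral drift
    at a constant diploid effective size n_e: (1 - 1/(2 * n_e)) ** generations."""
    return (1.0 - 1.0 / (2.0 * n_e)) ** generations


baseline_ne = 10_000  # hypothetical pre-bottleneck effective size
duration = 10         # hypothetical bottleneck length, in generations

mild = heterozygosity_retained(baseline_ne * 0.9, duration)     # 10 percent reduction
severe = heterozygosity_retained(baseline_ne / 1000, duration)  # 1000-fold reduction
print(f"10% reduction retains {mild:.4f}; 1000-fold reduction retains {severe:.4f}")
```

The mild reduction leaves heterozygosity essentially untouched, with well under a tenth of a percent lost. Over the same span, the 1000-fold reduction removes roughly 40 percent. Yet both get called a “bottleneck.”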

Testing of demographic hypotheses moved on to genome-wide polymorphism data several years ago. The logical equivalent for species divergences is lineage sorting, a model that has been applied since the mid-1990s. The hominoids are extremely well studied from the standpoint of molecular systematics, and they remain the central example in most theoretical papers incorporating multiple loci. This year I have noticed several interesting implementations of whole-genome polymorphism comparisons among species embedded in phylogenetic trees. The higher mutation rate of CpG sites has long been known, but we now know that a flanking region of 50 bp or more may influence the local mutation rate. As we move from genes to gene networks, our comparisons will no longer be of the same nucleotide, but of classes of mutations across classes of genes.

This is another of those cases where the future lies in better algorithms. Edwards seems a man after my own heart. The computer programs lend a superficial veneer of rigor, while the underlying assumptions need to be challenged:

Producing phylogenies directly from gene sequences essentially in one step, without additional transformations, is now the dominant mode of phylogenetic analysis and indeed it has advanced the field enormously. Nonetheless, I suggest that the very success of this paradigm and the ease with which phylogenies could be produced directly from DNA matrices led to a comfort zone in phylogenetics. If we can imagine systematic methods themselves as a likelihood surface, I suggest that the current paradigm is a local optimum in that surface, an optimum that is useful but ultimately incomplete in so far as it has failed to model the potential for gene tree/species tree discordance even cursorily (Fig. 3) (Edwards 2009:6).

His theme is an old one: how do we use “total evidence” methods in phylogenetics? Variance among loci gives the problem a newish twist, one that may add information that other techniques have left on the table. But we have to wring that information out of the data.
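As one small example of wringing information out of variance among loci, the three-taxon multispecies coalescent can be run in reverse. Under incomplete lineage sorting alone, each of the two discordant topologies has expected frequency e^(-T)/3. A simple method-of-moments estimate is therefore T = -ln(3f), where f is the average observed frequency of the two discordant topologies. Given an independent estimate of the branch length t in generations (for instance, from the fossil record), the ancestral effective size follows as N = t/(2T). This is a minimal sketch, not the method Edwards describes. It assumes independent loci, gene trees estimated without error, and no gene flow, and the counts are hypothetical.

```python
import math


def estimate_internal_branch(n_concordant: int, n_disc_a: int, n_disc_b: int) -> float:
    """Method-of-moments estimate of the internal branch length T (coalescent units)
    from counts of the three rooted gene-tree topologies across loci.

    Assumes independent loci, error-free gene trees, and discordance caused only by
    incomplete lineage sorting (each discordant topology has frequency exp(-T) / 3).
    """
    n_loci = n_concordant + n_disc_a + n_disc_b
    if n_loci == 0:
        raise ValueError("need at least one locus")
    f_each = (n_disc_a + n_disc_b) / (2.0 * n_loci)  # mean discordant frequency
    if not 0.0 < f_each < 1.0 / 3.0:
        raise ValueError("discordant frequency must lie strictly between 0 and 1/3")
    return -math.log(3.0 * f_each)


def ancestral_ne(t_generations: float, T: float) -> float:
    """Invert T = t / (2N) to get the diploid ancestral effective size N."""
    return t_generations / (2.0 * T)


# Hypothetical counts from 1,000 loci and a hypothetical 50,000-generation branch.
T_hat = estimate_internal_branch(600, 200, 200)
print(T_hat, ancestral_ne(50_000, T_hat))
```

Under incomplete lineage sorting alone, the two discordant topologies should be about equally common. A marked imbalance between `n_disc_a` and `n_disc_b` is therefore informative in its own right and is often read as a hint of gene flow, which this sketch ignores.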


Edwards SV. 2009. Is a new and general theory of molecular systematics emerging? Evolution 63:1-19. doi:10.1111/j.1558-5646.2008.00549.x