Exome sequencing as a stopgap

The new Genome Biology has a perspective piece by Jacob Tennessen and colleagues, titled “The promise and limitations of population exomics for human evolution studies” (Tennessen et al. 2011). Exomics is the study of the coding part of the genome, which is only about 30 megabases as opposed to the 3 gigabases of a whole genome. It is now possible to sequence only the protein-coding parts of the genome by combining methods that capture those regions with next-generation sequencing. The result is vastly cheaper than a whole genome, and some of the savings can be put toward higher coverage, which improves sequence accuracy.
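The arithmetic behind that trade-off can be sketched in a few lines of Python. The exome and genome sizes are the round numbers quoted above; the 30x whole-genome depth is an assumed figure for illustration only, and the calculation ignores capture cost and off-target reads, so it overstates the gain a real experiment would see.

```python
# Round sizes quoted in the text, in bases.
EXOME_BASES = 30_000_000        # ~30 megabases of coding sequence
GENOME_BASES = 3_000_000_000    # ~3 gigabases in a whole genome


def exome_fraction(exome_bases=EXOME_BASES, genome_bases=GENOME_BASES):
    """Return the share of the genome that the exome represents."""
    return exome_bases / genome_bases


def equivalent_exome_depth(genome_depth,
                           exome_bases=EXOME_BASES,
                           genome_bases=GENOME_BASES):
    """Return the mean exome depth obtained by spending the same number of
    sequenced bases on the exome alone that a whole genome at `genome_depth`
    would use. Idealized: assumes perfect capture and no off-target reads."""
    total_sequenced_bases = genome_depth * genome_bases
    return total_sequenced_bases / exome_bases


if __name__ == "__main__":
    assumed_genome_depth = 30  # hypothetical whole-genome coverage
    print(f"Exome share of the genome: {exome_fraction():.1%}")
    print(f"Same bases spent on the exome alone: "
          f"{equivalent_exome_depth(assumed_genome_depth):.0f}x")
```

In practice nobody spends the whole budget on depth; the point is only that a hundredfold smaller target leaves a lot of room to trade cost against coverage.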

Tossing away 99% of the genome is not an ideal sampling strategy for many purposes. However, when it comes to phenotype prediction, we can make some predictions about how changes in amino acid sequences will affect protein function. Many important phenotypic changes are caused by non-coding variations in gene regulation, but genetics has not yet reached a state of knowledge where these can be readily predicted. So, if we’re sequencing people’s genomes for the purposes of finding disease or phenotype variants, exome sequences give much of the information that we can presently evaluate.

James Hadfield noted the spree of exome sequencing publications at his blog, Core Genomics (“Exome capture comparison publication splurge”). He summarizes the rationale:

“A lot of people I have talked to are now looking at screening pipelines which use Exome-Seq ahead of WGS to reduce the number of whole Human genomes to be sequenced. The idea being that the exome run will find mutations that can be followed up in many cases and only those with no hits can be selected for WGS.”
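The economics of that exome-first pipeline can be sketched with a toy cost model. Everything numeric here is hypothetical: costs are in arbitrary units, the 50:1 price ratio and the hit rate are assumptions chosen for illustration, and the function names are my own.

```python
def two_stage_cost(n_samples, exome_cost, wgs_cost, hit_rate):
    """Expected cost of exome-first screening: every sample gets an exome,
    and only samples with no exome hit go on to whole genome sequencing.
    `hit_rate` is the assumed fraction of samples the exome resolves."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    expected_wgs_samples = n_samples * (1.0 - hit_rate)
    return n_samples * exome_cost + expected_wgs_samples * wgs_cost


def wgs_only_cost(n_samples, wgs_cost):
    """Cost of sequencing every sample as a whole genome."""
    return n_samples * wgs_cost


def break_even_hit_rate(exome_cost, wgs_cost):
    """Hit rate above which exome-first is cheaper than WGS for everyone.
    From exome + (1 - h) * wgs < wgs, which gives h > exome / wgs."""
    return exome_cost / wgs_cost


if __name__ == "__main__":
    # Hypothetical prices: one whole genome costs as much as 50 exomes.
    n, exome, wgs, hits = 100, 1.0, 50.0, 0.5
    print("Exome first:", two_stage_cost(n, exome, wgs, hits))
    print("WGS only:   ", wgs_only_cost(n, wgs))
    print("Break-even hit rate:", break_even_hit_rate(exome, wgs))
```

Under this model the screening step pays for itself whenever the exome resolves a larger fraction of cases than the exome-to-genome price ratio, which is a low bar as long as exomes stay much cheaper than whole genomes.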

I have heard a number of geneticists describe exome sequencing as an intermediate step in population genetics, a way to increase sample sizes more affordably than whole genome sequencing allows at present. I don’t think this will last long, as whole genomes offer much more for population genetic analysis and are rapidly dropping in price, but that depends on how the technology develops. If we are consistently in the situation where researchers can multiplex 50 exomes at high coverage for the same price as one whole genome, it may make sense to use that strategy for a long time.

23andMe is starting an exome sequencing project. Daniel MacArthur’s comments on Google+ and the subsequent reader comments are interesting.