Sequencing is outpacing computing

The New York Times notices DNA sequencing’s Malthusian trap: “DNA sequencing caught in deluge of data.”

That is a decline [in sequencing costs] by a factor of more than 800 over four years. By contrast, computing costs would have dropped by perhaps a factor of four in that time span.
The lower cost, along with increasing speed, has led to a huge increase in how much sequencing data is being produced. World capacity is now 13 quadrillion DNA bases a year, an amount that would fill a stack of DVDs two miles high, according to Michael Schatz, assistant professor of quantitative biology at the Cold Spring Harbor Laboratory on Long Island.
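
Those figures are easy to sanity-check. The sketch below is a back-of-the-envelope calculation, not anything from the article: it assumes costs fall at a steady exponential rate, and for the DVD stack it assumes one byte per base, a 4.7 GB single-layer disc, and 1.2 mm of thickness per disc. All of those constants are my assumptions.

```python
import math

def halving_time_years(fold_drop, span_years):
    """Years for a cost to halve, assuming a steady exponential decline
    that reaches `fold_drop` over `span_years`."""
    return span_years * math.log(2) / math.log(fold_drop)

# Figures quoted in the article: 800-fold vs. roughly 4-fold over four years.
seq_halving = halving_time_years(800, 4)
comp_halving = halving_time_years(4, 4)

# DVD stack estimate. Every constant except BASES_PER_YEAR is an assumption.
BASES_PER_YEAR = 13e15      # 13 quadrillion bases, from the article
BYTES_PER_BASE = 1          # assumed: one byte per base, no compression
DVD_BYTES = 4.7e9           # assumed: single-layer DVD capacity
DVD_THICKNESS_M = 1.2e-3    # assumed: 1.2 mm per disc, no cases
METERS_PER_MILE = 1609.344

dvds = BASES_PER_YEAR * BYTES_PER_BASE / DVD_BYTES
stack_miles = dvds * DVD_THICKNESS_M / METERS_PER_MILE

print(f"Sequencing cost halves about every {seq_halving * 12:.1f} months")
print(f"Computing cost halves about every {comp_halving:.1f} years")
print(f"DVD stack: about {stack_miles:.1f} miles")
```

Under those assumptions, sequencing costs halve roughly every five months against every two years for computing, and the stack comes out close to the two miles quoted.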

I have spoken with several scientists in other fields, like astronomy and particle physics, who deal with truly big datasets. Until now, biology data has actually been pretty small potatoes compared with the sheer amount pumped out by large projects in other fields. But that’s changing. The Times article points out a unique aspect of the data problem in genetics: There are now thousands of labs that can generate large datasets, many of which have no special plan for data archiving or availability.

“Google has enough capacity to do all of genomics in a day,” said Dr. Schatz of Cold Spring Harbor, who is trying to apply Google’s techniques to genomics data. Prodded by Senator Charles E. Schumer, Democrat of New York, Google is exploring cooperation with Cold Spring Harbor.
Google’s venture capital arm recently invested in DNAnexus, a bioinformatics company. DNAnexus and Google plan to host their own copy of the federal sequence archive that had once looked as if it might be closed.

I don’t see Google as a deus ex machina for this one – although I do observe that several other big data projects are sponsored by large Microsoft investors or founders.