Floating to the top of the data

The New York Times writes today about “Big Data” and its effects on disparate fields of science and public policy: “The Age of Big Data”.

For my money, this quote should be at the beginning of the article instead of embedded near the end:

Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of false discoveries. The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that many bits of straw look like needles.
Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, one of the most pernicious uses of data.
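
To make the "many bits of straw look like needles" point concrete, here is a minimal sketch in Python. It is my own illustration, not anything from the article. It assumes a conventional significance threshold of 0.05 and a set of tests in which nothing real is going on, so every p-value is just a uniform random number. The function names are hypothetical.

```python
import random


def count_false_discoveries(n_tests, alpha, seed=0):
    """Count 'significant' results when every null hypothesis is true.

    Assumption for illustration: under a true null hypothesis a
    well-calibrated p-value is uniformly distributed on [0, 1), so we
    can stand in for real tests by drawing uniform random numbers.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_tests)]
    # Every p-value below alpha is a false discovery, because by
    # construction there is no real effect anywhere in this data.
    return sum(p < alpha for p in p_values)


def expected_false_discoveries(n_tests, alpha):
    """Expected number of false discoveries when all nulls are true."""
    return n_tests * alpha


if __name__ == "__main__":
    for n in (20, 1_000, 1_000_000):
        found = count_false_discoveries(n, alpha=0.05)
        expected = expected_false_discoveries(n, alpha=0.05)
        print(f"{n:>9} tests: {found} 'needles' found, about {expected:.0f} expected")
```

The count of spurious hits grows in proportion to the number of comparisons, which is why a result that looks striking in a small study may be unremarkable when it is one of a million things tested.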

The article begins by hyping the career prospects for graduates who can analyze large datasets. I would emphasize that good analytical skills don’t emerge naturally from working with data; they must be learned as part of one’s scientific training. The top hazard of working with large datasets is that they can temporarily knock out your BS meter.

We are obviously in the realm of big data now in paleoanthropology, as we work here to compare genomes that add up to terabytes. I periodically link to stories about open access in astronomy for precisely this reason: astronomical instruments generate terabytes of data or more every night.