Archiving old data: The case from astronomy

I’m catching up on the news. Last week, Science carried a report by Yudhijit Bhattacharjee about some astronomers’ efforts to build digital archives of old photographic plates. There are collections of tens of thousands of these plates, each of them a small snapshot of the sky at a moment up to 150 years ago. The point of an archive is to give researchers looking for long-period phenomena direct historical reference points.

Proponents argue that old plates provide the only way modern astronomers can study astrophysical phenomena on time scales longer than a few decades. "Why would you want to wait another 100 years to learn how certain stars might be varying in brightness and position over long time periods when we have this resource right here in front of us?" asks Grindlay, referring to the Harvard collection.
Preserving and scanning old plates, however, has been slow to win support from the broader astronomy community and funding agencies. Universities and observatories often discard plate collections when astronomers retire. Digitization projects in the United States and Europe – including DASCH [Digital Access to a Sky Century at Harvard, the project described in the article] – have proceeded in fits and starts on shoestring budgets.
"We live in a world where money is fixed – so the question is, what is the relative merit of the old data compared to new data?" says David Monet, an astronomer with the U.S. Naval Observatory's (USNO's) station in Flagstaff, Arizona, who until 2000 led the scanning of some 20,000 old plates for a searchable online sky catalog. Although he spent nearly 15 years on that project, Monet now thinks historical observations are of little value because of limitations on how accurately the brightness and position of objects can be determined on the images. "The thrill of going back 50 years" is one thing, he says, but "is the science case for doing so strong enough?"

I want to make a direct analogy with the skeletal biology of Holocene peoples. There are thousands of skeletal remains housed in collections around the world from the last 10,000 years. There are way too many for any one person to study or master, and in fact there are few people who even know the locations of many of the collections. Most of these collections have archaeological or cultural associations of some kind, but unless you are a specialist in a given area of the world, you aren’t necessarily going to understand how those associations connect to time, other regions or peoples.

In other words, anthropologists have a large and rich record of biological change over the last 10,000 years, but it is very challenging to put together a global picture representing more than a handful of well-developed case studies. I should know – that’s precisely the project I’m most interested in comparing with information about recent genetic changes.

There are some great projects and particular individuals who have made huge contributions to data collection and accessibility on recent collections. There are in fact too many for me to name individually, and I think we have to remember that those people and projects need more support – they make so much work possible in our field, which is after all a comparative science.

Still, we could do a lot more to make collections and data comparable to each other and more widely available. This is a time period where, in my experience, collections are exceptionally accessible. The problem is that there are so many of them that no one person (or even research group) can keep track of them all. To economize on time, it is sometimes easier to hit several big collections, which leaves many small collections overlooked. But big collections have their biases. And even in the context of big collections, it becomes very costly to contemplate collecting new data like scans, or warehousing caliper measurements or morphometrics in data archives.

Anthropology is not like astronomy, where new data collection involves petabytes of telescopic data. Skeletal collections remain our primary data long after they’re excavated – and as ancient genomics becomes more and more possible, the amount of data that we can collect from these skeletal remains will massively increase. So tracking these essentials – the morphology, scans if possible, the cultural and temporal associations – will become more and more important to a broader range of scholars.

UPDATE (2009-05-02): I should mention that the Global History of Health Project is a great example of the kind of systematic study and archiving that could be done in this time period. It’s the subject of an article in this week’s Science by Ann Gibbons, and I’ll be writing more about it later.

References:

Bhattacharjee Y. 2009. Stars in dusty file cabinets. Science 324:460-461. doi:10.1126/science.324_460