Link: Sustaining open access databases

Cameron Neylon considers some of the challenges in keeping open data access initiatives sustainable over the long term: “Squaring Circles: The economics and governance of scholarly infrastructures”. He examines some proposals for supporting scholarly “goods”, and emphasizes that such infrastructures are usually started without much planning for what to do if support is eventually lost:

Second, we can look at stable long standing infrastructures (Crossref, Protein Data Bank, NCBI, arXiv, SSRN) and note that in most cases governance arrangements are an accident of history and were not explicitly planned. Crises of financial sustainability (or challenges of expansion) for these organisations are often coupled to or lead to a crisis in governance, and in some cases a breakdown of community trust. Changes are therefore often made to governance in response to a specific crisis.
Where there is governance planning it frequently adopts a “best practice” model which looks for successful examples to draw from. It is not often based on “worse case scenario” planning. We suggest that this is a problem. We can learn as much from failures of sustainability and their relationship to governance arrangements as from successes.

The Social Science Research Network is a potent example. Initiated as an open repository, it was recently taken over by Elsevier, which changed some policies and began to take down some user-contributed content. Many projects started by small groups of founders, or funded initially by grants from a single source, are risky over the long term, because sometimes the only way to sustain them is to change their policies or restrict their accessibility.

I would observe that the open source software movement is a good source of examples for how to keep data open in the event that a project loses its funding or is acquired by a private entity. Many open software projects have been acquired or developed by private entities, and when those owners want to change the license terms, the projects are often forked: the original project may change, but a community maintains an open version. The key is that communities of people must coordinate their action and be prepared to mount an equivalent project. Widespread public mirroring of source code by individuals and institutions is a major part of what makes this possible.

To me, this suggests that any open data initiative should work closely with universities to ensure that its data are mirrored in an open manner.

Large-scale academic and scientific projects sometimes generate so much data that maintaining many redundant mirrors is impractical. But these are exactly the cases where we should focus our attention, ensuring that national or international funding bodies are involved in their maintenance.