The bugs will out

Following up on the editorial by Ralph Cicerone, calling for more effective data sharing, an editorial in the Guardian by computer scientist Darrel Ince reinforces the point about openness for software used in scientific models:

So, if you are publishing research articles that use computer programs, if you want to claim that you are engaging in science, the programs are in your possession and you will not release them then I would not regard you as a scientist; I would also regard any papers based on the software as null and void.

When I have published work based on computer models, all algorithms have been included in an appendix or cited directly in previous publications. Of course, population genetics algorithms are relatively simple compared to some kinds of models. But as Ince points out, it is expected in complex fields like econometrics and mathematics that such algorithms will be deposited with the journal or in an open access source.

The most important fact is that large software projects inevitably contain errors. If scientists do not use standard methods for testing and validating software, then their code will invariably be worse than computer industry norms:

There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error, just one, will usually invalidate a computer program.
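To make the idea of an interface inconsistency concrete, here is a minimal Python sketch of my own. It is not drawn from Hatton's analysis or Ince's editorial, and every function name in it is hypothetical. One piece of code reports an allele frequency as a percentage, the other expects a proportion between 0 and 1, and the program runs without complaint while producing nonsense. A simple range check of the kind a routine test would include catches it.

```python
def allele_frequency_percent(allele_count, total_alleles):
    """Hypothetical module A: reports frequency as a percentage (0 to 100)."""
    return 100.0 * allele_count / total_alleles


def expected_heterozygosity(p):
    """Hypothetical module B: expects p as a proportion (0 to 1)."""
    return 2.0 * p * (1.0 - p)


def heterozygosity_mismatched(allele_count, total_alleles):
    # Interface inconsistency: a percentage is passed where a
    # proportion is expected. Nothing crashes; the answer is just wrong.
    percent = allele_frequency_percent(allele_count, total_alleles)
    return expected_heterozygosity(percent)


def heterozygosity_consistent(allele_count, total_alleles):
    # Both sides of the interface now agree on units.
    p = allele_frequency_percent(allele_count, total_alleles) / 100.0
    return expected_heterozygosity(p)


def is_valid_heterozygosity(h):
    """With two alleles, 2p(1-p) must lie between 0 and 0.5."""
    return 0.0 <= h <= 0.5


if __name__ == "__main__":
    bad = heterozygosity_mismatched(30, 100)
    good = heterozygosity_consistent(30, 100)
    print("mismatched:", bad, "valid?", is_valid_heterozygosity(bad))
    print("consistent:", good, "valid?", is_valid_heterozygosity(good))
```

With 30 copies of an allele out of 100, the mismatched version returns a large negative number for a quantity that cannot be negative, while the consistent version returns roughly 0.42. Nobody can spot that kind of mistake in a published figure, but anyone with the code can find it in minutes.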

If someone demonstrated ESP but the experiment wouldn’t work without one particular experimenter standing in the room, we would rightly judge that it is not science. Likewise, if no one else can use your computer program, its results aren’t science. Simple as that.