Mike the Mad Biologist checks in with an interesting post on “The Double Standard of Genomic Data Release and the Role of Incentives”. The question: why do large genome sequencing centers release data quickly, long before publication, while smaller projects (ostensibly bound by the same rules) don’t?
[F]or the large centers, this is essentially contract work: the funding agency has determined that a certain amount of genomic data is required to aid other scientists in one or more disciplines, and the center is obligated to deliver these data. That's what pays the bills. There's none of the all-too-typical R01 (or similar grant) progress update "we didn't deliver what we said we would, but we found this other thing that's interesting." If you're supposed to deliver X genomes, you deliver those genomes, period. In fact, some of these arrangements aren't even grants, they're actually federal contracts. Funding agencies learned this the hard way, as too many early sequencing centers resembled 'genomic roach motels': DNA checks in, but sequence doesn't check out.
I love that line. His message is that smaller projects are contingent on grant renewal, which depends much more on research objectives and publications than on sequence output. The same dynamics are clearly relevant outside genetics, where publication output is vastly more important – it’s not obvious that data access is ever counted as a positive reason for grant approval.