john hawks weblog

paleoanthropology, genetics and evolution

data access

  • White House to recognize open science

    Wed, 2013-05-22 13:53 -- John Hawks

    The White House is looking to recognize people who are leading in open science efforts, either by providing free access to data or by using data that are already publicly available. The call is titled "Seeking Outstanding 'Open Science' Champions of Change," and I imagine that public education efforts using open data would also qualify. The reward is a trip to a White House event on June 20.

    We are asking for your help to identify “Open Science” Champions of Change—outstanding individuals, organizations, or research projects promoting and using open scientific data for the benefit of society. For example, a Champion’s work may involve:

    Providing free access to data or publications generated from scientific research; or

    Leading research that uses publically available scientific data.

    Anyone can nominate an “Open Science” candidate for consideration by May 23, 2013 (under “Theme of Service,” choose “Open Science”). In the “Reason for Nominating” section of the nomination form, please also include information about any upcoming open-science-related announcements or new steps that the individual or organization you are nominating has planned, which could potentially be launched at the Champions of Change event.

    I only found out about this process this morning, but it looks like a constructive step in recognizing people who are moving science in a more open direction. Earlier this year, the White House recommended a new policy on data access, which I found much more helpful than the concurrent policy on publication access (see "White House policy on data access").

    Nominations for this honor are due tomorrow (Thursday), using the short online nomination form. I hope many worthy people can be recognized in this way!

    Synopsis: 
    A call for nominations for excellent open science researchers and advocates
  • White House policy on data access

    Sun, 2013-02-24 23:29 -- John Hawks

    The White House this week announced a new policy on public access to results from federally funded research. The announcement has gotten

    Michael Eisen comments: "No celebrations here: why the White House public access policy sucks".

    The administration fell hook line and sinker for the ridiculous argument put forth by publishers that the only way for researchers and the public to get the services they provide is to give them monopoly control over the articles for a year – the year when they are of greatest potential use.

    Think about how absurd this is. Publishers, whose role should be to disseminate information as widely as possible, are now the only reason why the public will continue to not have access to research results their tax dollars paid for.

    Why is Eisen so exercised? Here's an excerpt from the White House policy memo describing the policy on publication access:

    In developing their public access plans, agencies shall seek to put in place policies that enhance innovation and competitiveness by maximizing the potential to create new business opportunities and are otherwise consistent with the principles articulated in section 1.

    Agency plans must also describe, to the extent feasible, procedures the agency will take to help prevent the unauthorized mass redistribution of scholarly publications.

    In other words, it's no longer just a matter of copyright agreements with publishers; now the federal agencies themselves must help police PDF sharing among researchers. I wonder where "mass redistribution" will kick in.

    Further, the memo does not set 12 months as the maximum access embargo; it directs agencies to adopt the 12-month embargo as a guideline. There is a lot not to like in the memo.

    Most of the public attention to the decision has been directed at the effects on scientific publications. I have long been interested in a second area: the public access to data generated by federally funded research.

    Last year, the White House Office of Science and Technology Policy requested public comment on two questions: open dissemination of federally funded research, and open access to data resulting from federally funded research. In response to the OSTP request, I commented on the value of data to scientists and others who are not members of federally funded labs ("Public interests in data from federally funded research"). The present announcement does not indicate whether or how those comments contributed to the decision, but it includes general recommendations on both publication and data access.

    As it stands, the text of the memo essentially keeps in place the data access requirements established under the Bush administration. That is not a bad thing, and indeed the recommendations listed in the memo seem very reasonable. I quote them here at length:

    Each agency’s public access plan shall:

    a) Maximize access, by the general public and without charge, to digitally formatted scientific data created with Federal funds, while:

    i) protecting confidentiality and personal privacy,

    ii) recognizing proprietary interests, business confidential information, and intellectual property rights and avoiding significant negative impact on intellectual property rights, innovation, and U.S. competitiveness, and

    iii) preserving the balance between the relative value of long-term preservation and access and the associated cost and administrative burden;

    b) Ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans, as appropriate, describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified;

    c) Allow the inclusion of appropriate costs for data management and access in proposals for Federal funding for scientific research;

    d) Ensure appropriate evaluation of the merits of submitted data management plans;

    e) Include mechanisms to ensure that intramural and extramural researchers comply with data management plans and policies;

    f) Promote the deposit of data in publicly accessible databases, where appropriate and available;

    g) Encourage cooperation with the private sector to improve data access and compatibility, including through the formation of public-private partnerships with foundations and other research funding organizations;

    h) Develop approaches for identifying and providing appropriate attribution to scientific data sets that are made available under the plan;

    i) In coordination with other agencies and the private sector, support training, education, and workforce development related to scientific data management, analysis, storage, preservation, and stewardship; and

    j) Provide for the assessment of long-term needs for the preservation of scientific data in fields that the agency supports and outline options for developing and sustaining repositories for scientific data in digital formats, taking into account the efforts of public and private sector entities.

    These recommendations are all basically already in the NSF data access policies, so at NSF the new White House memo will maintain the status quo.

    The problem is that the current policy is toothless. Lack of access to data remains a very serious problem that threatens the integrity of science. Self-archiving and institutional archiving have been sufficient to satisfy the data management sections of grant applications, but have proven woefully insufficient to enable access to data. Meanwhile, some fields have intensive data collection but very little or no data entering the public domain as part of digital repositories. The recommendations listed above do nothing to change the current situation.

    Nevertheless, there is some room within the recommendations for agency directors to take bolder action on data access. Item (j) perhaps provides the best hope. If federal funding agencies actually assess the long-term needs of each field they support, many fields (including anthropology) will clearly benefit from the establishment of standard digital repositories.

    I hope that NSF will not sit on its current policy but will instead work to extend access more broadly. At the same time, I wish the White House had given clearer guidance to enable the creation of digital repositories and to require their standard use as a condition of continued funding of research projects.

    Synopsis: 
    A new memo from the Obama administration rekindles my interest in data access.
  • Mouse brain mapping

    Tue, 2012-06-05 12:39 -- John Hawks

    This merits some attention: "Neuroscientists reach major milestone in whole-brain circuit mapping project".

    The data consist of gigapixel images (each close to 1 billion pixels) of whole-brain sections that can be zoomed to show individual neurons and their processes, providing a “virtual microscope.” The images are integrated with other data sources from the web, and are being made fully accessible to neuroscientists as well as interested members of the general public (http://mouse.brainarchitecture.org). The data are being released pre-publication in the spirit of open science initiatives that have become familiar in digital astronomy (e.g., Sloan Digital Sky Survey) but are not yet as widespread in neurobiology.

    It's a press release from Cold Spring Harbor Laboratory, giving some background on the project and its use of a "shotgun" approach to mapping neuronal connections. For me, the most exciting aspect of the open access data is the potential of running analyses across different datasets, such as the gene expression data in the Allen Brain Atlas. Drawing conclusions may require samples more representative of different stages of ontogeny than are now available, but such samples will be the next logical step -- understanding brain structure really requires us to understand how it develops.

  • Big data, no access, no replication possible

    Tue, 2012-05-22 15:29 -- John Hawks

    The New York Times has an article by John Markoff today, "Troves of Personal Data, Forbidden to Researchers," pointing to several disputes over the standards for releasing data along with scientific papers.

    These cases mostly relate to data gathered by corporations about their users or customers, which raises privacy concerns that are similar in some ways to those attending biomedical research. For that reason, I don't think that they are a good comparison with the situation in paleoanthropology, but they do overlap to a great extent with the issues in human genetics. In any case, the article has many elements that are useful to think about:

    He added that corporate control of data could give preferential access to an elite group of scientists at the largest corporations. “If this trend continues,” he wrote, “we’ll see a small group of scientists with access to private data repositories enjoy an unfair amount of attention in the community at the expense of equally talented researchers whose only flaw is the lack of right ‘connections’ to private data.”

    Also, I did not realize this:

    The data-sharing policy of the journal Science says, “All data necessary to understand, assess and extend the conclusions of the manuscript must be available to any reader of Science.”

    Several paleoanthropology papers have been published in the last few years without meeting this basic standard.

  • Making Big Data work in genetics

    Tue, 2012-05-15 15:33 -- John Hawks

    Laura Clarke and colleagues report on the data access and management practices of the 1000 Genomes Project [1].

    The larger data volumes and shorter read lengths of high-throughput sequencing technologies created substantial new requirements for bioinformatics, analysis and data-distribution methods. The initial plan for the 1000 Genomes Project was to collect 2× whole genome coverage for 1,000 individuals, representing ~6 giga–base pairs of sequence per individual and ~6 tera–base pairs (Tbp) of sequence in total. Increasing sequencing capacity led to repeated revisions of these plans to the current project scale of collecting low-coverage, ~4× whole-genome and ~20× whole-exome sequence for ~2,500 individuals plus high-coverage, ~40× whole-genome sequence for 500 individuals in total (~25-fold increase in sequence generation over original estimates). In fact, the 1000 Genomes Pilot Project collected 5 Tbp of sequence data, resulting in 38,000 files and over 12 terabytes of data being available to the community. In March 2012 the still-growing project resources include more than 260 terabytes of data in more than 250,000 publicly accessible files.
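    The planning figures in this passage follow from simple multiplication: coverage depth times genome size gives the bases sequenced per person, and multiplying by the number of people gives the total. A minimal sketch, assuming a haploid human genome of roughly 3 giga-base pairs (a round figure not stated in the quoted passage):

```python
# Rough sequencing-volume arithmetic for the original 1000 Genomes plan.
# Assumption: a haploid human genome of about 3 giga-base pairs (Gbp).
GENOME_SIZE_GBP = 3.0


def sequence_volume_gbp(coverage: float, genome_size_gbp: float = GENOME_SIZE_GBP) -> float:
    """Bases sequenced per individual = coverage depth x genome size."""
    return coverage * genome_size_gbp


per_individual_gbp = sequence_volume_gbp(2)        # 2x whole-genome coverage
total_tbp = per_individual_gbp * 1000 / 1000       # 1,000 individuals; Gbp -> Tbp

print(f"~{per_individual_gbp:.0f} Gbp per individual, ~{total_tbp:.0f} Tbp in total")
```

    This reproduces the ~6 Gbp per individual and ~6 Tbp in total given in the quotation.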

    The paper acknowledges that this large-scale genetic sequencing project nevertheless generates far less data than physics and astronomy projects. The Large Synoptic Survey Telescope, for example, will generate 20 terabytes each night of operation, while the Large Hadron Collider will generate roughly 15 petabytes per year. The 1000 Genomes Project data to date add up to around two weeks of LSST operation. Still, it's not hard to see how high-coverage sequencing will start to catch up in data storage and transfer requirements.
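    The "two weeks" comparison can be checked the same way. This rough sketch uses only the figures quoted above; treating 260 terabytes as the project total and assuming the LHC produces data at a steady rate over a 365-day year are simplifications for illustration.

```python
# Compare the 1000 Genomes holdings (March 2012) with the data rates quoted above.
PROJECT_TB = 260          # "more than 260 terabytes", so this is a lower bound
LSST_TB_PER_NIGHT = 20    # LSST output per night of operation
LHC_PB_PER_YEAR = 15      # LHC output, roughly, per year

lsst_nights = PROJECT_TB / LSST_TB_PER_NIGHT

# Simplifying assumptions: 1 PB = 1,000 TB and a steady rate across 365 days.
lhc_tb_per_day = LHC_PB_PER_YEAR * 1000 / 365
lhc_days = PROJECT_TB / lhc_tb_per_day

print(f"LSST: ~{lsst_nights:.0f} nights; LHC: ~{lhc_days:.1f} days")
```

    The LSST figure comes to 13 nights, matching "around two weeks"; by the same arithmetic, the LHC would produce that much data in under a week.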

    We are now in a golden age of data centralization. But five years from now, we may enter a second era of disposable data, as gene expression and whole-genome resequencing studies generate far more data than any central repository can store. We will need curation practices to identify and preserve data that have value beyond the project for which they were collected.

    The beautiful thing about this is that when data are abundant, they don't all have to work together. There is a real role for a new generation of curators to facilitate the mashups of the future.


    References

  • Public interests in data from federally funded research

    Thu, 2012-01-12 20:20 -- John Hawks

    I submitted the following essay in response to the Request for Information on Public Access to Digital Data Resulting from Federally Funded Research from the National Science and Technology Council's Interagency Working Group on Digital Data.

    This RFI is not the same as the current bill before Congress ("Open access op/ed in NY Times"), which would restrict public access to research articles based on federally funded research. Research articles are a very important issue, but I hope that the access to digital data will not be overshadowed by the attention to published results. As a paleoanthropologist, I believe that access to digital data from federally funded research projects is a fundamentally important issue, as I remark below.

    Introduction

    The United States provides grant funding to scientists through many federal programs. This funding advances work of public interest that might not happen without federal assistance.

    The creation of scientific knowledge may serve the public interest directly by enabling useful inventions or supplying actionable information on issues of public importance. A funded project may also serve the public interest indirectly, by (1) finding negative results that prevent wasted effort or public harm; (2) building the scientific infrastructure that enables future discoveries and advances; (3) training new and established scientists in effective research techniques; and (4) enhancing international cooperation and public/private partnerships.

    Congress and the Executive Branch have recognized that access to the published results of scientific research is not sufficient to advance the direct and indirect public interests served by federally funded projects. Facilitating the indirect benefits of research is a major aim of federal agencies' "Broader Impacts" and data access rules. These policies have been a qualified success since their implementation, limited mainly by the exceptions that programs and agencies have carved out to exempt certain kinds of data from being submitted along with research reports.

    I argue that open public access to digital data should be a requirement for all federally funded scientific research. Digital data can be maintained by federal agencies as a part of the reporting requirement of federal grant funding. Doing so will advance the interest of the public and ensure that today's science generates a continuing heritage of research excellence.

    Data access and transparency

    Transparency is essential to public trust. Scientific conclusions are formed by observation and replication, and for this process to be transparent, all data must be available for independent inspection. The possibility of such inspection should not be limited to qualified researchers, because the very existence of special access requirements blocks transparency of the scientific process.

    Changing technology has shifted the public's expectations about transparency. Digital technology enables most research data to be shared rapidly and at low cost. If data are produced in digital form, and digital data can be shared at low cost, researchers and agencies cannot credibly claim that the difficulty of reproducing and disseminating data is a sufficient reason to restrict access. Where no competing interest argues for restricted access (such as human subjects protections), a lack of access to digital data itself can now be a compelling reason for public distrust.

    Therefore, federally funded researchers should release digital data to the public by default. Federal agencies should facilitate this public reporting by requiring digital data to be supplied as part of final project reporting.

    Data access has a well-established record of success

    The recent history of human genetics demonstrates that open access to data has unforeseen benefits that can spawn innovation, support more effective education, and catalyze new discovery. In genetics, both federal and journal policies require release of data; raw data from federally funded projects are often available as they are generated, long before publication.

    My own laboratory has no federal research funding to date, but is actively engaged in research using data from federally funded projects. Today my laboratory trains undergraduate students in genetics with new data from ongoing federally funded genetic projects such as the 1000 Genomes Project. We use open access data from archaic human genomes to investigate the variation of ancient people and their relationships to living humans. This kind of work would be impractical without clearly established open data access policy.

    The open access to data from the Human Genome Project facilitated the rapid development of microarrays that are now used on a broad scale in human genetics to investigate the genetic correlates of human health and disease. Access to data from these studies has enabled other scientists to independently replicate many genetic associations. More important, meta-analysis of such data has shown that many associations cannot be replicated, while also showing some cases in which nonsignificant results across different samples give rise to a significant finding when pooling those samples. Access to negative results and raw data is necessary, in other words, to establish the facts in subsequent research. This goes beyond access to published research results and requires open access to unpublished digital data.
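    The pooling effect described here can be illustrated with one common method for combining results from independent samples, Stouffer's Z method. This is a sketch, not the method used in any particular genetic study, and the z-scores are hypothetical: two samples that each show the same modest association, neither significant on its own.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z-score: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def stouffer_z(z_scores: list[float]) -> float:
    """Unweighted Stouffer combination: sum of z-scores divided by sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


# Hypothetical z-scores for the same association tested in two independent samples.
study_z = [1.5, 1.5]

for z in study_z:
    print(f"single sample: z = {z:.2f}, p = {two_sided_p(z):.3f}")

pooled = stouffer_z(study_z)
print(f"pooled:        z = {pooled:.2f}, p = {two_sided_p(pooled):.3f}")
```

    Each sample alone gives p of about 0.13, while the pooled z of about 2.12 gives p of about 0.034. The method assumes independent samples with effects in the same direction, and it only works if every sample's result is available, including the nonsignificant ones.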

    Intellectual property protections and data access

    Research data raise somewhat different intellectual property issues from those relating to research publications. Some kinds of data, such as sequence data, CT or MRI data, or data from measurement instruments, do not meet the standard of originality necessary for copyright protection. For raw data from instruments, there is no intellectual property reason why a federal agency should not maintain an open archive for the public.

    Much research data is unquestionably subject to copyright protection, such as lab notebooks, written descriptions, photographs, and original reconstructions. Yet there is still a substantial public and scientific interest in inspecting such data. For example, photographic documentation of archaeological sites and specimens is of particular scientific value and is today routinely produced by digital technologies and stored in digital form. Some primary digital records are unique products that cannot be recreated at another time and place: for example, in situ photographs of specimens, photographs and records of sites before excavation, and digital reconstructions. The scientific record would be incomplete without such contributions, and maintaining an archive of such data over the long term is a difficult task for a single investigator, beyond the scope of a grant term.

    In cases where it is impracticable to obtain Creative Commons or other open licenses to such content, a funding agency should at a minimum require that a copy of all such archival information be deposited with the final project report, together with a limited-use non-commercial license permitting electronic dissemination of these materials to the public as part of the report.

    Metadata and data access

    Many have noted that raw data may be useless in the absence of additional information about how the data were obtained. Such information is known as "metadata". Researchers generate instrumental data using particular instrument settings and recording standards. They gather observational data under particular research protocols. These standards may change quickly as instrumentation, technology, and scientific results themselves demand new practices.

    Some scientists point to the problem of incompatible metadata as an argument for delaying the establishment of open public access to data. In their view, the public are likely to misunderstand or misuse scientific data where metadata are not clearly indicated. Meta-analyses combining data from multiple research projects are an important secondary use of digital data, and such meta-analyses are impossible when data cannot be reconciled into common observational or instrumental frameworks. Performing original work with data collected in heterogeneous contexts is a research specialty of its own, and is itself sometimes targeted by federal grants.
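    A toy example makes the point about reconciliation concrete. In this minimal sketch, the class, field names, and measurement values are all hypothetical: two studies record the same measurement in different units, and pooling is possible only because each dataset carries its recording unit as metadata. A dataset with an undocumented unit is rejected rather than silently mixed in.

```python
from dataclasses import dataclass

# Hypothetical conversion table; a real project would document many more units and protocols.
TO_MM = {"mm": 1.0, "cm": 10.0}


@dataclass
class MeasurementSet:
    """Raw values plus the minimal metadata needed to interpret them (hypothetical fields)."""
    source: str
    unit: str              # recording standard for the values
    values: list[float]


def to_common_mm(datasets: list[MeasurementSet]) -> list[float]:
    """Pool datasets onto one scale; raise if any dataset's unit is undocumented."""
    pooled: list[float] = []
    for ds in datasets:
        if ds.unit not in TO_MM:
            raise ValueError(f"{ds.source}: unknown unit {ds.unit!r}, cannot reconcile")
        pooled.extend(v * TO_MM[ds.unit] for v in ds.values)
    return pooled


study_a = MeasurementSet("study A", "mm", [182.0, 190.5])
study_b = MeasurementSet("study B", "cm", [18.5, 19.0])
print(to_common_mm([study_a, study_b]))
```

    The same logic scales up to instrument settings and observational protocols: if the metadata are recorded, other researchers can decide whether and how datasets can be combined.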

    However, meta-analysis is only one purpose of data access. Transparency, replicability, and education are central public interests that do not require the reconciliation of data collection methods from multiple studies. They require only clear description of the methods under which data were obtained. At a minimum, final research reports on federally funded projects must describe the standards of data collection with sufficient detail to allow independent replication, including all unpublished results and data.

    Successes of data access in paleoanthropology

    I am an anthropologist, and am most familiar with the scientific data relating to human evolution. These data include genetic observations on living and skeletal samples of humans. They also include fossil and archaeological evidence such as photographs, CT scans, isotopic records, anatomical measurements and descriptions.

    For many years, nearly all genetic data resulting from federally funded research have been made available for public download. Much genetic data generated by non-federally funded research programs, including foreign and domestic institutes, has also been free for public download. These data have resulted in a massive acceleration of research on recent human evolution and human origins. They have also led to unexpected discoveries and a burgeoning contribution of other disciplines to understanding our evolution.

    Data from radiocarbon dating and other isotopic sampling have also been made available to the public. Human occupation sites are among the best sources of evidence about past climates. The investment of federal resources in human evolution research has generated a temporal record that is now essential to studying changes in the faunal and plant compositions of past environments. Free access to records has enabled stronger calibration of radiocarbon dates, the development of a more secure chronology, and a more highly replicable scientific record correlating different regions of the world. Our understanding of such events and changes is vastly stronger when data are made public.

    Institutions and data access in paleoanthropology

    By contrast, CT scans and photographs pertaining to human origins are typically not made freely accessible to the public. The United States funding agencies are not the only parties with an interest in such data. In particular, museums and institutes that curate specimens often permit data collection under agreements that restrict the dissemination of the resulting data. Such agreements may be equated to "non-disclosure agreements" with respect to scientific data.

    An institution has a legitimate interest in controlling the public use of images and access to curated materials. Nevertheless, the lack of access to digital data results in reduplication of effort, overapplication of destructive sampling and measurement techniques, and unnecessary handling of precious and fragile specimens. Where it is practical, the United States should facilitate agreements with institutions that allow the release of digital data produced by public funding. Where release is not possible, funding should be granted only for those activities that will result in the release of data under a limited-use non-commercial license. Non-disclosure of data from instruments such as CT scanners, electron microscopes, or mass spectrometers is incompatible with scientific replication.

    Scientific careers and data access in paleoanthropology

    The economy of federal funding for scientific production sometimes leads to perverse incentives for high-ranking researchers that prevent public access to research data. Some scientists believe that their own future research will require exclusive access to data. Others want to impede research achievements by their academic rivals, or to maintain prestige and future funding opportunities.

    Scientific data in some areas may constitute "trade secrets" until they are protected by patents. Even in noncommercial research, federally funded scientists sometimes claim exclusive ownership over data that they plan to use in future research. In my own field of paleoanthropology, data secrecy supports a clandestine "quid pro quo" economy among researchers, in which established researchers and institutions allow furtive looks at unpublished data, to support and consolidate their power and influence.

    This is a game that the United States should simply decline to play. When federal research supports scientific results that are not subject to independent replication, it betrays the public interest in science.

    Established collaborations and centers of scientific research will always exert a strong influence upon the future of science, irrespective of federal data access policies. But established players should not use federal funding to construct barriers to open inquiry.

    Conclusion

    Open public access to data is one indication that a research project is following scientific principles. Making digital data available to the public would be good practice for any researcher, irrespective of funding source. Data access mitigates the risk that negative data will be unreported. Data access facilitates broader stewardship of research projects, in particular where collaborations create data that are distributed across many institutions. Data access and reporting standards enable other researchers to fill in for those who cannot complete a scientific project due to health or other personal reasons.

    Federal grant agencies already have successful repositories for many kinds of digital data. Such data are shared with the public at minimal cost relative to the overall budget for federal research grants. Supporting digital data repositories has itself been an important granting aim for several federal agencies and continues to be an active part of scientific infrastructure. Limiting such repositories to the exclusive use of a small cadre of researchers is enormously wasteful of resources when they can be opened to an interested public for a small incremental cost.

    The public has repeatedly invented surprising uses for digital data that can complement or enhance the scientific record. But much more important, open access to digital data serves the scientific values of transparency and independent replication, essential to maintaining public trust and investment in the research enterprise.

    Synopsis: 
    My response to a federal Request for Information on the topic of digital data access to federally funded research
  • Ecologists against public access to peer reviewed publications

    Fri, 2012-01-06 14:59 -- John Hawks

    This seems incredible, from Jonathan Eisen: "YHGTBFKM: Ecological Society of America letter regarding #OpenAccess is disturbing".

    Wow -- I am really disturbed by the letter the Ecological Society of America (ESA) has written to the White House OSTP in regard to Open Access publishing.

    ...

    So - the justification here for not making ecological articles available is that they are MORE important over time? So the taxpayers pays for research that is valuable and because it is valuable over time we should make it less freely available? Seriously?

    This next week is an important one for proponents of open access publication and data access, as the White House Office of Science and Technology Policy has requested public comments related to both of these issues for federally funded research. I will be posting my letter about data access when I complete it this weekend. I encourage everyone to pay attention and submit a letter if possible. It is dismaying to see professional scientific societies take public stands against making their members' research available.

  • Will monographs arise from the dead, or eat our brains?

    Sat, 2011-10-01 21:26 -- John Hawks

    Inside Higher Ed reviews and interviews an author who argues that the scholarly monograph shackles academics to an obsolete model of communication:

    So it is strategic that Kathleen Fitzpatrick, director of scholarly communication at the Modern Language Association and a professor of media studies at Pomona College, invokes the living dead early to illustrate her argument in Planned Obsolescence: Publishing, Technology, and the Future of the Academy (NYU Press). The scholarly press book, she writes, “is no longer a viable mode of communication … [yet] it is, in many fields, still required in order to get tenure. If anything, the scholarly monograph isn’t dead; it is undead."

    I agree with this thesis in part. Sixty-dollar monographs are going the way of the thylacine. Locking scholarly content in the tall stacks of university libraries doesn't disseminate it. Peer review no longer improves work enough to justify locking it away. It is ridiculous for anyone to judge the quality of a young scholar's work by the imprint of a "prestigious" academic press. Tenure committees have simply delegated their responsibilities to editors, and the editors do a poor job.

    But I disagree that the scholarly monograph is dead. Personally, I expect monographs to undergo a renaissance as more academics adopt e-publishing. Academic presses affiliated with universities should be going all-digital, and should start massively promoting their back catalogs as e-books at fire-sale prices. The smart ones will take the opportunity to change their agenda, competing to publish new books by a new generation of scholars who are building a broad readership both inside and outside academia. There's no reason why we need to constrain our scholarship to books so boring that nobody wants to read them. Tomorrow's scholars should be engaging with a much broader public than university presses have historically cultivated.

    The stumbling block is that these books still must serve as a guide to the academic quality of young scholars' work. On this count, Fitzpatrick provides some useful ideas about how to build quality scholarship under a more collaborative model:

    The way to make this work, Fitzpatrick says, is to change the currency of scholarly communications from paper to credit. Instead of rewarding faculty for getting a lot of paper published, universities should consider how helpful tenure candidates have been in parsing other people’s articles written and helping others refine their ideas, she says. Journals could help out with this by creating “trust metrics” that cede more weight to academics who consistently give constructive feedback. They could also encourage frequent, thoughtful reviews by making them prerequisites for publishing one’s own work — thus attracting the sort of critical mass of reviewers that Fitzpatrick argues is necessary for successful peer-to-peer review (and which some previous high-profile experiments with the model failed to get).

    Under such a system, faculty members could glide to tenure on the wings of their reputations as positive contributors to the advancement of knowledge in their field — a metric the current “publish-or-perish” model does not adequately represent, Fitzpatrick says. “Little in graduate school or on the tenure track inculcates helpfulness,” she writes, “and in fact much militates against it.”

    Obviously I think this model would be better than our current one. Still, I worry about the actual assignment of credit. Quite frankly, all my writing here has done wonders for my influence, but has had a substantial drawback: Many of my ideas are used by other scholars without credit or citation. We compete for research support, and in that competition I get no credit or acknowledgement whatsoever for any contributions I make. That's a cost I've been willing to pay for what I do, but if we expect more young academics to share their ideas broadly, we're going to need to change the culture of research funding to recognize their contributions appropriately.

    My favorite part of the interview is the last question, which asked Fitzpatrick to give advice about new models of publication to a junior faculty member, a librarian, and a university provost.

    Finally, to the provost: understand that scholarly communication is a core responsibility of the university – so fundamental to the university mission, in fact, that it must be thought of as part of the institution’s infrastructure, not as a revenue center. And every university must develop some kind of plan for scholarly communication. If you leave disseminating the work of your faculty exclusively to corporate publishers, corporations will profit from it at your institution’s expense. Instead, invest in the structures that will get your faculty’s work into broader circulation – not least because those structures will help you make clear to the concerned public why the university continues to matter today.

    I'm going to append to this post the first link to my entry in the Anthropologies project, "What's wrong with anthropology?", where I discuss my own perspective on these problems. Needless to say, I think things need to change. I expect the change in scholarly communication to be highly specific to each academic field, since what works for cultural anthropology will not be the same as what works for genetics or English. But new approaches will be digital, and that means a university can support many more approaches at once than is possible with print. The tools to support varied forms are already available; if universities would support and extend them, they could meet much of the need for academic communication.

    Synopsis: 
    Making academic writing relevant means abandoning the monograph, says a specialist.
  • The great world CT-scanning tour

    Fri, 2011-09-16 22:24 -- John Hawks

    The international edition of Der Spiegel is running an English-language profile of the traveling CT-scan project from Jean-Jacques Hublin and the Max Planck Institute for Evolutionary Anthropology: "German Scientists Bring Fossils into the Computer Age".

    To show just what the future holds for his field, Hublin crossed the back courtyard of the anatomy institute in Tel Aviv. There, next to the dumpsters, stands a 20-foot (6-meter) container that the Israeli technicians like to smoke behind. The box's exterior gives no hint that it holds a laboratory on prehistoric man unlike any other one in the world.

    This is a topic that should be followed closely by anyone interested in paleoanthropology's future. The article seems to imply that the data are being made freely available, but of course they are not. I am confident that, in the future, all data like these will be openly available, as such data are now routinely made available in other fields of science. But for the time being, our field is one of the exceptions, and the closed nature of the data is a serious impediment given the great challenges we face in educating the public about human evolution.

    The Spiegel article sets up the politics as a confrontation between Hublin and museum curators:

    Until now, Hublin says, it was usual to handle fossils from the dawn of mankind "like relics or national treasures." Under these circumstances, curators assumed the role of keepers of the Grail.

    In this way, curators were holding on the reins of scientific power. After all, it is vital for researchers to have access to the fossils. "Whoever is denied (this access) will never get anywhere," Hublin says.

    A New Era for Research

    Indeed, Hublin believes having a virtual fossil archive could herald the end of this system. He sees his work as boosting accessibility to the objects and says curators "are afraid of losing control."

    In my experience, the article's frame is overly simplistic. Scans aren't open unless the people who have them make them open. Believe me, if there were a lot of open scans out there, I'd be posting visualizations here on the weblog. Obviously people use funding and position to compete for prestige and control, and their strategies depend on the resources under their charge. When curators or institutions give permission to scan, it becomes a contractual matter. A foreign researcher coming to scan may demand a period of exclusivity; an institution might demand some meaningful local involvement in the research. The ultimate disposition of the data may be of little importance to either party relative to their more immediate needs. I am familiar with cases where scan data were never returned to the institution, despite promises of access, and with other cases where institutions have refused to allow scanning because they objected to a long exclusivity period for the scanning team.

    Fossil remains of our ancestors and relatives are national treasures; indeed, even more broadly, they are pieces of world heritage. We have the technology today to bring those extraordinary objects to everyone in the world. So I think it's a great shame that the politics of science continues to obscure our fossil record.

    Synopsis: 
    Der Spiegel profiles the Max-Planck CT-scanning trek to Israel, raising the politics of data access.
  • Floating on the data

    Mon, 2011-08-22 12:19 -- John Hawks

    Technology Review reports on a recent conference aimed at spreading data mining techniques. The point of departure is the growth of electronic sensor networks in industry and of information from online social media: "The New Big Data".

    People have been working with graphs of data for hundreds of years, but the graphs now being plotted from social networks or sensor networks are of an unprecedented scale, Apte says. "These are gigantic graphs," he says. "You're talking about millions of nodes and tens of millions of links."

    Dealing with graphs of that size and scope, and applying modern analytic tools to them, calls for better algorithms and other innovations.

    I'm dealing here with genetic data networks, which are rapidly becoming denser, and we're beginning to apply these kinds of network methods to understand them. Once you move beyond the analysis of a single locus and spread the data across the whole genome, it becomes necessary to go beyond a single tree and understand the relationships (and commonalities) among the genealogical networks that connect people with each other. In some ways, this shares more with epidemiological modeling than with traditional genetics.
