Open Access, Open Data, Open Science
January 26, 2016
By Melissa Cheung
The open access (OA) movement has gained a lot of momentum, especially with the implementation of OA policies from funding agencies in many countries. We can think of openness in science as a continuum, and the next step in making science open and transparent is open access to research data, also known as open data. As modern research becomes more complex and data-driven, access to research data is critical to advancing scientific knowledge. Open data enables a wider group of researchers to build upon existing knowledge by reusing data in novel ways.
Although some disciplines already have a culture of sharing research data, most researchers are reluctant to make their research data publicly available. The main reasons for not sharing research data include the fear of being scooped by a competitor before publishing their findings, improper or missing attribution, misinterpretation of the data by others, and the embarrassment of having someone identify errors in the data or the analysis.
Despite these concerns, the benefits of open access to research data outweigh the risks. As the recent working paper “The open research value proposition: How sharing can help researchers succeed” outlines, open science practices can boost a researcher's career through increased citations and opportunities for collaboration.
In a publish-or-perish culture, journal citations are the primary currency, and other valuable research outputs are typically not considered in assessments of research impact. However, many forms of research output can be valuable. For example, negative results, where the data do not support the hypothesis, are rarely published in journal articles, but making these data publicly available and citable would reduce the costs of duplicating failed experiments, while allowing researchers to receive credit for their work.
In some disciplines, publicly sharing your research data, preliminary analysis, or a pre-print establishes scientific priority, giving researchers a citation advantage: their work can be cited even as the final paper undergoes peer review. Furthermore, researchers can receive feedback and improve their work by sharing their research data or preliminary analysis before publication, resulting in the submission of higher-quality articles that are more likely to be accepted for publication.
Reproducibility is one of the fundamental principles in scientific research, and yet many studies are not reproducible due to low-quality or fraudulent data. Research methods and findings can only be validated if independent researchers can replicate results, and any researcher can tell you how difficult it is to reproduce research results based only on vague methodology sections in journal articles. Sharing research data and other open science practices could solve the reproducibility problem and help to curb scientific misconduct. Researchers who share their data have a greater incentive to ensure that their work will stand up to scrutiny while allowing others to verify results and validate conclusions, thus increasing the quality of science overall.
Other barriers to making data publicly available include intellectual property issues and ethical issues related to confidential data. In these situations, open access to data may not be appropriate. However, there are ways for researchers to share their research data while maintaining their intellectual property rights and protecting the confidentiality of sensitive data. For example, researchers can anonymize sensitive data, limit access to authorized researchers, or restrict data reuse to certain conditions that protect the privacy of human participants.
The good news is that researchers’ perceptions and practices around data sharing are changing. Most researchers support the principles behind open access to data, but appropriate policies and standards for research data management and data sharing are still in development. Several journals and publishers, such as PLOS, Nature, PNAS, and Science, already have data sharing policies and guidelines. Meanwhile, more and more funding agencies are requiring researchers to make their data publicly available in data repositories. There is typically no cost to researchers to deposit their data sets in data repositories, and the repositories generally enable the data set to be fully cited. Careful annotation, standardization, and organization of deposited data sets are essential to reduce the risk of the data being misinterpreted or misused by others.
In Canada, several reports and consultations on open access to research data have been conducted over the years. Although the recommendations from these reports have yet to be implemented, it is expected that Canada will follow international trends in managing and sharing research data. In fact, the Tri-Agency is working towards developing requirements for research data management, which includes data sharing.
Currently, under Section 3.2 of the Tri-Agency Open Access to Publications policy, the Canadian Institutes of Health Research (CIHR) already has a requirement for researchers to deposit certain types of research data in an appropriate public data repository, and the Social Sciences and Humanities Research Council (SSHRC) has its own Research Data Archiving Policy that encourages data sharing. Furthermore, the Tri-Agency released a draft Statement of Principles on Digital Data Management in July 2015, which is expected to be adopted in the near future. The draft statement outlines the expectations, roles, and responsibilities for research data management to support Canada’s commitment to open science.
As well as being leaders in open access initiatives, libraries and librarians have a long history of organizing, preserving, and providing access to information. They already have the expertise required to preserve data, while making it discoverable, reusable, and citable. Indeed, many libraries are key players in providing data management services and online repositories, and in developing policies and guidelines for best practices. For example, the Queen’s University Library and the University of Guelph Library offer services to help researchers make their data available in online repositories. Additionally, the Canadian Association of Research Libraries (CARL) launched a project, called Portage, to develop a library-based research data management network in Canada, which will include a national preservation and discovery system for research data.
There are many steps in the data lifecycle before research data are ready to be shared, and researchers need to plan ahead in order to properly prepare their data to ensure that they will be discoverable and reusable by others. Data management plans (DMPs) can help researchers decide how to best manage and share their research data during each stage of the data lifecycle. As such, many funding agencies in the US and the UK require researchers to submit a DMP with their grant proposals. In anticipation of similar mandates being introduced in Canada, Portage has launched the online DMP Assistant to guide researchers in writing a DMP.
The benefits of open access are widely accepted by the research community. However, there is still a way to go towards making science open and harnessing the full potential of data-driven research. Data should be considered valuable research outputs that should be shared publicly and cited in the same way as journal articles. As funding agencies and publishers introduce data sharing policies, libraries and librarians are prepared to lend their expertise to assist researchers in managing their research data.
Melissa Cheung (@mwscheung) is a Science and Engineering Research Liaison Librarian at the University of Ottawa. She received her MIS from the School of Information Studies at the University of Ottawa and also holds a Bachelor of Science from Carleton University. She is interested in makerspaces, research data management, scientific publishing, and research impact metrics.