The FAIR Guiding Principles for scientific data management and stewardship

Scientific Data, Volume 3, Issue 1
Mark D. Wilkinson1, Michel Dumontier2, IJsbrand Jan Aalbersberg3, Gabrielle Appleton3, Myles Axton4, Arie Baak5, Niklas Blomberg6, Jan‐Willem Boiten7, Luiz Olavo Bonino da Silva Santos8, Philip E. Bourne9, Jildau Bouwman10, Anthony J. Brookes11, Tim W. Clark12, Mercè Crosas13, Ingrid Dillo14, Olivier Dumon3, Scott Edmunds15, Chris T. Evelo16, Richard Finkers17, Alejandra González-Beltrán18, Alasdair J. G. Gray19, Paul Groth3, Carole Goble20, Jeffrey S. Grethe21, Jaap Heringa22, Peter A.C. ’t Hoen23, Rob Hooft24, Tobias Kuhn25, Ruben Kok22, Joost N. Kok26, Scott J. Lusher27, Maryann E. Martone28, Albert Mons29, Abel L. Packer30, Bengt Persson31, Philippe Rocca‐Serra18, Marco Roos32, René van Schaik33, Susanna‐Assunta Sansone18, Erik Schultes34, Thierry Sengstag35, Ted Slater36, George Strawn37, Morris A. Swertz38, Mark Thompson32, Johan van der Lei39, Erik M. van Mulligen39, Jan Velterop40, Andra Waagmeester41, Peter Wittenburg42, Katherine Wolstencroft43, Jun Zhao44, Barend Mons45
1Center for Plant Biotechnology and Genomics, Universidad Politécnica de Madrid, Madrid, 28223, Spain
2Stanford University, Stanford, 94305-5411, USA
3Elsevier, Amsterdam, 1043 NX, The Netherlands
4Nature Genetics, New York, 10004-1562, USA
5Euretos and Phortos Consultants, Rotterdam, 2741 CA, The Netherlands
6ELIXIR, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
7Lygature, Eindhoven, 5656 AG, The Netherlands
8Vrije Universiteit Amsterdam, Dutch Techcenter for Life Sciences, Amsterdam, 1081 HV, The Netherlands
9Office of the Director, National Institutes of Health, Rockville, 20892, USA
10TNO, Zeist, 3700 AJ, The Netherlands
11Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
12Harvard Medical School, Boston, MA 02115, Massachusetts, USA
13Harvard University, Cambridge, MA 02138, Massachusetts, USA
14Data Archiving and Networked Services (DANS), The Hague, 2593 HW, The Netherlands
15GigaScience, Beijing Genomics Institute, Shenzhen, 518083, China
16Department of Bioinformatics, Maastricht University, Maastricht, 6200 MD, The Netherlands
17Wageningen UR Plant Breeding, Wageningen, 6708 PB, The Netherlands
18Oxford e-Research Center, University of Oxford, Oxford, OX1 3QG, UK
19Heriot-Watt University, Edinburgh EH14 4AS UK
20School of Computer Science, University of Manchester, Manchester M13 9PL, UK
21Center for Research in Biological Systems, School of Medicine, University of California San Diego, La Jolla, 92093-0446, California, USA
22Dutch Techcenter for the Life Sciences, Utrecht, 3501 DE, The Netherlands
23Department of Human Genetics, Leiden University Medical Center, Dutch Techcenter for the Life Sciences, Leiden, 2300 RC, The Netherlands
24Dutch TechCenter for Life Sciences and ELIXIR-NL, Utrecht, 3501 DE, The Netherlands
25VU University Amsterdam, Amsterdam 1081 HV, The Netherlands
26Leiden Center of Data Science, Leiden University, Leiden, 2300 RA, The Netherlands
27Netherlands eScience Center, Amsterdam, 1098 XG, The Netherlands
28National Center for Microscopy and Imaging Research, UCSD, San Diego, 92103, USA
29Phortos Consultants, San Diego, 92011, USA
30SciELO/FAPESP Program, UNIFESP Foundation, São Paulo, 05468-901, Brazil
31Bioinformatics Infrastructure for Life Sciences (BILS), Science for Life Laboratory, Dept of Cell and Molecular Biology, Uppsala University, Uppsala, S-751 24, Sweden
32Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
33Bayer CropScience, Gent Area, 1831, Belgium
34Leiden Institute for Advanced Computer Science, Leiden University Medical Center, Leiden, 2300 RA, The Netherlands
35Swiss Institute of Bioinformatics and University of Basel, Basel, 4056, Switzerland
36Cray, Inc., Seattle, 98164, USA
37unaffiliated
38University Medical Center Groningen (UMCG), University of Groningen, Groningen, 9713 GZ, The Netherlands
39Erasmus MC, Rotterdam, 3015 CE, The Netherlands
40Independent Open Access and Open Science Advocate, Guildford, GU1 3PW, UK
41Micelio, Antwerp, 2180, Belgium
42Max Planck Compute and Data Facility, MPS, Garching, 85748, Germany
43Leiden Institute of Advanced Computer Science, Leiden University, Leiden, 2333 CA, The Netherlands
44Department of Computer Science, Oxford University, Oxford, OX1 3QD, UK
45Leiden University Medical Center, Leiden and Dutch TechCenter for Life Sciences, Utrecht, 2333 ZA, The Netherlands

Abstract

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and scholarly publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them and some exemplar implementations in the community.

