Developing and implementing an institute-wide data sharing policy

Springer Science and Business Media LLC - Volume 3 - Pages 1-8 - 2011
Stephanie OM Dyke1, Tim JP Hubbard1
1Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

Abstract

The Wellcome Trust Sanger Institute has a strong reputation for prepublication data sharing, built on its policy of rapid release of genome sequence data and, in particular, on its contribution to the Human Genome Project. The practicalities of broad data sharing nevertheless remain largely uncharted, especially when it comes to covering the wide range of data types currently produced by genomic studies and adequately addressing ethical issues. This paper describes the processes and challenges involved in implementing a data sharing policy on an institute-wide scale. These include questions of governance, the practical aspects of applying principles to diverse experimental contexts, the building of enabling systems and infrastructure, incentives, and collaborative issues.
