Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute

Gen-Tao Chiang1, Peter Clapham1, Guoying Qi2, Kevin Sale3, Guy Coates1
1Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
2Wellcome Trust Sanger Institute, New Sequencing Technologies, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
3Wellcome Trust Sanger Institute, Infrastructure Management Team, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK

Abstract

Keywords


References

Mardis ER: A decade's perspective on DNA sequencing technology. Nature 2011, 470(7333):198–203. 10.1038/nature09796

The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 2010, 467(7319):1061–1073. 10.1038/nature09534

UK10K [http://www.uk10k.org/]

Cuff JJ, Coates G, Cutts T, Rae M: The Ensembl Computing Architecture. Genome Research 2004, 14: 971–975. 10.1101/gr.1866304

Lustre [http://wiki.lustre.org/index.php/Main_Page]

Schmuck F, Haskin R: GPFS: A Shared-Disk File System for Large Computing Clusters. In Proceedings of the FAST'02 Conference on File and Storage Technologies. Monterey, California, USA; 2002:231–244.

Bell G, Hey T, et al.: Beyond the Data Deluge. Science 2009, 323(5919):1297–1298. 10.1126/science.1170411

Chervenak A, Foster I, Kesselman C, Salisbury C, Tuecke S: The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. Journal of Network and Computer Applications 2001, 23: 187–200.

Baru C, Moore R, Rajasekar A, Wan M: The SDSC Storage Resource Broker. In IBM Toronto Centre for Advanced Studies Conference (CASCON'98). Toronto, Canada; 1998.

Hedges M, Blanke T, et al.: Rule-based curation and preservation of data: A data grid approach using iRODS. Future Generation Computer Systems 2009, 25(4):446–452. 10.1016/j.future.2008.10.003

Rajasekar A, Moore R, et al.: Applying Rules as Policies for Large-Scale Data Sharing. In 2010 International Conference on Intelligent Systems, Modelling and Simulation (ISMS). Liverpool, UK; 2010.

Salje EKH, Artacho E, Austen KF, Bruin RP, Calleja M, Chappell H, Chiang G-T, Dove MT, Frame I, Goodwin A, Kleese van Dam K, Marmier A, Parker SC, Pruneda M, Todorov IT, Trachenko K, Tyer R, White TOH, Walker AM: eScience for molecular-scale simulations and the eMinerals project. Phil Trans R Soc A 2009, 367: 967–985. 10.1098/rsta.2008.0195

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25(16):2078–2079. 10.1093/bioinformatics/btp352

Jordan C, Stanzione D, et al.: Comprehensive Data Infrastructure for Plant Bioinformatics. In Interfaces and Abstractions for Scientific Data Storage (IASDS10). Crete, Greece; 2010.

AUKS [http://sourceforge.net/projects/auks/]

Basney J, Humphrey M, Welch V: The MyProxy Online Credential Repository. Software: Practice and Experience 2005, 35(9):801–816.

iRODS User Group Meeting 2011 [https://www.irods.org/index.php/iRODS_User_Group_Meeting_2011]

Chiang G-T, Dove MT, Bovolo I, Ewen J: Implementing a Grid/Cloud eScience Infrastructure for Hydrological Sciences. In Guide to eScience: next generation scientific research and discovery. Computer Communications and Networks, Springer; 2011, Part 1, pp 3–28.

Chiang G-T, White TOH, Bovolo I, Ewen J: Geo-visualisation Fortran Library. Computers and Geosciences 2011, 37: 65–74. 10.1016/j.cageo.2010.04.012