Bioconductor: open software development for computational biology and bioinformatics

Genome Biology - Volume 5 - Pages 1-16 - 2004
Robert C Gentleman1, Vincent J Carey2, Douglas M Bates3, Ben Bolstad4, Marcel Dettling5, Sandrine Dudoit4, Byron Ellis6, Laurent Gautier7, Yongchao Ge8, Jeff Gentry1, Kurt Hornik9, Torsten Hothorn10, Wolfgang Huber11, Stefano Iacus12, Rafael Irizarry13, Friedrich Leisch9, Cheng Li1, Martin Maechler5, Anthony J Rossini14, Gunther Sawitzki15, Colin Smith16, Gordon Smyth17, Luke Tierney18, Jean YH Yang19, Jianhua Zhang1
1Department of Biostatistical Science, Dana-Farber Cancer Institute, Boston, USA
2Channing Laboratory, Brigham and Women's Hospital, Boston, USA
3Department of Statistics, University of Wisconsin-Madison, Madison, USA
4Division of Biostatistics, University of California, Berkeley, Berkeley, USA
5Seminar for Statistics LEO C16, ETH Zentrum, Switzerland
6Department of Statistics, Harvard University, Cambridge, USA
7Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
8Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, USA
9Institut für Statistik und Wahrscheinlichkeitstheorie, TU Wien, Wien, Austria
10Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
11Division of Molecular Genome Analysis, DKFZ (German Cancer Research Center), Heidelberg, Germany
12Department of Economics, University of Milan, Milan, Italy
13Department of Biostatistics, Johns Hopkins University, Baltimore, USA
14Department of Medical Education and Biomedical Informatics, University of Washington, NE Pacific, Seattle, USA
15Statistisches Labor, Institut für Angewandte Mathematik, Heidelberg, Germany
16Department of Molecular Biology, The Scripps Research Institute, La Jolla, USA
17Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
18Department of Statistics and Actuarial Science, University of Iowa, Iowa City, USA
19Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, San Francisco, USA

Abstract

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.
