Chipster: user-friendly analysis software for microarray and other high-throughput data

M Aleksi Kallio¹, Jarno Tuimala¹, Taavi Hupponen¹, Petri Klemelä¹, Massimiliano Gentile¹, Ilari Scheinin², M.K. Koski¹, Janne Käki¹, Eija Korpelainen¹
¹ CSC - IT Center for Science, Keilaranta 14, Keilaniemi, Espoo, Finland
² Department of Pathology, VU University Medical Center, Amsterdam, The Netherlands

Abstract

Background: The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. To enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software.

Results: Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies.

Conclusions: Chipster is user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.
