Independent component analysis reveals new and biologically significant structures in micro array data

BMC Bioinformatics - Volume 7 - Pages 1-12 - 2006
Attila Frigyesi1,2, Srinivas Veerla3, David Lindgren3, Mattias Höglund3
1Department of Cardiology, University Hospital, Lund, Sweden
2Centre for Mathematical Sciences, Mathematical Statistics, Lund University, Lund, Sweden
3Department of Clinical Genetics, Lund University Hospital, Lund, Sweden

Abstract

An alternative to standard approaches for uncovering biologically meaningful structures in microarray data is to treat the data as a blind source separation (BSS) problem. BSS refers to the problem of recovering source signals from several observed linear mixtures of those signals. In the context of microarray data, "sources" may correspond to specific cellular responses or to co-regulated genes. We applied independent component analysis (ICA) to three different microarray data sets: two tumor data sets and one time series experiment. To obtain reliable components we used iterated ICA to estimate component centrotypes. We found that many of the low-ranking components may indeed show strong biological coherence and hence be of biological significance. In general, ICA achieved a higher resolution than analyses based on correlated expression and yielded a larger number of gene clusters significantly enriched for gene ontology (GO) categories. In addition, components characteristic of molecular subtypes and of tumors with specific chromosomal translocations were identified. ICA also identified more than one gene cluster significant for the same GO categories and hence disclosed a higher level of biological heterogeneity, even within coherent groups of genes. Although the ICA approach primarily detects hidden variables, these surfaced as highly correlated genes in the time series data and, in one instance, in the tumor data. This further strengthens the biological relevance of the latent variables detected by ICA.
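
The linear mixing model that the abstract describes in words can be written compactly as below. This is a notational sketch only: the symbols X, A, S, W, the dimensions m, n, k, and the orientation of the matrices (arrays as rows, genes as columns) are assumptions made for illustration and are not taken from the abstract.

```latex
% Assumed notation: m arrays (samples), n genes, k independent components.
% X: observed expression matrix, A: mixing matrix, S: source matrix.
\[
  X = A S, \qquad
  X \in \mathbb{R}^{m \times n},\quad
  A \in \mathbb{R}^{m \times k},\quad
  S \in \mathbb{R}^{k \times n}
\]
% ICA estimates an unmixing matrix W so that the rows of the estimate
% \hat{S} are as statistically independent as possible.
\[
  \hat{S} = W X, \qquad W A \approx P D
\]
% P is a permutation matrix and D a diagonal scaling matrix: the sources
% can be recovered only up to their order, sign, and scale.
```

Under this reading, each row of S assigns a weight to every gene, and genes with extreme weights in a component form the candidate gene cluster for that component. Because ICA estimates depend on the starting point of the optimization, repeating the estimation and summarizing the repeated solutions, as the iterated ICA with centrotypes mentioned above does, is one way to obtain more stable components.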
