Application of independent component analysis to microarrays
Abstract
We apply linear and nonlinear independent component analysis (ICA) to project microarray data into statistically independent components that correspond to putative biological processes, and to cluster genes according to over- or under-expression in each component. We test the statistical significance of enrichment of gene annotations within clusters. ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.
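The clustering and enrichment steps described in the abstract can be illustrated with a minimal sketch. It assumes the per-gene loads on one independent component have already been computed by some ICA implementation. The function names, the 25% cut-off, the toy gene identifiers and the choice of a one-sided hypergeometric test are illustrative assumptions, not details taken from the paper.

```python
from math import comb


def extreme_clusters(loads, fraction=0.25):
    """Return (over, under): the genes with the highest and lowest loads.

    loads    -- dict mapping gene id -> load on one independent component
    fraction -- share of genes placed in each cluster (assumed cut-off)
    """
    ranked = sorted(loads, key=loads.get)        # ascending by load
    k = max(1, int(len(ranked) * fraction))      # cluster size
    return set(ranked[-k:]), set(ranked[:k])


def enrichment_p_value(cluster, annotated, population_size):
    """One-sided hypergeometric p-value for annotation enrichment.

    Probability of seeing at least the observed overlap between `cluster`
    and `annotated` when the cluster is drawn at random from a population
    of `population_size` genes (annotated genes must lie in that population).
    """
    n = len(cluster)
    f = len(annotated)
    k = len(cluster & annotated)
    tail = sum(
        comb(f, i) * comb(population_size - f, n - i)
        for i in range(k, min(n, f) + 1)
    )
    return tail / comb(population_size, n)


# Toy data (hypothetical): loads of eight genes on a single component.
loads = {
    "g1": 2.1, "g2": 1.8, "g3": 0.1, "g4": -0.2,
    "g5": 0.05, "g6": -1.9, "g7": -2.3, "g8": 0.3,
}
annotated = {"g1", "g2", "g3"}   # genes sharing one functional annotation

over, under = extreme_clusters(loads, fraction=0.25)
p_over = enrichment_p_value(over, annotated, len(loads))     # 3/28
p_under = enrichment_p_value(under, annotated, len(loads))   # 1.0
```

In this toy case both genes in the over-expressed cluster carry the annotation, so its p-value is 3/28, while the under-expressed cluster shares no annotated gene and scores 1. A real analysis would repeat the test for every component and annotation and would need a correction for multiple testing.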