Discriminatory Mining of Gene Expression Microarray Data

Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology - Volume 35 - Pages 255-272 - 2003

Zuyi Wang¹, Yue Wang², Jianping Lu¹, Sun-Yuan Kung³, Junying Zhang¹, Richard Lee⁴, Jianhua Xuan¹, Javed Khan⁵, Robert Clarke⁴

¹Department of Electrical Engineering and Computer Science, The Catholic University of America, Washington, DC, USA

²Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Alexandria, USA

³Department of Electrical Engineering, Princeton University, Princeton, USA

⁴Lombardi Cancer Center, Georgetown University, Washington, DC, USA

⁵National Human Genome Research Institute, National Institutes of Health, Bethesda, USA

Abstract

Recent advances in machine learning and pattern recognition provide new analytical tools for exploring high-dimensional gene expression microarray data. Our data mining software, VISual Data Analyzer for cluster discovery (VISDA), reveals many distinguishing patterns among gene expression profiles, which are responsible for the cell's phenotypes. The model-supported exploration of the high-dimensional data space is achieved through two complementary schemes: dimensionality reduction by discriminatory data projection and cluster decomposition by soft data clustering. Dimensionality reduction yields a visualization of the complete data set at the top level. The data set is then partitioned into subclusters, which can in turn be visualized at lower levels and, if necessary, partitioned again. In this paper, three algorithms are evaluated for their ability to reduce dimensionality and visualize data sets: Principal Component Analysis (PCA), Discriminatory Component Analysis (DCA), and the Projection Pursuit Method (PPM). The partitioning into subclusters uses the Expectation-Maximization (EM) algorithm and a hierarchical normal mixture model that is selected by the user and verified “optimally” by the Minimum Description Length (MDL) criterion. These approaches produce different visualizations, which are compared against known phenotypes from the microarray experiments. Overall, these algorithms and user-selected models support exploration of high-dimensional data where standard analyses may not be sufficient.
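As a rough illustration of one level of the scheme described above, the following sketch projects synthetic data to two dimensions with PCA, fits normal mixtures by EM to obtain soft cluster memberships, and scores each candidate number of clusters with an MDL-style penalty. It is not the VISDA implementation: the function names (`pca_project`, `fit_mixture_em`, `mdl_score`), the synthetic data, the random initialization, and the two-part form of the MDL score (negative log-likelihood plus half the free-parameter count times log N) are all assumptions made for illustration. DCA, PPM, the user-guided model selection, and the lower levels of the hierarchy are not shown.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def log_gauss(Y, mean, cov):
    """Log density of a multivariate normal evaluated at each row of Y."""
    d = Y.shape[1]
    diff = Y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def e_step(Y, weights, means, covs):
    """Return soft memberships (posteriors) and the total log-likelihood."""
    log_p = np.stack(
        [np.log(w) + log_gauss(Y, m, c) for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    return np.exp(log_p - log_norm[:, None]), log_norm.sum()

def fit_mixture_em(Y, k, n_iter=100, seed=0):
    """Fit a k-component normal mixture by EM (assumed random-sample init)."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    means = Y[rng.choice(n, size=k, replace=False)]
    base_cov = np.atleast_2d(np.cov(Y, rowvar=False)) + 1e-6 * np.eye(d)
    covs = np.array([base_cov] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp, _ = e_step(Y, weights, means, covs)
        # M-step: re-estimate weights, means, and covariances from memberships.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        weights = nk / n
        means = (resp.T @ Y) / nk[:, None]
        for j in range(k):
            diff = Y - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return e_step(Y, weights, means, covs)

def mdl_score(log_lik, k, d, n):
    """Assumed two-part MDL: -log L + 0.5 * (free parameters) * log(n)."""
    n_params = (k - 1) + k * d + k * d * (d + 1) / 2
    return -log_lik + 0.5 * n_params * np.log(n)

# Synthetic stand-in for expression profiles: 3 groups of 40 samples, 50 "genes".
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(40, 50)) for c in centers])

Y = pca_project(X)  # top-level 2-D view of the whole data set
for k in range(1, 5):
    resp, log_lik = fit_mixture_em(Y, k)
    print(k, round(float(mdl_score(log_lik, k, Y.shape[1], len(Y))), 1))
```

The candidate with the lowest score would be the one an MDL check favors. Each row of `resp` gives a sample's membership probabilities across the clusters, which is what makes the clustering soft. In a hierarchical setting, those memberships could be used to weight a fresh projection of each subcluster at the next level.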

