Relational patterns of gene expression via non-metric multidimensional scaling analysis

Bioinformatics (Oxford, England) - Volume 21, Issue 6 - Pages 730-740 - 2005
Y‐h. Taguchi1, Y. Oono2
1Department of Physics, Faculty of Science and Technology, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan.
2Department of Physics, 1110 W. Green Street, Urbana, IL 61801, USA

Abstract

Motivation: Microarray experiments result in large-scale data sets that require extensive mining and refining to extract useful information. We demonstrate the usefulness of the (non-metric) multidimensional scaling (MDS) method in analyzing a large number of genes. Applying MDS to microarray data is certainly not new, but the existing works all analyze small numbers (<100) of points. We have been developing an efficient novel algorithm for non-metric MDS (nMDS) analysis of very large data sets as a maximally unsupervised data mining device. We wish to demonstrate its usefulness in the context of bioinformatics (in this paper, unraveling relational patterns among genes from time series data).

Results: The Pearson correlation coefficient with its sign flipped is used to measure the dissimilarity of the gene activities in the transcriptional response of cell-cycle-synchronized human fibroblasts to serum. These dissimilarity data have been analyzed with our nMDS algorithm to produce an almost circular relational pattern of the genes. In this example the obtained pattern expresses a temporal order in the data: the temporal expression pattern of the genes rotates along this circular arrangement and is related to the cell cycle. For the data we analyze in this paper we observe the following. If an appropriate preprocessing procedure is applied to the original data set, linear methods such as principal component analysis (PCA) can achieve reasonable results, but without data preprocessing such linear methods cannot produce a useful picture. Furthermore, even with appropriate data preprocessing, the outcomes of linear procedures are not as clear-cut as those of nMDS without preprocessing.

Availability: The FORTRAN source code of the method used in this analysis (pure nMDS) is available at http://www.granular.com/MDS/

Contact: [email protected]

Supplementary information: http://www.granular.com/MDS/B1_2005.
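The dissimilarity measure named in the Results (the Pearson correlation coefficient with its sign flipped) can be illustrated with a small sketch. This is not the authors' FORTRAN code and it does not implement their nMDS algorithm. The function names (`pearson`, `dissimilarity_matrix`, `rank_pairs`) and the four phase-shifted cosine "genes" are hypothetical, chosen only to show how the dissimilarities and their rank order, which is the only input a non-metric method relies on, might be computed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def dissimilarity_matrix(profiles):
    """Sign-flipped Pearson correlation: -1 for identical shapes, +1 for opposite ones."""
    n = len(profiles)
    return [[-pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

def rank_pairs(d):
    """Pairs (i, j) with i < j, sorted from most similar to most dissimilar."""
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, key=lambda p: d[p[0]][p[1]])

# Hypothetical toy data (an assumption, not the serum-response data set):
# four periodic expression profiles that differ only by a quarter-period shift.
times = [2 * math.pi * t / 12 for t in range(12)]
phases = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
profiles = [[math.cos(t - p) for t in times] for p in phases]

d = dissimilarity_matrix(profiles)
order = rank_pairs(d)  # a non-metric embedding would use only this ordering
```

In this toy case, profiles half a period apart come out as the most dissimilar pairs, which is the kind of ordering a circular arrangement would reproduce.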
