Interactive Exploration of Microarray Gene Expression Patterns in a Reduced Dimensional Space

Genome Research - Volume 12, Issue 7 - Pages 1112-1120 - 2002
Jatin Misra (1, 2), William Schmitt (1, 2), Daehee Hwang (1, 2), Li-Li Hsiao (1, 2), S. R. Gullans (1, 2), George Stephanopoulos (1, 2), Gregory Stephanopoulos (1, 2)
(1) Renal Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
(2) Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Abstract

The very high dimensional space of gene expression measurements obtained by DNA microarrays impedes the detection of underlying patterns in gene expression data and the identification of discriminatory genes. In this paper we show how projection methods such as principal components analysis (PCA) can be used to obtain a direct link between patterns in the genes and patterns in the samples. This feature is useful in the initial interactive pattern exploration of gene expression data and in data-driven learning of the nature and types of samples. Using oligonucleotide microarray measurements of 40 samples from different normal human tissues, we show that distinct patterns are obtained when the genes are projected on a two-dimensional plane spanned by the loadings of the two major principal components. These patterns define the particular genes associated with a sample class (i.e., tissue). When used separately from the other genes, these class-specific (i.e., tissue-specific) genes in turn define distinct tissue patterns in the projection space spanned by the scores of the two major principal components. In this study, PCA projection facilitated discriminatory gene selection for different tissues and identified tissue-specific gene expression signatures for liver, skeletal muscle, and brain samples. Furthermore, it allowed the classification of nine new samples belonging to these three types using a linear combination of the expression levels of the tissue-specific genes determined from the first set of samples. The application of the technique to other published data sets is also discussed. [Online supplementary material available at www.genome.org.]
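
The central idea above, that a single PCA places genes (via loadings) and samples (via scores) in a shared two-dimensional plane, can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (three hypothetical "tissues", genes 5c to 5c+4 up-regulated in tissue c), class-specific genes are picked with a hypothetical rule (rank each gene by the projection of its loading vector onto the mean score of a class), and new samples are assigned to the nearest training class centroid after being projected with loadings learned from the training samples on the selected genes only. The function names `pca`, `class_specific_genes`, `classify`, and `simulate` are illustrative.

```python
import numpy as np


def pca(X, k=2):
    """PCA of a samples x genes matrix via SVD of the gene-centered data.

    Returns sample scores (n_samples x k), gene loadings (n_genes x k), and the
    per-gene training mean needed to project new samples later.
    """
    mean = X.mean(axis=0)                        # center each gene across samples
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k].T, mean


def class_specific_genes(scores, loadings, labels, cls, top=5):
    """Hypothetical selection rule (biplot reading): rank genes by how strongly
    their loading vector points toward the mean score of class `cls`."""
    direction = scores[labels == cls].mean(axis=0)
    return np.argsort(loadings @ direction)[::-1][:top]


def classify(X_train, labels, X_new, genes, k=2):
    """Project new samples with loadings learned on the selected genes of the
    training samples, then assign each to the nearest class centroid
    (nearest-centroid assignment is an assumption of this sketch)."""
    scores, loadings, mean = pca(X_train[:, genes], k)
    new_scores = (X_new[:, genes] - mean) @ loadings   # linear combination of genes
    classes = np.unique(labels)
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(new_scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def simulate(rng, labels, n_genes=30, shift=5.0):
    """Synthetic expression data (hypothetical): genes 5c..5c+4 are
    up-regulated by `shift` in tissue c; all entries get Gaussian noise."""
    X = rng.normal(0.0, 0.3, size=(len(labels), n_genes))
    for i, c in enumerate(labels):
        X[i, 5 * c:5 * c + 5] += shift
    return X


rng = np.random.default_rng(0)
train_labels = np.repeat([0, 1, 2], 4)      # 3 synthetic tissues, 4 samples each
new_labels = np.repeat([0, 1, 2], 3)        # 9 held-out synthetic samples
X_train = simulate(rng, train_labels)
X_new = simulate(rng, new_labels)

scores, loadings, _ = pca(X_train)
selected = {c: class_specific_genes(scores, loadings, train_labels, c) for c in range(3)}
genes = np.concatenate([selected[c] for c in range(3)])
predicted = classify(X_train, train_labels, X_new, genes)
print(sorted(selected[0].tolist()), predicted.tolist())
```

The selection rule relies on the factorization of the centered data, X_c = U S Vᵀ: a gene whose loading vector points toward a class's cluster of scores is one whose expression is above the overall mean in that class, which is the gene-sample link the abstract describes. Flipping the sign of a principal component flips both its scores and its loadings, so this rule is unaffected by the arbitrary sign returned by the SVD.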
