Detecting intergene correlation changes in microarray analysis: a new approach to gene selection
Abstract
Microarray technology is commonly used as a simple screening tool, with a focus on selecting genes that exhibit extremely large differential expression between phenotypes. This approach cannot select genes whose relationships with other genes change across biological conditions (differentially correlated genes). We extend it with a nonparametric procedure that selects differentially correlated genes. Using both simulations and resampling techniques, we found that our procedure correctly detected genes that were differentially correlated but not differentially expressed. We also applied our procedure to a set of biological data and found some potentially important genes that were not selected by the traditional method. Microarray technology yields multidimensional information on the function of the whole genome. Rather than treating intergene correlation as a nuisance to traditional gene selection procedures, which are essentially univariate, our method uses the rich information contained in the correlation as a new selection criterion. It can provide additional useful candidate genes for biologists.
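To make the idea of a gene that is differentially correlated but not differentially expressed concrete, the following is a minimal sketch and not the procedure proposed in the paper, whose details the abstract does not give. It assumes a simple illustrative score (the mean absolute change in a gene's Pearson correlation with every other gene) and a sample-label permutation test on expression values centered within each condition. The function names, the score, the centering step, and the toy data are all hypothetical.

```python
import numpy as np


def correlation_change_scores(x_a, x_b):
    """Illustrative score: mean absolute change in each gene's Pearson
    correlation with all other genes between two conditions.

    x_a, x_b: arrays of shape (genes, samples), one per condition.
    """
    diff = np.abs(np.corrcoef(x_a) - np.corrcoef(x_b))
    np.fill_diagonal(diff, 0.0)  # a gene's correlation with itself never changes
    return diff.sum(axis=1) / (diff.shape[0] - 1)


def permutation_p_values(x_a, x_b, n_perm=200, seed=0):
    """Permutation p-values for the scores above (an assumed test, not the paper's)."""
    rng = np.random.default_rng(seed)
    observed = correlation_change_scores(x_a, x_b)
    # Center each gene within its condition so that mean shifts (differential
    # expression) do not create spurious correlation in the pooled data.
    pooled = np.hstack([
        x_a - x_a.mean(axis=1, keepdims=True),
        x_b - x_b.mean(axis=1, keepdims=True),
    ])
    n_a = x_a.shape[1]
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[1])  # shuffle condition labels
        perm = correlation_change_scores(pooled[:, idx[:n_a]], pooled[:, idx[n_a:]])
        exceed += perm >= observed
    return observed, (exceed + 1) / (n_perm + 1)


def simulate(n_genes=6, n_samples=200, seed=1):
    """Toy data: every gene has mean 0 and variance 1 in both conditions,
    but gene 1 tracks gene 0 in condition A only."""
    rng = np.random.default_rng(seed)
    x_a = rng.standard_normal((n_genes, n_samples))
    x_b = rng.standard_normal((n_genes, n_samples))
    rho = 0.95
    x_a[1] = rho * x_a[0] + np.sqrt(1 - rho**2) * rng.standard_normal(n_samples)
    return x_a, x_b
```

In this toy setting a univariate test on means has nothing to find, since all genes share the same marginal distribution in both conditions, yet calling `permutation_p_values(*simulate())` should rank genes 0 and 1 highest because their mutual correlation changes between conditions.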