MRHCA: a nonparametric statistics based method for hub and co-expression module identification in large gene co-expression network

Quantitative Biology - Volume 6 - Pages 40-55 - 2018
Yu Zhang1, Sha Cao2, Jing Zhao3, Burair Alsaihati4, Qin Ma5, Chi Zhang6
1College of Computer Science and Technology, Jilin University, Changchun, China
2Department of Biostatistics, Indiana University School of Medicine, Indianapolis, USA
3Center for Health Outcomes and Population Research, Sanford Research, Sioux Falls, USA
4Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, USA
5Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, USA
6Center for Computational Biology and Bioinformatics and Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, USA

Abstract

Gene co-expression and differential co-expression analyses are increasingly used to study co-functional and co-regulatory biological mechanisms in large-scale transcriptomics data sets. In this study, we develop MRHCA, a nonparametric approach that identifies hub genes and modules in a large co-expression network at low computational and memory cost. We applied the method to simulated transcriptomics data sets and demonstrated that MRHCA can accurately identify hub genes and estimate the sizes of co-expression modules. By applying MRHCA and differential co-expression analysis to E. coli and TCGA cancer data, we identified significant condition-specific activated genes in E. coli and distinct gene expression regulatory mechanisms between cancer types with high copy number variation and those with small somatic mutations. Our analysis demonstrates that, compared with existing methods, MRHCA can (i) deal with large association networks, (ii) rigorously assess the statistical significance of hubs and module sizes, (iii) identify co-expression modules with low associations, (iv) detect small and significant modules, and (v) allow genes to be present in more than one module.
