Metrics to estimate differential co-expression networks
Abstract
Detecting differences in gene expression data is important for understanding the underlying molecular mechanisms. Although differentially expressed genes account for a large part of such analyses, differences in correlation are becoming an attractive approach for gaining deeper insights. However, diverse metrics have been used to detect differential correlation, which makes selecting and using a single metric difficult. In addition, available implementations are metric-specific, complicating their use in different contexts. Moreover, because the analyses in the literature have been performed on real data, the performance of the metrics and procedures remains uncertain. In this work, we compare four novel and two previously proposed metrics for detecting differential correlation. We generated well-controlled datasets into which differences in correlation were carefully introduced using controlled multivariate normal correlation networks and the addition of noise. The comparisons were performed on three datasets derived from real tumor data. Our results show that the metrics differ in detection performance and computational time. No single metric was the best in all datasets, but the trends show that three metrics are highly correlated and are very good candidates for real data analysis. In contrast, other metrics proposed in the literature appear to show low performance and yield different detections. Overall, our results suggest that metrics that do not filter correlations perform better. We also present an additional analysis of TCGA breast cancer subtypes. We describe a methodology to generate controlled datasets for the objective evaluation of differential correlation pipelines, and we compare the performance of several metrics. We implemented an R package, DifCoNet, that provides easy-to-use functions for differential correlation analyses.
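The evaluation strategy described above (simulate two conditions with a known correlation difference, add noise, then score gene pairs) can be illustrated with a small sketch. This is not the DifCoNet API and does not implement the paper's four novel metrics, which are not specified here. It assumes a simple equicorrelated block of genes as the "correlation network", additive Gaussian noise, and two widely known scores as stand-ins: the absolute difference of Pearson correlations and the Fisher z-test for comparing two independent correlations. All function names (`simulate_condition`, `pearson`, `fisher_z_diff`, `differential_correlation`) are hypothetical.

```python
import math
import random


def simulate_condition(n_samples, n_genes, rho, noise_sd, rng):
    """Hypothetical helper: draw samples from an equicorrelated multivariate
    normal (all gene pairs share correlation rho), then add Gaussian noise.

    Uses x_j = sqrt(rho)*f + sqrt(1-rho)*e_j, so corr(x_j, x_k) = rho before noise.
    Adding noise with sd noise_sd lowers the expected correlation to rho / (1 + noise_sd**2).
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("this construction requires 0 <= rho < 1")
    data = []
    for _ in range(n_samples):
        f = rng.gauss(0.0, 1.0)  # shared latent factor inducing the correlation
        row = [
            math.sqrt(rho) * f
            + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)  # gene-specific part
            + rng.gauss(0.0, noise_sd)                     # measurement noise
            for _ in range(n_genes)
        ]
        data.append(row)
    return data


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_diff(r1, n1, r2, n2):
    """Fisher z-test for the difference between two independent correlations.

    Returns (z, two-sided p-value). Correlations are clamped away from +/-1
    so that atanh stays finite.
    """
    eps = 1e-12
    r1 = max(min(r1, 1.0 - eps), -1.0 + eps)
    r2 = max(min(r2, 1.0 - eps), -1.0 + eps)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def differential_correlation(cond_a, cond_b):
    """Score every gene pair: (i, j, r_A, r_B, |r_A - r_B|, z, p)."""
    cols_a = list(zip(*cond_a))  # transpose samples x genes -> genes x samples
    cols_b = list(zip(*cond_b))
    results = []
    for i in range(len(cols_a)):
        for j in range(i + 1, len(cols_a)):
            ra = pearson(cols_a[i], cols_a[j])
            rb = pearson(cols_b[i], cols_b[j])
            z, p = fisher_z_diff(ra, len(cond_a), rb, len(cond_b))
            results.append((i, j, ra, rb, abs(ra - rb), z, p))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Condition A: co-expressed block (rho = 0.8); condition B: correlation removed.
    a = simulate_condition(200, 5, rho=0.8, noise_sd=0.3, rng=rng)
    b = simulate_condition(200, 5, rho=0.0, noise_sd=0.3, rng=rng)
    for i, j, ra, rb, d, z, p in differential_correlation(a, b):
        print(f"gene{i}-gene{j}: r_A={ra:.2f} r_B={rb:.2f} |dr|={d:.2f} z={z:.1f} p={p:.2g}")
```

In this construction the noise level is the knob that controls difficulty: raising `noise_sd` shrinks the expected correlation in condition A toward zero, so the introduced difference becomes harder to detect. Because every introduced difference is known in advance, the true and false detections of any scoring metric can be counted directly.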