TIGRESS: Trustful Inference of Gene REgulation using Stability Selection

BMC Systems Biology - Volume 6 - Pages 1-17 - 2012
Anne-Claire Haury (1,2,3), Fantine Mordelet (4), Paola Vera-Licona (1,2,3), Jean-Philippe Vert (1,2,3)
(1) Centre for Computational Biology, Mines ParisTech, Fontainebleau, France
(2) Institut Curie, Paris, France
(3) Inserm, Paris, France
(4) Department of Computer Science, Duke University, Durham, USA

Abstract

Inferring the structure of gene regulatory networks (GRNs) from a collection of gene expression data has many potential applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is, however, a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection, for that purpose. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (for Trustful Inference of Gene REgulation with Stability Selection), was ranked among the top GRN inference methods in the DREAM5 gene network inference challenge. In particular, TIGRESS was evaluated to be the best linear regression-based method in the challenge. We investigate in depth the influence of the various parameters of the method, and show that fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference, in both directed and undirected settings. TIGRESS reaches state-of-the-art performance on benchmark data, including both in silico and in vivo (E. coli and S. cerevisiae) networks. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available at http://cbio.ensmp.fr/tigress. Moreover, TIGRESS can be run online through the GenePattern platform (GP-DREAM, http://dream.broadinstitute.org).
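
To make the idea of GRN inference as sparse regression with stability selection more concrete, here is a minimal sketch for a single target gene. It is not the TIGRESS implementation. To stay dependency-free it uses greedy forward stepwise selection as a stand-in for LARS, and it scores each candidate regulator by its plain selection frequency over random half-samples, which is the classic stability selection score rather than the scoring technique introduced in the paper. The function names, the toy data and all parameter values are assumptions made for illustration.

```python
import random
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def forward_select(columns, y, n_steps):
    """Greedy forward stepwise selection (a stand-in for LARS, not LARS itself).

    Returns the indices of the first `n_steps` predictors to enter the model.
    """
    cols = [center(c) for c in columns]
    resid = center(y)
    selected = []
    for _ in range(min(n_steps, len(cols))):
        best, best_score = None, 1e-12
        for j, c in enumerate(cols):
            norm = sqrt(dot(c, c))
            if j in selected or norm < 1e-12:
                continue
            # Absolute correlation-like score between predictor and residual.
            score = abs(dot(c, resid)) / norm
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        selected.append(best)
        q = cols[best]
        qq = dot(q, q)
        # Remove the chosen direction from the residual and remaining columns.
        b = dot(q, resid) / qq
        resid = [r - b * a for r, a in zip(resid, q)]
        for j in range(len(cols)):
            if j not in selected:
                g = dot(q, cols[j]) / qq
                cols[j] = [x - g * a for x, a in zip(cols[j], q)]
    return selected


def stability_scores(columns, y, n_steps=2, n_runs=100, seed=0):
    """Fraction of random half-samples in which each predictor is selected."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(columns)
    for _ in range(n_runs):
        rows = rng.sample(range(n), n // 2)  # subsample without replacement
        sub_cols = [[c[i] for i in rows] for c in columns]
        sub_y = [y[i] for i in rows]
        for j in forward_select(sub_cols, sub_y, n_steps):
            counts[j] += 1
    return [c / n_runs for c in counts]


# Toy data (hypothetical): 6 candidate regulators, 60 samples; the target
# gene depends on regulators 0 and 2 only.
data_rng = random.Random(42)
n_samples, n_tfs = 60, 6
tfs = [[data_rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_tfs)]
target = [2.0 * tfs[0][i] - 1.5 * tfs[2][i] + data_rng.gauss(0, 0.1)
          for i in range(n_samples)]

scores = stability_scores(tfs, target, n_steps=2, n_runs=100)
ranked = sorted(range(n_tfs), key=lambda j: -scores[j])
print("regulators ranked by selection frequency:", ranked)
```

Repeating this regression for every target gene, with the candidate regulators as predictors, and pooling the scores would give a ranked list of regulator-to-target edges, which is the form of output a GRN inference benchmark evaluates.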
