The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo

Genome Biology - Volume 7 - Pages 1-16 - 2006
Richard Bonneau1,2, David J Reiss3, Paul Shannon3, Marc Facciotti3, Leroy Hood3, Nitin S Baliga3, Vesteinn Thorsson3
1New York University, Biology Department, Center for Comparative Functional Genomics, New York, USA
2Courant Institute, NYU Department of Computer Science, New York, USA
3Institute for Systems Biology, Seattle, USA

Abstract

We present a method (the Inferelator) for deriving genome-wide transcriptional regulatory interactions, and apply the method to predict a large portion of the regulatory network of the archaeon Halobacterium NRC-1. The Inferelator uses regression and variable selection to identify transcriptional influences on genes based on the integration of genome annotation and expression data. The learned network successfully predicted Halobacterium's global expression under novel perturbations with predictive power similar to that seen over training data. Several specific regulatory predictions were experimentally tested and verified.
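
The abstract names regression with variable selection as the core technique but does not give the details. As a rough illustration only, the sketch below assumes an L1-penalized (lasso) linear regression, fitted by cyclic coordinate descent, to pick a sparse set of candidate regulators for one target gene. The function names, the penalty value, and the synthetic data are all hypothetical; they are not taken from the paper and should not be read as the Inferelator's actual implementation.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the lasso shrinkage step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent.

    X: list of n rows (conditions), each holding p candidate-regulator levels.
    y: list of n expression levels for the target gene.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Covariance of column j with the residual that ignores j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def make_toy_data(n_conditions=60, n_regulators=8, seed=0):
    """Synthetic data (assumption): only regulators 1 and 4 influence the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_regulators)] for _ in range(n_conditions)]
    true_w = [0.0] * n_regulators
    true_w[1] = 1.5   # hypothetical activator
    true_w[4] = -2.0  # hypothetical repressor
    y = [sum(x * w for x, w in zip(row, true_w)) + rng.gauss(0, 0.1) for row in X]
    return X, y, true_w


X, y, true_w = make_toy_data()
weights = lasso_coordinate_descent(X, y, penalty=0.3)
# Regulators with a non-zero weight are the ones the penalty did not prune away.
selected = [j for j, w in enumerate(weights) if abs(w) > 1e-8]
print("selected regulators:", selected)
```

The L1 penalty sets weak coefficients exactly to zero, which is what yields a parsimonious set of influences per gene; a larger penalty keeps fewer regulators at the cost of shrinking the surviving weights.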
