A Hypothesis Test for Equality of Bayesian Network Models

Springer Science and Business Media LLC - Volume 2010 - Pages 1-11 - 2010
Anthony Almudevar¹
¹Department of Computational Biology, University of Rochester, Rochester, USA

Abstract

Bayesian network models are commonly used to model gene expression data. Some applications require a comparison of the network structure of a set of genes between phenotypes. In principle, separately fitted models can be compared directly, but it is difficult to assign statistical significance to any observed differences. A rigorous hypothesis test for homogeneity of network structure would therefore be valuable. In this paper, a generalized likelihood ratio test based on Bayesian network models is developed, with the significance level estimated using permutation replications. To make the test computationally feasible, two algorithms are introduced. First, a method for approximating multivariate distributions due to Chow and Liu (1968) is adapted, permitting the polynomial-time calculation of a maximum likelihood Bayesian network with maximum indegree of one. Second, sequential testing principles are applied to the permutation test, allowing a significant reduction in computation time while preserving the reported error rates used in multiple testing. The method is applied to gene-set analysis using two sets of experimental data, and some advantage of a pathway modelling approach to this problem is reported.
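
The two computational ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes discretized expression values, two phenotype groups coded 0 and 1, and plug-in (empirical) estimates throughout. The maximum-likelihood network with indegree at most one is obtained as a maximum-weight spanning tree on pairwise mutual information, following Chow and Liu (1968). The early-stopping rule is a simple one in the spirit of Besag and Clifford (1991), used here as a stand-in for the paper's sequential procedure. All function names and the parameters `max_perms` and `stop_after` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Plug-in entropy (nats) of a sequence of hashable discrete values."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Plug-in mutual information I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def chow_liu_loglik(rows):
    """Maximized log-likelihood over tree-structured networks (indegree <= 1).

    Uses loglik = n * (sum of edge MI - sum of marginal entropies), with the
    edges chosen as a maximum-weight spanning tree via Prim's algorithm.
    """
    n, d = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(d)]
    mi = [[mutual_information(cols[i], cols[j]) if i != j else 0.0
           for j in range(d)] for i in range(d)]
    in_tree = {0}
    best = list(mi[0])  # best[j]: heaviest edge joining j to the current tree
    total_mi = 0.0
    while len(in_tree) < d:
        j = max((k for k in range(d) if k not in in_tree), key=lambda k: best[k])
        total_mi += best[j]
        in_tree.add(j)
        best = [max(b, m) for b, m in zip(best, mi[j])]
    return n * (total_mi - sum(entropy(c) for c in cols))


def glrt_statistic(rows, labels):
    """2 * (separate-fit loglik - pooled-fit loglik) for binary 0/1 labels."""
    a = [r for r, g in zip(rows, labels) if g == 0]
    b = [r for r, g in zip(rows, labels) if g == 1]
    return 2.0 * (chow_liu_loglik(a) + chow_liu_loglik(b) - chow_liu_loglik(rows))


def sequential_permutation_pvalue(rows, labels, max_perms=999, stop_after=10, seed=0):
    """Permutation p-value that stops early once `stop_after` permuted
    statistics reach the observed one (assumed Besag-Clifford style rule)."""
    rng = random.Random(seed)
    observed = glrt_statistic(rows, labels)
    shuffled = list(labels)
    exceed = 0
    for used in range(1, max_perms + 1):
        rng.shuffle(shuffled)  # relabel samples; group sizes are preserved
        if glrt_statistic(rows, shuffled) >= observed:
            exceed += 1
            if exceed == stop_after:
                return exceed / used  # clearly non-significant: stop early
    return (exceed + 1) / (max_perms + 1)
```

With `rows` given as a list of per-sample lists of discrete values and `labels` as a parallel list of 0/1 phenotype codes, `sequential_permutation_pvalue(rows, labels)` returns an estimated p-value. Early stopping saves the most computation for gene sets that are far from significant, which is typically the majority when many sets are tested.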

References

Dougherty ER, Shmulevich I, Chen J, Wang ZJ: Genomic Signal Processing and Statistics, EURASIP Book Series on Signal Processing and Communications. Volume 2. Hindawi Publishing Corporation, New York, NY, USA; 2005.
Shmulevich I, Dougherty ER: Genomic Signal Processing. Princeton University Press, Princeton, NJ, USA; 2007.
Emmert-Streib F, Dehmer M: Detecting pathological pathways of a complex disease by a comparative analysis of networks. In Analysis of Microarray Data: A Network-Based Approach. Edited by: Emmert-Streib F, Dehmer M. Wiley-VCH, Weinheim, Germany; 2008:285-305.
Mootha VK, Lindgren CM, Eriksson K-F, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics 2003, 34(3):267-273. doi:10.1038/ng1180
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(43):15545-15550. doi:10.1073/pnas.0506580102
Subramanian A, Kuehn H, Gould J, Tamayo P, Mesirov JP: GSEA-P: a desktop application for gene set enrichment analysis. Bioinformatics 2007, 23(23):3251-3253. doi:10.1093/bioinformatics/btm369
Sebastiani P, Abad M, Ramoni MF: Bayesian networks for genomic analysis. In Genomic Signal Processing and Statistics, EURASIP Book Series on Signal Processing and Communications. Edited by: Dougherty ER, Shmulevich I, Chen J, Wang ZJ. Hindawi Publishing Corporation, New York, NY, USA; 2005.
Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology 2000, 7(3-4):601-620. doi:10.1089/106652700750050961
Needham CJ, Bradford JR, Bulpitt AJ, Westhead DR: A primer on learning in Bayesian networks for computational biology. PLoS Computational Biology 2007, 3(8):e129. doi:10.1371/journal.pcbi.0030129
Chu T, Glymour C, Scheines R, Spirtes P: A statistical problem for inference to regulatory structure from associations of gene expression measurements with microarrays. Bioinformatics 2003, 19(9):1147-1152. doi:10.1093/bioinformatics/btg011
Cowell RG, Dawid P, Lauritzen SL, Spiegelhalter DJ: Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks, Information Science and Statistics. Springer, New York, NY, USA; 1999.
Cowell RG: Efficient maximum likelihood pedigree reconstruction. Theoretical Population Biology 2009, 76(4):285-291. doi:10.1016/j.tpb.2009.09.002
Silander T, Myllymäki P: A simple approach to finding the globally optimal Bayesian network structure. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI '06), 2006. Edited by: Dechter R, Richardson T. AUAI Press; 445-452.
Chickering DM: Learning Bayesian networks is NP-complete. In Learning from Data: Artificial Intelligence and Statistics V. Edited by: Fisher D, Lenz H. Springer, New York, NY, USA; 1996:121-130.
Chow CK, Liu CN: Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 1968, 14: 462-467. doi:10.1109/TIT.1968.1054142
Abbeel P, Koller D, Ng AY: Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research 2006, 7: 1743-1788.
Murphy K: Software packages for graphical models Bayesian networks. Bulletin of the International Society for Bayesian Analysis 2007, 14: 13-15.
Teyssier M, Koller D: Ordering-based search: a simple and effective algorithm for learning Bayesian networks. Proceedings of the 21st Conference on Uncertainty in AI (UAI '05), 2005:584-590.
Papadimitriou CH, Steiglitz K: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, USA; 1982.
Welsh AH: Aspects of Statistical Inference. John Wiley & Sons, New York, NY, USA; 1996.
Efron B: Robbins, empirical Bayes and microarrays. Annals of Statistics 2003, 31(2):366-378. doi:10.1214/aos/1051027871
Besag J, Clifford P: Sequential Monte Carlo p-values. Biometrika 1991, 78: 301-304.
Lock RH: A sequential approximation to a permutation test. Communications in Statistics. Simulation and Computation 1991, 20(1):341-363. doi:10.1080/03610919108812956
Fay MP, Follmann DA: Designing Monte Carlo implementations of permutation or bootstrap hypothesis tests. American Statistician 2002, 56(1):63-70. doi:10.1198/000313002753631385
Dudoit S, van der Laan MJ: Multiple Testing Procedures with Applications to Genomics. Springer, New York, NY, USA; 2008.
Wald A: Sequential Analysis. John Wiley & Sons, New York, NY, USA; 1947.
Siegmund D: Sequential Analysis: Tests and Confidence Intervals. Springer, New York, NY, USA; 1985.
Almudevar A: Exact confidence regions for species assignment based on DNA markers. Canadian Journal of Statistics 2000, 28(1):81-95.
Zhou X, Kao M-CJ, Wong WH: Transitive functional annotation by shortest-path analysis of gene expression data. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(20):12783-12788. doi:10.1073/pnas.192159399
Braun R, Cope L, Parmigiani G: Identifying differential correlation in gene/pathway combinations. BMC Bioinformatics 2008, 9: article no. 488.
Barry WT, Nobel AB, Wright FA: Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 2005, 21(9):1943-1949. doi:10.1093/bioinformatics/bti260
Jiang Z, Gentleman R: Extensions to gene set enrichment. Bioinformatics 2007, 23(3):306-313. doi:10.1093/bioinformatics/btl599
Klebanov L, Glazko G, Salzman P, Yakovlev A, Xiao Y: A multivariate extension of the gene set enrichment analysis. Journal of Bioinformatics and Computational Biology 2007, 5(5):1139-1153. doi:10.1142/S0219720007003041
Goeman JJ, Bühlmann P: Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 2007, 23(8):980-987. doi:10.1093/bioinformatics/btm051
Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics 2006, 7(1):55-65. doi:10.1038/nrg1749
Bild A, Febbo PG: Application of a priori established gene sets to discover biologically important differential expression in microarray data. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(43):15278-15279. doi:10.1073/pnas.0507477102
Manoli T, Gretz N, Gröne H-J, Kenzelmann M, Eils R, Brors B: Group testing for pathway analysis improves comparability of different microarray datasets. Bioinformatics 2006, 22(20):2500-2506. doi:10.1093/bioinformatics/btl424
Liu Q, Dinu I, Adewale AJ, Potter JD, Yasui Y: Comparative evaluation of gene-set analysis methods. BMC Bioinformatics 2007, 8: article no. 431.
Ackermann M, Strimmer K: A general modular framework for gene set enrichment analysis. BMC Bioinformatics 2009, 10: article no. 47.
Efron B, Tibshirani R: On testing the significance of sets of genes. Annals of Applied Statistics 2007, 1: 107-129. doi:10.1214/07-AOAS101
Goeman JJ, van de Geer S, de Kort F, van Houwelingen HC: A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 2004, 20(1):93-99. doi:10.1093/bioinformatics/btg382
Mansmann U, Meister R: Testing differential gene expression in functional groups: Goeman's global test versus an ANCOVA approach. Methods of Information in Medicine 2005, 44(3):449-453.
Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America 2001, 98(9):5116-5121. doi:10.1073/pnas.091062498
Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, Jhangri GS, Einecke G, Famulski KS, Halloran P, Yasui Y: Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics 2007, 8: article no. 242.
Almudevar A: A simulated annealing algorithm for maximum likelihood pedigree reconstruction. Theoretical Population Biology 2003, 63(2):63-75. doi:10.1016/S0040-5809(02)00048-5