Learning directed acyclic graphs from large-scale genomics data

Springer Science and Business Media LLC - Volume 2017 - Pages 1-16 - 2017
Fabio Nikolay1, Marius Pesavento1, George Kritikos2, Nassos Typas2
1Communication Systems Group, TU Darmstadt, Darmstadt, Germany
2European Molecular Biology Laboratory, Heidelberg, Heidelberg, Germany

Abstract

In this paper, we consider the problem of learning the genetic interaction map, i.e., the topology of a directed acyclic graph (DAG) of genetic interactions, from noisy double-knockout (DK) data. Based on a set of well-established biological interaction models, we detect and classify the interactions between genes. We propose a novel integer linear optimization program, called the Genetic-Interactions-Detector (GENIE), to identify the complex biological dependencies among genes and to compute the DAG topology that best matches the DK measurements. Furthermore, we extend the GENIE program by incorporating genetic interaction profile (GI-profile) data to further enhance the detection performance. In addition, we propose a sequential scalability technique for large sets of genes under study, in order to provide statistically significant results for real measurement data. Finally, we show via numerical simulations that the GENIE program and the GI-profile data extended GENIE (GI-GENIE) program clearly outperform the conventional techniques, and we present real data results for our proposed sequential scalability technique.
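The abstract requires the learned interaction map to be a directed acyclic graph. As a minimal sketch of that feasibility condition only (not of the GENIE program itself, whose formulation is not given here), the following Python function checks whether a candidate set of directed gene-to-gene edges is acyclic using Kahn's topological-sort algorithm. The function name `is_dag`, the integer gene indices, and the edge-list representation are assumptions made for illustration.

```python
from collections import deque


def is_dag(num_genes, edges):
    """Return True if the directed graph on genes 0..num_genes-1 has no cycle.

    Illustrative helper (hypothetical name); `edges` is a list of
    (source, target) pairs of gene indices.
    """
    indegree = [0] * num_genes
    successors = [[] for _ in range(num_genes)]
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start from genes that have no incoming interaction edge.
    queue = deque(g for g in range(num_genes) if indegree[g] == 0)
    visited = 0
    while queue:
        gene = queue.popleft()
        visited += 1
        for nxt in successors[gene]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Every gene is removed from the queue exactly once only if no cycle exists.
    return visited == num_genes
```

For example, `is_dag(3, [(0, 1), (1, 2)])` accepts a simple chain, while adding the edge `(2, 0)` closes a cycle and the candidate topology is rejected.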
