Representing and querying disease networks using graph databases
Abstract
Systems biology experiments generate large volumes of data of multiple modalities, and integrating this information is challenging because it combines complexity with rich semantics. Here, we describe how graph databases provide a powerful framework for storing, querying and envisioning biological data. We show that graph databases are well suited to representing biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database to build and query a prototype network that provides biological context for asthma-related genes. Our study suggests that graph databases offer a flexible solution for integrating multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.
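To make the idea of a labelled, highly connected network concrete, the following is a minimal sketch in plain Python. It is an in-memory stand-in, not Neo4j and not the schema used in the study. The node labels (Disease, Gene, Protein, Pathway), the relationship types (ASSOCIATED_WITH, ENCODES, PARTICIPATES_IN) and every identifier such as gene_A or pathway_X are hypothetical placeholders chosen only for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph (illustrative stand-in for a graph database)."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., other properties}
        self.edges = defaultdict(list)  # node id -> [(relationship type, neighbour id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, rel_type, target):
        # Stored in both directions so that traversal ignores edge direction.
        self.edges[source].append((rel_type, target))
        self.edges[target].append((rel_type, source))

    def neighbours(self, node_id, rel_type=None, label=None):
        """Neighbours of node_id, optionally filtered by relationship type and node label."""
        found = []
        for rel, other in self.edges.get(node_id, []):
            if rel_type is not None and rel != rel_type:
                continue
            if label is not None and self.nodes[other]["label"] != label:
                continue
            found.append(other)
        return found


def pathway_context(graph, disease_id):
    """Follow Disease -> Gene -> Protein -> Pathway and collect pathways per gene."""
    context = {}
    for gene in sorted(graph.neighbours(disease_id, "ASSOCIATED_WITH", "Gene")):
        pathways = set()
        for protein in graph.neighbours(gene, "ENCODES", "Protein"):
            pathways.update(graph.neighbours(protein, "PARTICIPATES_IN", "Pathway"))
        context[gene] = sorted(pathways)
    return context


# Hypothetical toy network; none of these identifiers come from the study.
g = PropertyGraph()
g.add_node("asthma", "Disease")
for suffix in ("A", "B", "C"):
    g.add_node(f"gene_{suffix}", "Gene")
    g.add_node(f"protein_{suffix}", "Protein")
    g.add_edge(f"gene_{suffix}", "ENCODES", f"protein_{suffix}")
g.add_node("pathway_X", "Pathway")
g.add_node("pathway_Y", "Pathway")
g.add_edge("gene_A", "ASSOCIATED_WITH", "asthma")
g.add_edge("gene_B", "ASSOCIATED_WITH", "asthma")
g.add_edge("protein_A", "PARTICIPATES_IN", "pathway_X")
g.add_edge("protein_B", "PARTICIPATES_IN", "pathway_X")
g.add_edge("protein_B", "PARTICIPATES_IN", "pathway_Y")
g.add_edge("protein_C", "PARTICIPATES_IN", "pathway_Y")

context = pathway_context(g, "asthma")
print(context)
```

In a graph database, the same traversal would typically be written as a single declarative path pattern rather than nested loops. New entity or relationship types can be added without changing a fixed table layout, which is the flexibility the abstract refers to.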