ClassyFire: automated chemical classification with a comprehensive, computable taxonomy

Yannick Djoumbou-Feunang1, Roman Eisner2, Craig Knox3, Leonid Chepelev4, Janna Hastings5, Gareth Owen5, Eoin Fahy6, Christoph Steinbeck5, Shankar Subramanian6, Evan Bolton7, Russell Greiner8, David S. Wishart9
1Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada
2Jobber – Field Service Software, 10520 Jasper Ave, Edmonton, AB, T5J 1Z7, Canada
3Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
4Department of Medical Imaging, The Ottawa Hospital, University of Ottawa, Civic Campus, 1053 Carling Ave, Ottawa, ON, K1Y 4E9, Canada
5European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
6Department of Bioengineering, University of California, La Jolla, San Diego, CA 92093, USA
7Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
8Department of Computing Science 2-21 Athabasca Hall, Alberta Innovates Centre for Machine Learning (AICML), University of Alberta, Edmonton, AB, T6G 2E8, Canada
9The Metabolomics Innovation Center, University of Alberta, Edmonton, AB, T6G 2E9, Canada

Abstract

Keywords


References

Fridman Noy N, Hafner CD (1997) The state of the art in ontology design. AI Mag 18:53–74

Gruber TR (1995) Toward principles for the design of ontologies used for knowledge sharing? Int J Human Comput Stud 43(5–6):907–928

Hoehndorf R, Schofield PN, Gkoutos GV (2015) The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform 16(6):1069–1080

Cain AJ (1958) Logic and memory in Linnaeus’s system of taxonomy. Proc Linn Soc Lond 169:114–163

The BGS Rock Classification Scheme (2016) http://www.bgs.ac.uk/bgsrcs/ . Accessed 20 April 2013

Gell-Mann M, Ne’eman Y (1964) The eightfold way. W.A. Benjamin, New York

Malyuto V, Shvelidze T (1989) The technique of automatic quantitative stellar spectral classification using stepwise linear regression. Astrophys Space Sci 155(1):71–83

Singh HP, Gulati RK, Gupta R (1998) Stellar spectral classification using principal component analysis and artificial neural networks. Mon Not R Astron Soc 295(2):312–318

The Anatomical Therapeutic Chemical (ATC) (2011) Classification system: structure and principles. http://www.whocc.no/atc/structure_and_principles/ . Accessed 20 April 2013

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25–29

Favre HA, Powell WH (eds) (2013) Nomenclature of organic chemistry. IUPAC recommendations and preferred names 2013. http://www.acdlabs.com/iupac/nomenclature/ed. The Royal Society of Chemistry; 2013

Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36

Fahy E, Subramaniam S, Murphy RC, Nishijima M, Raetz CRH, Shimizu T et al (2009) Update of the LIPID MAPS comprehensive classification system for lipids. J Lipid Res 50:S9–S14

Fliri AF, Loging WT, Thadeio PF, Volkmann RA (2005) Biological spectra analysis: linking biological activity profiles to molecular structure. Proc Natl Acad Sci USA 102(2):261–266

Hastings J, De Matos P, Dekker A, Ennis M, Harsha B, Kale N et al (2013) The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res 41(D1):D456–D463

Rogers FB (1963) Medical subject headings. Bull Med Libr Assoc 51:114–116

Moreno P, Beisken S, Harsha B, Muthukrishnan V, Tudose I, Dekker A et al (2015) BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology. BMC Bioinform 16(1):56

Zhukova A, Sherman DJ (2014) Knowledge-based generalization of metabolic models. J Comput Biol 21(7):534–547

Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A et al (2016) PubChem substance and compound databases. Nucleic Acids Res 44(D1):D1202–D1213

Derwent World Patents Index—Reference Information (2016). http://ip-science.thomsonreuters.com/support/patents/dwpiref/

Bremser W (1978) HOSE - a novel substructure code. Anal Chim Acta 103(4):355–365

Feldman HJ, Dumontier M, Ling S, Haider N, Hogue CWV (2005) CO: a chemical ontology for identification of functional groups and semantic comparison of small molecules. FEBS Lett 579(21):4685–4691

Haider N (2016) The checkmol/matchmol Homepage. http://merian.pch.univie.ac.at/~nhaider/cheminf/cmmm.html

Bobach C, Böhme T, Laube U, Püschel A, Weber L (2012) Automated compound classification using a chemical ontology. J Cheminformatics 4(1):40

Vargyas M, Papp J, Csizmadia F, Csepregi S, Papp Á, Vadász P (2008) Maximum common substructure based hierarchical clustering. http://www.chemaxon.com/library/maximum-common-substructure-based-hierarchical-clustering-2/

Rahman SA, Bashton M, Holliday GL, Schrader R, Thornton JM (2009) Small molecule subgraph detector (SMSD) toolkit. J Cheminformatics 1(1):12

Ertl P, Schuffenhauer A, Renner S (2011) The scaffold tree: an efficient navigation in the scaffold universe. Methods Mol Biol 672:245–260

Chepelev LL, Hastings J, Ennis M, Steinbeck C, Dumontier M (2012) Self-organizing ontology of biochemically relevant small molecules. BMC Bioinform 13:3

Hastings J, Magka D, Batchelor C, Duan L, Stevens R, Ennis M et al (2012) Structure-based classification and ontology in chemistry. J Cheminformatics 4:8

Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y et al (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res 42(D1):D1091–D1097

LIPID MAPS Lipidomics Gateway (2011) A free resource sponsored by the National Institute of General Medical Sciences 2016. http://www.lipidmaps.org/

Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y et al (2013) HMDB 3.0-the human metabolome database in 2013. Nucleic Acids Res 41(D1):D801–D807

Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W et al (2007) The OBO foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25(11):1251–1255

Day-Richter J, Harris MA, Haendel M, Clark JI, Ireland A, Lomax J et al (2007) OBO-edit—an ontology editor for biologists. Bioinformatics 23(16):2198–2200

Goodacre SC, Street LJ, Hallett DJ, Crawforth JM, Kelly S, Owens AP et al (2006) Imidazo[1,2-a]pyrimidines as functionally selective and orally bioavailable GABAA α2/α3 binding site agonists for the treatment of anxiety disorders. J Med Chem 49(1):35–38

Markush Technology (2016) Toolkit for the analysis of virtual combinatorial library and Markush structures. https://www.chemaxon.com/products/markush-ip/

National Institute of General Medical Sciences (2016) https://www.nigms.nih.gov/Pages/default.aspx

National Institutes of Health (2016) https://www.nih.gov/

Lowe DM, Corbett PT, Murray-Rust P, Glen RC (2011) Chemical name to structure: OPSIN, an open source solution. J Chem Inf Model 51(3):739–753

Introducing JSON (2012) ECMA-404 the JSON data interchange standard. http://www.json.org

Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA et al (1992) Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. J Chem Inf Comput Sci 32(3):244–255

Shafranovich Y (2005) Common format and MIME type for comma-separated values (CSV) files. http://www.ietf.org/rfc/rfc4180.txt#page-1

Wishart DS (2014) FooDB: the food database. FooDB version 1.0. http://foodb.ca

Wishart D, Arndt D, Pon A, Sajed T, Guo AC, Djoumbou Y et al (2015) T3DB: the toxic exposome database. Nucleic Acids Res 43(D1):D928–D934

Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44(D1):D457–D462

Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM et al (2012) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res 40(D1):D742–D753

PubMed Health [Internet] (2011) Bethesda (MD): National Library of Medicine (US). 2011 Jan 1. http://www.ncbi.nlm.nih.gov/pubmedhealth/

An End-to-End Search and Analytics Platform (2015) Infinitely versatile. http://www.elasticsearch.org/overview/

Guo AC, Jewison T, Wilson M, Liu Y, Knox C, Djoumbou Y et al (2013) ECMDB: the E. coli metabolome database. Nucleic Acids Res 41(D1):D625–D630

Jewison T, Knox C, Neveu V, Djoumbou Y, Guo AC, Lee J et al (2012) YMDB: the yeast metabolome database. Nucleic Acids Res 40(D1):D815–D820