Efficient enumeration of monocyclic chemical graphs with given path frequencies

Masaki Suzuki1, Hiroshi Nagamochi1, Tatsuya Akutsu2
1Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University Yoshida, Kyoto, Japan
2Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan

Tóm tắt

The enumeration of chemical graphs (molecular graphs) satisfying given constraints is one of the fundamental problems in chemoinformatics and bioinformatics because it leads to a variety of useful applications including structure determination and development of novel chemical compounds. We consider the problem of enumerating chemical graphs with monocyclic structure (a graph structure that contains exactly one cycle) from a given set of feature vectors, where a feature vector represents the frequency of the prescribed paths in a chemical compound to be constructed and the set is specified by a pair of upper and lower feature vectors. To enumerate all tree-like (acyclic) chemical graphs from a given set of feature vectors, Shimizu et al. and Suzuki et al. proposed efficient branch-and-bound algorithms based on a fast tree enumeration algorithm. In this study, we devise a novel method for extending these algorithms to enumeration of chemical graphs with monocyclic structure by designing a fast algorithm for testing uniqueness. The results of computational experiments reveal that the computational efficiency of the new algorithm is as good as those for enumeration of tree-like chemical compounds. We succeed in expanding the class of chemical graphs that are able to be enumerated efficiently.

Từ khóa


Tài liệu tham khảo

Bytautas L, Klein DJ: Chemical combinatorics for alkane-isomer enumeration and more. J Chem Inf Comput Sci. 1998, 38: 1063-1078. 10.1021/ci980095c.

Bytautas L, Klein DJ: Formula periodic table for acyclic hydrocarbon isomer classescombinatorially averaged graph invariants. Phys Chem Chem Phys. 1999, 1: 5565-5572. 10.1039/a906137a.

Bytautas L, Klein DJ: Isomer combinatorics for acyclic conjugated polyenes: enumeration and beyond. Theoret Chem Acc. 1999, 101: 371-387. 10.1007/s002140050455.

Buchanan BG, Feigenbaum EA: DENDRAL and Meta-DENDRAL: their applications dimension. Artif Intell. 1978, 11: 5-24. 10.1016/0004-3702(78)90010-3.

Funatsu K, Sasaki S: Recent advances in the automated structure elucidation system, CHEMICS. Utilization of two-dimensional NMR spectral information and development of peripheral functions for examination of candidates. J Chem Inf Comput Sci. 1996, 36: 190-204. 10.1021/ci950152r.

Fink T, Reymond JL: Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J Chem Inf Comput Sci. 2007, 47: 342-353. 10.1021/ci600423u.

Mauser H, Stahl M: Chemical fragment spaces for de novo design. J Chem Inf Comput Sci. 2007, 47: 318-324. 10.1021/ci6003652.

Faulon JL, Churchwell CJ: The signature molecular descriptor. 2. Enumerating molecules from their extended valence sequences. J Chem Inf Comput Sci. 2003, 43: 721-734. 10.1021/ci020346o.

Hall LH, Dailey ES: Design of molecules from quantitative structure-activity relationship models. 3. Role of higher order path counts: Path 3. J Chem Inf Comput Sci. 1993, 33: 598-603. 10.1021/ci00014a012.

Deshpande M, Kuramochi M, Wale N, Karypis G: Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng. 2005, 17: 1036-1050.

Cayley A: On the analytic forms called trees with applications to the theory of chemical combinations. Rep Br Assoc Adv Sci. 1875, 45: 257-305.

Pólya G: Kombinatorische anzahlbestimmungen für gruppen, graphen, und chemische verbindungen. Acta Math. 1937, 68: 45-254.

Shimizu M, Nagamochi H, Akutsu T: Enumerating tree-like chemical graphs with given upper and lower bounds on path frequencies. BMC Bioinformatics. 2011, 12 (Suppl 14): 53-

Suzuki M, Nagamochi H, Akutsu T: A 2-phase algorithm for enumerating tree-like chemical graphs satisfying given upper and lower bounds. IPSJ SIG Technical Reports. BIO 2012, 2012, 28 (17): 1-8.

Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M: KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010, 38: D355-D360,

Harary F: Graph Theory. 1969, MA: Addison Wesley

Bakir GH, Weston J, Schölkopf B: Learning to find pre-images. Adv Neural Inform Process Syst. 2003, 16: 449-456.

Bakir GH, Zien A, Tsuda K: Learning to find graph pre-images. Lect Notes Comput Sci. 2004, 3175: 253-261. 10.1007/978-3-540-28649-3_31.

Kashima H, Tsuda K, Inokuchi A: Marginalized kernels between labeled graphs. Proceedings of the 20th International Conference Machine Learning. 2003, Palo Alto: AAAI Press, 321-328.

Ueda N, Akutsu T, Perret JL, Vert JP, Mahé P: Graph kernels for molecular structure-activity relationship analysis with support vector machines. J Chem Inf Model. 2005, 45: 939-951. 10.1021/ci050039t.

Byvatov E, Fechner U, Sadowski J, Schneider G: Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. J Chem Inf Comput Sci. 2003, 3: 1882-1889.

Akutsu T, Fukagawa D, Jansson J, Sadakane K: Inferring a graph from path frequency. Discrete Appl Math. 2012, 160: 1416-1428. 10.1016/j.dam.2012.02.002.

Nagamochi H: A detachment algorithm for inferring a graph from path frequency. Algorithmica. 2009, 53: 207-224. 10.1007/s00453-008-9184-0.

Kier L, Hall L, Frazer J: Design of molecules from quantitative structure-activity relationship models. 1. Information transfer between path and vertex degree counts. J Chem Inf Comput Sci. 1993, 33: 143-147. 10.1021/ci00011a021.

Miyao T, Arakawa M, Funatsu K: Exhaustive structure generation for inverse-QSPR/QSAR. Mol Inform. 2010, 29: 111-125. 10.1002/minf.200900038.

Wong WWL, Burkowski FJ: A constructive approach for discovering new drug leads: Using a kernel methodology for the inverse-QSAR problem. J Cheminf. 2009, 1: 4-10.1186/1758-2946-1-4.

Gugisch R, Kerber A, Laue R, Meringer M, Rucker C: History and progress of the generation of structural formulae in chemistry and its applications. MATCH Comm Math Comput Chem. 2007, 58: 239-280.

Fujiwara H, Wang J, Zhao L, Nagamochi H, Akutsu T: Enumerating tree-like chemical graphs with given path frequency. J Chem Inf Model. 2008, 48: 1345-1357. 10.1021/ci700385a.

Nakano S, Uno T: Generating colored trees. Lect Notes Comput Sci. 2005, 3787: 249-260. 10.1007/11604686_22.

Nakano S, Uno T: Efficient generation of rooted trees. NII Technical Report. 2003, NII-2003-005E,

Ishida Y, Zhao L, Nagamochi H, Akutsu T: Improved algorithms for enumerating tree-like chemical graphs with given path frequency. Genome Inform. 2008, 21: 53-64.

Luks EM: Isomorphism of graphs of bounded valence can be tested in polynomial time. J Comput Syst Sci. 1982, 25: 42-65. 10.1016/0022-0000(82)90009-5.

Fürer M, Schnyder W, Specker E: Normal forms for trivalent graphs and graphs of bounded valence. Proceedings of 15th Annual ACM Symp Theory of Computing. 1983, NY: ACM Press, 161-170.

Faulon J-L: Isomorphism, automorphism partitioning, and canonical labeling can be solved in polynomial-time for molecular graphs. J Chem Inf Comput Sci. 1998, 38: 432-444. 10.1021/ci9702914.

Jordan C: Sur les assemblages de lignes. J Reine Angew Math. 1869, 70: 185-190.

Li J: Enumerating benzene isomers of tree-like chemical graphs. Master’s Thesis. Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University 2014,