Surge: a fast open-source chemical graph generator

Brendan D. McKay (1), Mehmet Aziz Yirik (2), Christoph Steinbeck (2)
(1) School of Computing, Australian National University, Canberra, ACT, 2601, Australia
(2) Institute of Inorganic and Analytical Chemistry, Friedrich-Schiller-University, Lessingstr. 8, 07743, Jena, Germany

Abstract

Chemical structure generators are used in cheminformatics to produce or enumerate virtual molecules based on a set of boundary conditions. The results can then be tested for properties of interest, such as consistency with measured data or suitability as drugs. The starting point can be a potentially fuzzy set of fragments or a molecular formula. In the latter case, the generator produces the set of constitutional isomers of the given input formula. Here we present Surge, a novel constitutional isomer generator based on the canonical generation path method. Surge uses the nauty package to compute automorphism groups of graphs. We outline the working principles of Surge and present benchmarking results which show that Surge is currently the fastest structure generator. Surge is available under a liberal open-source license.
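To make concrete what enumerating the constitutional isomers of a molecular formula means, the following is a minimal brute-force sketch in Python. It is not the canonical generation path method and does not use nauty: it simply tries every bond-order assignment between heavy atoms and removes duplicates by taking the smallest relabelling over all element-preserving permutations, so it is only practical for very small formulas. The function names, the `VALENCE` table, and the dictionary input format are assumptions made for illustration.

```python
from itertools import combinations, permutations, product

# Assumed standard valences; hydrogens are treated as implicit.
VALENCE = {"C": 4, "N": 3, "O": 2}


def is_connected(n, bond):
    """Depth-first search over heavy atoms joined by a bond of order > 0."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and bond.get((min(i, j), max(i, j)), 0) > 0:
                seen.add(j)
                stack.append(j)
    return len(seen) == n


def canonical_form(atoms, bond, pairs):
    """Smallest bond-order tuple over all element-preserving relabellings."""
    n = len(atoms)
    best = None
    for p in permutations(range(n)):
        if any(atoms[p[i]] != atoms[i] for i in range(n)):
            continue  # a relabelling must map each atom to the same element
        key = tuple(bond[(min(p[i], p[j]), max(p[i], p[j]))] for i, j in pairs)
        if best is None or key < best:
            best = key
    return best


def count_isomers(formula):
    """Count constitutional isomers of e.g. {"C": 4, "H": 10} by brute force."""
    n_hydrogens = formula.get("H", 0)
    atoms = [el for el, k in sorted(formula.items()) if el != "H" for _ in range(k)]
    n = len(atoms)
    pairs = list(combinations(range(n), 2))
    found = set()
    for orders in product(range(4), repeat=len(pairs)):  # bond orders 0..3
        bond = dict(zip(pairs, orders))
        degree = [0] * n
        for (i, j), order in bond.items():
            degree[i] += order
            degree[j] += order
        if any(degree[i] > VALENCE[atoms[i]] for i in range(n)):
            continue  # valence exceeded
        free = sum(VALENCE[atoms[i]] - degree[i] for i in range(n))
        if free != n_hydrogens:
            continue  # implicit hydrogens must match the formula
        if not is_connected(n, bond):
            continue  # keep single-molecule structures only
        found.add(canonical_form(atoms, bond, pairs))
    return len(found)


print(count_isomers({"C": 4, "H": 10}))  # butane and isobutane -> 2
```

The search space here grows as 4 to the power of the number of atom pairs, and the duplicate check visits every permutation of the heavy atoms. That cost is what dedicated generators are designed to avoid, by never constructing isomorphic duplicates in the first place and by computing automorphism groups instead of scanning permutations.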
