Comparing structural fingerprints using a literature-based similarity benchmark
Abstract
The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common. Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark. Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.
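The effect of bit-vector length mentioned above can be illustrated with a minimal sketch. It is not the benchmark code used in the paper: the feature identifiers are random integers standing in for hashed substructure features, folding is assumed to be a simple modulo operation, and the helper names `fold`, `tanimoto` and `synthetic_features` are hypothetical. The sketch only shows that a short bit vector merges distinct features through collisions, which shifts the Tanimoto similarity away from its unfolded value.

```python
import random


def fold(feature_ids, n_bits):
    """Fold integer feature identifiers into a set of 'on' bit positions.

    Assumption: folding is a plain modulo; real toolkits may differ.
    """
    return {fid % n_bits for fid in feature_ids}


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' bits."""
    if not bits_a and not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def synthetic_features(rng, n_features):
    """Random 32-bit integers standing in for hashed substructure features."""
    return {rng.getrandbits(32) for _ in range(n_features)}


# Two hypothetical molecules that share some features and differ in others.
rng = random.Random(0)
shared = synthetic_features(rng, 60)
mol_a = shared | synthetic_features(rng, 140)
mol_b = shared | synthetic_features(rng, 140)

# Similarity computed on the raw feature sets, before any folding.
unfolded = tanimoto(mol_a, mol_b)
print(f"unfolded features: Tanimoto = {unfolded:.3f}")

for n_bits in (1024, 16384):
    bits_a, bits_b = fold(mol_a, n_bits), fold(mol_b, n_bits)
    # Features of molecule A that collided with another feature of A.
    merged = len(mol_a) - len(bits_a)
    print(
        f"{n_bits:>6} bits: Tanimoto = {tanimoto(bits_a, bits_b):.3f}, "
        f"features merged by collisions in A = {merged}"
    )
```

Because 1024 divides 16,384, every collision at the longer length is also a collision at the shorter one, so the longer vector can never merge more features than the shorter one in this sketch.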
References
Johnson MA, Maggiora GM (1990) Concepts and applications of molecular similarity. Wiley, New York
Maggiora GM (2006) On outliers and activity cliffs—why QSAR often disappoints. J Chem Inf Model 46:1535. doi:10.1021/ci060117s
Willett P (2014) The calculation of molecular structural similarity: principles and practice. Mol Inform 33:403–413. doi:10.1002/minf.201400024
Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert Opin Drug Discov 11:137–148. doi:10.1517/17460441.2016.1117070
Stumpfe D, Bajorath J (2011) Similarity searching. Wiley Interdiscip Rev Comput Mol Sci 1:260–282. doi:10.1002/wcms.23
Cereto-Massagué A, Ojeda MJ, Valls C et al (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63. doi:10.1016/j.ymeth.2014.08.005
McGaughey GB, Sheridan RP, Bayly CI et al (2007) Comparison of topological, shape, and docking methods in virtual screening. J Chem Inf Model 47:1504–1519. doi:10.1021/ci700052x
Venkatraman V, Pérez-Nueno VI, Mavridis L, Ritchie DW (2010) Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods. J Chem Inf Model 50:2079–2093. doi:10.1021/ci100263p
Riniker S, Landrum GA (2013) Open-source platform to benchmark fingerprints for ligand-based virtual screening. J Cheminf 5:26. doi:10.1186/1758-2946-5-26
Tiikkainen P, Markt P, Wolber G et al (2009) Critical comparison of virtual screening methods against the MUV data set. J Chem Inf Model 49:2168–2178. doi:10.1021/ci900249b
Heikamp K, Bajorath J (2011) Large-scale similarity search profiling of ChEMBL compound data sets. J Chem Inf Model 51:1831–1839. doi:10.1021/ci200199u
Patterson DE, Cramer RD, Ferguson AM et al (1996) Neighborhood behavior: a useful concept for validation of “molecular diversity” descriptors. J Med Chem 39:3049–3059. doi:10.1021/jm960290n
Horvath D, Jeandenans C (2003) Neighborhood behavior of in silico structural spaces with respect to in vitro activity spaces—a novel understanding of the molecular similarity principle in the context of multiple receptor binding profiles. J Chem Inf Comput Sci 43:680–690. doi:10.1021/ci025634z
Papadatos G, Cooper AWJ, Kadirkamanathan V et al (2009) Analysis of neighborhood behavior in lead optimization and array design. J Chem Inf Model 49:195–208. doi:10.1021/ci800302g
Steffen A, Kogej T, Tyrchan C, Engkvist O (2009) Comparison of molecular fingerprint methods on the basis of biological profile data. J Chem Inf Model 49:338–347. doi:10.1021/ci800326z
Hert J, Willett P, Wilton DJ et al (2004) Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. Org Biomol Chem 2:3256–3266. doi:10.1039/B409865J
Bender A, Jenkins JL, Scheiber J et al (2009) How similar are similarity searching methods? a principal component analysis of molecular descriptor space. J Chem Inf Model 49:108–119. doi:10.1021/ci800249s
Sastry M, Lowrie JF, Dixon SL, Sherman W (2010) Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments. J Chem Inf Model 50:771–784. doi:10.1021/ci100062n
Briem H, Lessel UF (2000) In vitro and in silico affinity fingerprints: finding similarities beyond structural classes. Perspect Drug Discov Des 20:231–244. doi:10.1023/A:1008793325522
Duan J, Dixon SL, Lowrie JF, Sherman W (2010) Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods. J Mol Graph Model 29:157–170. doi:10.1016/j.jmgm.2010.05.008
Franco P, Porta N, Holliday JD, Willett P (2014) The use of 2D fingerprint methods to support the assessment of structural similarity in orphan drug legislation. J Cheminf 6:5. doi:10.1186/1758-2946-6-5
Maggiora G, Vogt M, Stumpfe D, Bajorath J (2014) Molecular similarity in medicinal chemistry. J Med Chem 57:3186–3204. doi:10.1021/jm401411z
Bento AP, Gaulton A, Hersey A et al (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res 42:D1083–D1090. doi:10.1093/nar/gkt1031
Riniker S, Landrum G (2016) Code repository for benchmarking platform. https://github.com/rdkit/benchmarking_platform. Accessed 15 Jan 2016
RDKit (2016) Cheminformatics and machine learning software. http://rdkit.org/. Accessed 15 Jan 2016
Nilakantan R, Bauman N, Dixon JS, Venkataraghavan R (1987) Topological torsion: a new molecular descriptor for SAR applications. Comparison with other descriptors. J Chem Inf Model 27:82–85. doi:10.1021/ci00054a008
Carhart RE, Smith DH, Venkataraghavan R (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci 25:64–73. doi:10.1021/ci00046a002
Gedeck P, Rohde B, Bartels C (2006) QSAR—how good is it in practice? comparison of descriptor sets on an unbiased cross section of corporate data sets. J Chem Inf Model 46:1924–1936. doi:10.1021/ci050413p
Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742–754. doi:10.1021/ci100050t
Ertl P, Patiny L, Sander T et al (2015) Wikipedia chemical structure explorer: substructure and similarity searching of molecules from Wikipedia. J Cheminf 7:10. doi:10.1186/s13321-015-0061-y
Wikipedia structure search (2016) http://www.cheminfo.org/wikipedia/. Accessed 15 Jan 2016
Jones E, Oliphant T, Peterson P et al (2001) SciPy: open source scientific tools for Python
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49:6789–6801. doi:10.1021/jm0608356
Rohrer SG, Baumann K (2009) Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data. J Chem Inf Model 49:169–184. doi:10.1021/ci8002649
Truchon J-F, Bayly CI (2007) Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. J Chem Inf Model 47:488–508. doi:10.1021/ci600426e
Sheridan RP (2008) Alternative global goodness metrics and sensitivity analysis: heuristics to check the robustness of conclusions from studies comparing virtual screening methods. J Chem Inf Model 48:426–433. doi:10.1021/ci700380x
Peprah K, Zhu XY, Eyunni SVK et al (2012) Multi-receptor drug design: Haloperidol as a scaffold for the design and synthesis of atypical antipsychotic agents. Bioorg Med Chem 20:1291–1297. doi:10.1016/j.bmc.2011.12.019
Maurya SK, Gollapalli DR, Kirubakaran S et al (2009) Triazole inhibitors of Cryptosporidium parvum inosine 5′-monophosphate dehydrogenase. J Med Chem 52:4623–4630. doi:10.1021/jm900410u
Sard H, Kumaran G, Morency C et al (2005) SAR of psilocybin analogs: discovery of a selective 5-HT2C agonist. Bioorg Med Chem Lett 15:4555–4559. doi:10.1016/j.bmcl.2005.06.104
Meng H, Liu Y, Zhai Y, Lai L (2013) Optimization of 5-hydroxytryptamines as dual function inhibitors targeting phospholipase A2 and leukotriene A4 hydrolase. Eur J Med Chem 59:160–167. doi:10.1016/j.ejmech.2012.10.057
DeFalco J, Steiger D, Dourado M et al (2010) 5-Benzyloxytryptamine as an antagonist of TRPM8. Bioorg Med Chem Lett 20:7076–7079. doi:10.1016/j.bmcl.2010.09.099
Matzen L, van Amsterdam C, Rautenberg W et al (2000) 5-HT reuptake inhibitors with 5-HT1B/1D antagonistic activity: a new approach toward efficient antidepressants. J Med Chem 43:1149–1157. doi:10.1021/jm9811054
Conway RJ, Valant C, Christopoulos A et al (2012) Synthesis and SAR study of 4-arylpiperidines and 4-aryl-1,2,3,6-tetrahydropyridines as 5-HT2C agonists. Bioorg Med Chem Lett 22:2560–2564. doi:10.1016/j.bmcl.2012.01.122
Palmer AM, Münch G, Brehm C et al (2008) 5-Substituted 1H-pyrrolo[3,2-b]pyridines as inhibitors of gastric acid secretion. Bioorg Med Chem 16:1511–1530. doi:10.1016/j.bmc.2007.10.017
Palmer AM, Grobbel B, Brehm C et al (2007) Preparation of tetrahydroimidazo[2,1-a]isoquinolines and their use as inhibitors of gastric acid secretion. Bioorg Med Chem 15:7647–7660. doi:10.1016/j.bmc.2007.08.065
Kaminski JJ, Wallmark B, Briving C, Andersson BM (1991) Antiulcer agents. 5. Inhibition of gastric H+/K+-ATPase by substituted imidazo[1,2-a]pyridines and related analogs and its implication in modeling the high affinity potassium ion binding site of the gastric proton pump enzyme. J Med Chem 34:533–541. doi:10.1021/jm00106a008
Panchal T, Bailey N, Bamford M et al (2009) Evaluation of basic, heterocyclic ring systems as templates for use as potassium competitive acid blockers (pCABs). Bioorg Med Chem Lett 19:6813–6817. doi:10.1016/j.bmcl.2009.07.002
DeMarinis RM, Shah DH, Hall RF et al (1982) α-adrenergic agents. 2. Synthesis and α1-agonist activity of 2-aminotetralins. J Med Chem 25:136–141. doi:10.1021/jm00344a009
Grunewald GL, Bartlett WJ, Reitz TJ et al (1986) Conformationally defined adrenergic agents. 9. Binding requirements of phenolic phenylethylamines in the benzonorbornene skeleton at the active site of phenylethanolamine N-methyltransferase. J Med Chem 29:1972–1982. doi:10.1021/jm00160a029
Ye Q, Grunewald GL (1989) Conformationally defined adrenergic agents. 15. Conformationally restricted and conformationally defined tyramine analogs as inhibitors of phenylethanolamine N-methyltransferase. J Med Chem 32:478–486. doi:10.1021/jm00122a032
Burn P, Crooks PA, Heatley F et al (1982) Synthesis and dopaminergic properties of some exo- and endo-2-aminobenzonorbornenes designed as rigid analogs of dopamine. J Med Chem 25:363–368. doi:10.1021/jm00346a007
Nasr RJ, Swamidass SJ, Baldi PF (2009) Large scale study of multiple-molecule queries. J Cheminf 1:7. doi:10.1186/1758-2946-1-7
Whittle M, Gillet VJ, Willett P, Loesel J (2006) Analysis of data fusion methods in virtual screening: similarity and group fusion. J Chem Inf Model 46:2206–2219. doi:10.1021/ci0496144