Rethinking the applicability domain analysis in QSAR models

Journal of Computer-Aided Molecular Design - Volume 38 - Pages 1-9 - 2024
Jose R. Mora1, Edgar A. Marquez2,3, Noel Pérez-Pérez4, Ernesto Contreras-Torres5, Yunierkis Perez-Castillo6, Guillermin Agüero-Chapin7,8, Felix Martinez-Rios9, Yovani Marrero-Ponce5,9,10, Stephen J. Barigye11
1Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC- USFQ), Diego de Robles y Vía Interoceánica, Quito, Ecuador
2Grupo de Investigaciones en Química Y Biología, Departamento de Química Y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Barranquilla, Colombia
3Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Cátedras Conacyt, Ensenada, México
4Colegio de Ciencias e Ingenierías “El Politécnico”, Universidad San Francisco de Quito (USFQ), Quito, Ecuador
5Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Quito, Ecuador
6Bio-Chemoinformatics Research Group, Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, Ecuador
7CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
8Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
9Facultad de Ingeniería, Universidad Panamericana, CDMX, Ciudad de México, México
10Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Ecuador
11Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), Madrid, Spain

Abstract

Despite the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these principles, applicability domain (AD) estimation has been recognized in several reports as one of the most challenging, which implies that the reliability measures attached to model predictions are themselves often unreliable. Applying tree-based error analysis workflows to five QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. We therefore call for more stringent AD analysis guidelines that require the incorporation of model error analysis schemes, to provide critical insight into the reliability of the underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimates of the reliability of model predictions. Finally, error analysis may also be useful in “rational” model refinement, in that data expansion efforts and model retraining can be focused on cohorts with the highest error rates.
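The idea of a tree-based error analysis can be illustrated with a minimal sketch. It is not the authors' workflow, whose details the abstract does not give. It assumes a small binary tree grown on the model's error label using Gini impurity, and then checks where the predictions tagged as reliable but wrong (AD prediction errors) end up. The `Instance` class, the descriptor values, and the toy data are all hypothetical and serve only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    descriptors: tuple  # hypothetical molecular descriptor values
    in_domain: bool     # the AD method tagged this prediction as reliable
    is_error: bool      # the model prediction disagreed with experiment


def gini(group):
    """Gini impurity of the error label within a group."""
    if not group:
        return 0.0
    p = sum(i.is_error for i in group) / len(group)
    return 2.0 * p * (1.0 - p)


def best_split(group):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(group)
    for f in range(len(group[0].descriptors)):
        values = sorted({i.descriptors[f] for i in group})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [i for i in group if i.descriptors[f] <= t]
            right = [i for i in group if i.descriptors[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(group)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def cohorts(group, depth=2):
    """Recursively split the data into leaf cohorts of a small error tree."""
    split = best_split(group) if depth > 0 and len(group) > 1 else None
    if split is None:
        return [group]
    f, t = split
    left = [i for i in group if i.descriptors[f] <= t]
    right = [i for i in group if i.descriptors[f] > t]
    return cohorts(left, depth - 1) + cohorts(right, depth - 1)


def error_rate(group):
    return sum(i.is_error for i in group) / len(group)


def ad_errors(group):
    """Count predictions tagged reliable by the AD method that were wrong."""
    return sum(i.in_domain and i.is_error for i in group)


# Toy data, invented for illustration only (not taken from the paper).
data = [
    Instance((0.10, 1.0), True, False),
    Instance((0.20, 2.0), True, False),
    Instance((0.30, 1.5), True, False),
    Instance((0.40, 2.5), True, False),
    Instance((0.80, 1.0), True, True),
    Instance((0.90, 2.0), True, True),
    Instance((0.70, 1.5), False, True),
    Instance((0.85, 2.5), True, False),
]

# Rank cohorts by error rate and inspect the worst one.
leaves = sorted(cohorts(data, depth=1), key=error_rate, reverse=True)
worst = leaves[0]
print(f"worst cohort error rate: {error_rate(worst):.2f}")
print(f"AD errors in worst cohort: {ad_errors(worst)} of {ad_errors(data)}")
```

In this toy set, every instance tagged as in-domain but mispredicted falls into the cohort with the highest error rate, which is the kind of pattern the abstract describes. A real analysis would use the model's own descriptors, a proper tree implementation, and held-out data.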
