The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models

Wiley - Volume 22, Issue 1 - Pages 69-77 - 2003
Alexander Tropsha¹, Paola Gramatica², Vijay K. Gombar³
¹ Laboratory for Molecular Modeling, School of Pharmacy, CB# 7360 Beard Hall, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, U.S.A.
² QSAR and Environmental Chemistry Research Unit, Department of Structural and Functional Biology, University of Insubria, Via J. H. Dunant 3 – 21100 Varese, Italy
³ GlaxoSmithKline, Metabolic and Viral Diseases' Center of Excellence for Drug Discovery (MV CEDD), Department of Drug Metabolism and Pharmacokinetics (DMPK), 3030 Cornwallis Road, Research Triangle Park, NC 27709, U.S.A.

Abstract

This paper emphasizes the importance of rigorous validation as a crucial, integral component of Quantitative Structure-Property Relationship (QSPR) model development. We consider some examples of published QSPR models which, in spite of their high fitted accuracy for the training sets and apparent mechanistic appeal, fail rigorous validation tests and thus may lack practical utility as reliable screening tools. We present a set of simple guidelines for developing validated and predictive QSPR models. To this end, we discuss several validation strategies, including (1) randomization of the modelled property, also called Y-scrambling, (2) multiple leave-many-out cross-validations, and (3) external validation using rational division of a dataset into training and test sets. We also highlight the need to establish the domain of model applicability in chemical space to flag molecules for which predictions may be unreliable, and we discuss some algorithms that can be used for this purpose. We advocate the broad use of these guidelines in the development of predictive QSPR models.
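The validation strategies listed above can be illustrated on a toy problem. The sketch below is not the authors' procedure. It is a minimal Python illustration that assumes a single-descriptor linear model fitted by ordinary least squares on synthetic data. Several further choices are also assumptions:

- a pooled-PRESS definition of the leave-many-out q²;
- an activity-sorted split as the "rational" training/test division;
- an external R² computed about the training-set mean;
- leverage with the commonly used warning threshold h* = 3(p + 1)/n as the applicability-domain check.

Every function name is hypothetical.

```python
"""Toy QSPR validation checks for a one-descriptor linear model y = a + b*x.
Hypothetical helper names and synthetic data; not the paper's own procedure."""
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys):
    """Fitted (training-set) coefficient of determination."""
    a, b = fit_line(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def _press_ss(xs, ys, train, test):
    """Fit on `train`; return (PRESS, SS) over `test`, SS about the training mean."""
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    my_train = mean(ys[i] for i in train)
    press = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    ss = sum((ys[i] - my_train) ** 2 for i in test)
    return press, ss


def lmo_q2(xs, ys, leave_out=0.2, repeats=50, seed=0):
    """Multiple leave-many-out CV: pool PRESS and SS over random splits."""
    rng = random.Random(seed)
    n = len(xs)
    k = max(1, round(leave_out * n))
    press_total = ss_total = 0.0
    for _ in range(repeats):
        test = rng.sample(range(n), k)
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        press, ss = _press_ss(xs, ys, train, test)
        press_total += press
        ss_total += ss
    return 1.0 - press_total / ss_total


def y_scrambling(xs, ys, trials=100, seed=0):
    """Refit after randomly permuting y; returns the scrambled R2 values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        scores.append(r_squared(xs, shuffled))
    return scores


def activity_sorted_split(ys, every=4):
    """Sort by y and move every k-th interior compound to the test set,
    so the lowest and highest responses always stay in training."""
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    test = order[1:-1][every - 1::every]
    held_out = set(test)
    train = [i for i in order if i not in held_out]
    return train, test


def external_r2(xs, ys, train, test):
    """Predictive R2 on the external test set (SS about the training mean)."""
    press, ss = _press_ss(xs, ys, train, test)
    return 1.0 - press / ss


def leverages(xs_train, xs_query):
    """Leverage h = 1/n + (x - mean)^2 / Sxx of each query against the training
    set, plus the warning threshold h* = 3(p + 1)/n with p = 1 descriptor."""
    n = len(xs_train)
    mx = mean(xs_train)
    sxx = sum((x - mx) ** 2 for x in xs_train)
    h = [1.0 / n + (x - mx) ** 2 / sxx for x in xs_query]
    p = 1  # one descriptor in this toy model
    return h, 3.0 * (p + 1) / n


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(40)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 0.5) for x in xs]
    print(f"fitted R2        : {r_squared(xs, ys):.3f}")
    print(f"LMO q2 (20% out) : {lmo_q2(xs, ys):.3f}")
    print(f"max scrambled R2 : {max(y_scrambling(xs, ys)):.3f}")
    train, test = activity_sorted_split(ys)
    print(f"external R2      : {external_r2(xs, ys, train, test):.3f}")
    h, h_star = leverages([xs[i] for i in train], [5.0, 25.0])
    print("leverages", [round(v, 3) for v in h], "threshold", round(h_star, 3))
```

In this sketch, a robust model shows three things. Its fitted R² sits well above every scrambled R², and its q² and external R² stay close to the fitted value. The query at x = 25 lies outside the training range, so its leverage exceeds h*.

Two simplifications apply. In a real study, any descriptor selection would typically be repeated for every scrambled response vector. A multi-descriptor model would also compute each query's leverage as x₀ᵀ(XᵀX)⁻¹x₀ rather than with the one-descriptor formula used here.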
