Correlation weighted successive projections algorithm as a novel method for variable selection in QSAR studies: investigation of anti‐HIV activity of HEPT derivatives

Journal of Chemometrics - Volume 21, Issue 5-6 - Pages 239-250 - 2007
Mohsen Kompany‐Zareh¹,², Yousef Akhlaghi¹,²
¹ Department of Chemistry, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 45195, P. O. Box 45195-1159, Iran
² Medicinal & Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz, Iran

Abstract

Correlation weighted successive projections algorithm (CWSPA), a modified version of the successive projections algorithm (SPA), is proposed for the selection of descriptors in the non‐linear quantitative structure‐activity relationship (QSAR) study of a series of 1‐[(2‐hydroxyethoxy)methyl]‐6‐(phenylthio)thymine (HEPT) derivatives, which act as non‐nucleoside reverse transcriptase inhibitors (NNRTIs). In the proposed procedure, the correlation coefficient of each descriptor with the activities (rg) was an additional criterion for the selection of descriptors. The extent of the contribution of r to the selection of variables, m, was also optimized, and r‐CWSPA was the selected condition (m = 4). Three‐layer radial basis function networks (RBFNs) and molecular descriptors derived solely from molecular structure were used to construct the non‐linear QSAR models. Using r‐CWSPA, a limited number of uncorrelated and informative descriptors were selected. With 20 selected descriptors, the relative standard error percent in anti‐HIV activity predictions was 9.77% for the training set under cross‐validation (RSECV%) and 8.61% for the prediction set (RSEP%). The obtained model outperforms those given in the literature in both the fitting and the prediction stages. RBFN analysis yielded predicted activities in acceptable agreement with the experimentally obtained values (cross‐validation r = 0.924, prediction r = 0.939). Compared to SPA, r‐CWSPA gave lower RSECV% and RSEP% values with fewer selected variables. The results show that considering the correlation of the variables with the dependent variable (the activity) increases the performance of the selection and, as a result, the quality of the set of selected variables. Finally, a simple procedure for the selection of variables using r‐CWSPA was proposed in which there was no need to test all possible initial descriptors. The results from the simple procedure were comparable to those from the procedure in which all possible initial descriptors were tested. The proposed method was successfully validated on five different training and test sets. Copyright © 2007 John Wiley & Sons, Ltd.
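
The selection idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring formula, so the rule used here is an assumption: at each step the projection norm used by standard SPA is multiplied by |r|^m, where r is the Pearson correlation of a descriptor with the activity. The function name `cwspa_select` and its parameters are hypothetical, and the sketch assumes non‐constant descriptor columns and a number of selected descriptors no larger than the rank of the data matrix.

```python
import numpy as np


def cwspa_select(X, y, n_select, start, m=4):
    """Select n_select column indices of X, beginning with column `start`.

    Assumed score (not taken from the paper): the norm of each projected
    column multiplied by |r|**m, where r is the Pearson correlation of the
    original column with y. Setting m=0 gives plain SPA.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # mean-centre the descriptors
    yc = y - y.mean()                # mean-centre the activities

    # Pearson correlation of every descriptor with the activity
    col_norms = np.linalg.norm(Xc, axis=0)
    r = (Xc.T @ yc) / (col_norms * np.linalg.norm(yc))
    weights = np.abs(r) ** m

    P = Xc.copy()                    # working copy that is projected in place
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Project all columns onto the subspace orthogonal to v
        P = P - np.outer(v, v @ P) / (v @ v)
        score = np.linalg.norm(P, axis=0) * weights
        score[selected] = -np.inf    # never pick the same column twice
        selected.append(int(np.argmax(score)))
    return selected
```

Under these assumptions, `cwspa_select(X, y, n_select=20, start=k, m=4)` would return 20 descriptor indices starting from descriptor `k`. Repeating the call over several values of `start` and keeping the subset with the lowest cross‐validation error corresponds to testing different initial descriptors, as the abstract describes.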
