Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM)

Michael Fernández1, Julio Caballero2, Leyden Fernández3, Akinori Sarai1
1Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), Iizuka, Japan
2Centro de Bioinformatica y Simulacion Molecular, Universidad de Talca, Talca, Chile
3Barcelona Supercomputing Center—Centro Nacional de Supercomputación, Barcelona, Spain

Abstract

Keywords


References

Gasteiger J (2006) Chemoinformatics: a new field with a long tradition. Anal Bioanal Chem 384: 57–64. doi: 10.1007/s00216-005-0065-y

Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110: 5959–5967. doi: 10.1021/ja00226a005

Klebe G, Abraham U, Mietzner T (1994) Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem 37: 4130–4146. doi: 10.1021/jm00050a010

Folkers G, Merz A, Rognan D (1993) CoMFA: scope and limitations. In: Kubinyi H (ed) 3D-QSAR in drug design. Theory, methods and applications. ESCOM Science Publishers BV, Leiden, pp 583–618

Hansch C, Kurup A, Garg R, Gao H (2001) Chem-bioinformatics and QSAR: a review of QSAR lacking positive hydrophobic terms. Chem Rev 101: 619–672. doi: 10.1021/cr0000067

Sabljic A (1990) Topological indices and environmental chemistry. In: Karcher W, Devillers J (eds) Practical applications of quantitative structure–activity relationships (QSAR) in environmental chemistry and toxicology. Kluwer, Dordrecht, pp 61–82

Karelson M, Lobanov VS, Katritzky AR (1996) Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev 96: 1027–1043. doi: 10.1021/cr950202r

Livingstone DJ, Manallack DT, Tetko IV (1997) Data modelling with neural networks: advantages and limitations. J Comput Aid Mol Des 11: 135–142. doi: 10.1023/A:1008074223811

Burbidge R, Trotter M, Buxton B, Holden S (2001) Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput Chem 26: 5–14. doi: 10.1016/S0097-8485(01)00094-8

Caballero J, Fernández M (2006) Linear and non-linear modeling of antifungal activity of some heterocyclic ring derivatives using multiple linear regression and Bayesian-regularized neural networks. J Mol Model 12: 168–181. doi: 10.1007/s00894-005-0014-x

Holland JH (1975) Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, MI

Cartwright HM (1993) Applications of artificial intelligence in chemistry. Oxford University Press, Oxford

Cho SJ, Hermsmeier MA (2002) Genetic algorithm guided selection: variable selection and subset selection. J Chem Inf Comput Sci 42: 927–936. doi: 10.1021/ci010247v

Duchowicz PR, Vitale MG, Castro EA, Fernandez M, Caballero J (2007) QSAR analysis for heterocyclic antifungals. Bioorg Med Chem 15: 2680–2689. doi: 10.1016/j.bmc.2007.01.039

Fernández M, Caballero J, Fernández L, Abreu JI, Garriga M (2007) Protein radial distribution function (P-RDF) and Bayesian-regularized genetic neural networks for modeling protein conformational stability: chymotrypsin inhibitor 2 mutants. J Mol Graph Model 26: 748–759. doi: 10.1016/j.jmgm.2007.04.011

Caballero J, Garriga M, Fernández M (2005) Genetic neural network modeling of the selective inhibition of the intermediate-conductance Ca2+-activated K+ channel by some triarylmethanes using topological charge indexes descriptors. J Comput Aid Mol Des 19: 771–789. doi: 10.1007/s10822-005-9025-z

Caballero J, Garriga M, Fernández M (2006) 2D Autocorrelation modeling of the negative inotropic activity of calcium entry blockers using Bayesian-regularized genetic neural networks. Bioorg Med Chem 14: 3330–3340. doi: 10.1016/j.bmc.2005.12.048

Caballero J, Tundidor-Camba A, Fernández M (2007) Modeling of the inhibition constant (Ki) of some Cruzain ketone-based inhibitors using 2D spatial autocorrelation vectors and data-diverse ensembles of Bayesian-regularized genetic neural networks. QSAR Comb Sci 26: 27–40. doi: 10.1002/qsar.200610001

Fernández M, Tundidor-Camba A, Caballero J (2005) Modeling of cyclin-dependent kinase inhibition by 1H-pyrazolo[3,4-d]pyrimidine derivatives using artificial neural networks ensembles. J Chem Inf Model 45: 1884–1895. doi: 10.1021/ci050263i

Fernández M, Caballero J (2006) Bayesian-regularized genetic neural networks applied to the modeling of non-peptide antagonists for the human luteinizing hormone-releasing hormone receptor. J Mol Graph Model 25: 410–422. doi: 10.1016/j.jmgm.2006.02.005

Fernandez M, Carreiras MC, Marco JL, Caballero J (2006) Modeling of acetylcholinesterase inhibition by tacrine analogues using Bayesian-regularized genetic neural networks and ensemble averaging. J Enzym Inhib Med Chem 21: 647–661. doi: 10.1080/14756360600862366

Fernández M, Caballero J (2006) Modeling of activity of cyclic urea HIV-1 protease inhibitors using regularized-artificial neural networks. Bioorg Med Chem 14: 280–294. doi: 10.1016/j.bmc.2005.08.022

Fernández M, Caballero J, Tundidor-Camba A (2006) Linear and nonlinear QSAR study of N-hydroxy-2-[(phenylsulfonyl)amino]acetamide derivatives as matrix metalloproteinase inhibitors. Bioorg Med Chem 14: 4137–4150. doi: 10.1016/j.bmc.2006.01.072

Fernández M, Caballero J (2006) Ensembles of Bayesian-regularized genetic neural networks for modeling of acetylcholinesterase inhibition by huprines. Chem Biol Drug Des 68: 201–212. doi: 10.1111/j.1747-0285.2006.00435.x

González MP, Caballero J, Tundidor-Camba A, Helguera AM, Fernández M (2006) Modeling of farnesyltransferase inhibition by some thiol and non-thiol peptidomimetic inhibitors using genetic neural networks and RDF approaches. Bioorg Med Chem 14: 200–213. doi: 10.1016/j.bmc.2005.08.009

Di Fenza A, Alagona G, Ghio C, Leonardi R, Giolitti A, Madami A (2007) Caco-2 cell permeability modelling: a neural network coupled genetic algorithm approach. J Comput Aid Mol Des 21: 207–221. doi: 10.1007/s10822-006-9098-3

So S, Karplus M (1996) Evolutionary optimization in quantitative structure–activity relationship: an application of genetic neural networks. J Med Chem 39: 1521–1530. doi: 10.1021/jm9507035

Gao H (2001) Application of BCUT metrics and genetic algorithm in binary QSAR analysis. J Chem Inf Comput Sci 41: 402–407. doi: 10.1021/ci000306p

Fernández M, Fernández L, Abreu JI, Garriga M (2008) Classification of voltage-gated K(+) ion channels from 3D pseudo-folding graph representation of protein sequences using genetic algorithm-optimized support vector machines. J Mol Graph Model 26: 1306–1314. doi: 10.1016/j.jmgm.2008.01.001

Caballero J, Fernández L, Garriga M, Abreu JI, Collina S, Fernández M (2007) Proteometric study of ghrelin receptor function variations upon mutations using amino acid sequence autocorrelation vectors and genetic algorithm-based least square support vector machines. J Mol Graph Model 26: 166–178. doi: 10.1016/j.jmgm.2006.11.002

Hemmateenejad B, Miri R, Akhond M, Shamsipur M (2002) QSAR study of the calcium channel antagonist activity of some recently synthesized dihydropyridine derivatives. An application of genetic algorithm for variable selection in MLR and PLS methods. Chemom Intell Lab 64: 91–99. doi: 10.1016/S0169-7439(02)00068-0

Hemmateenejad B, Akhond M, Miri R, Shamsipur M (2003) Genetic algorithm applied to the selection of factors in principal component-artificial neural networks: application to QSAR study of calcium channel antagonist activity of 1,4-dihydropyridines (nifedipine analogous). J Chem Inf Comput Sci 43: 1328–1334. doi: 10.1021/ci025661p

Hemmateenejad B (2004) Optimal QSAR analysis of the carcinogenic activity of drugs by correlation ranking and genetic algorithm-based PCR. J Chemom 18: 475–485. doi: 10.1002/cem.891

Yamashita F, Wanchana S, Hashida M (2002) Quantitative structure/property relationship analysis of caco-2 permeability using a genetic algorithm-based partial least squares method. J Pharm Sci 91: 2230–2238. doi: 10.1002/jps.10214

Selwood DL, Livingstone DJ, Comley JCW, O’Dowd AB, Hudson AT, Jackson P, Jandu KS, Rose VS, Stables JN (1990) Structure–activity relationships of antifilarial antimycin analogues: a multivariate pattern recognition study. J Med Chem 33: 136–142. doi: 10.1021/jm00163a023

Ren Y, Liu H, Li S, Yao X, Liu M (2007) Prediction of binding affinities to β1 isoform of human thyroid hormone receptor by genetic algorithm and projection pursuit regression. Bioorg Med Chem Lett 17: 2474–2482. doi: 10.1016/j.bmcl.2007.02.025

Turner DB, Willett P (2000) Evaluation of the EVA descriptor for QSAR studies: 3. The use of a genetic algorithm to search for models with enhanced predictive properties (EVA_GA). J Comput Aid Mol Des 14: 1–21. doi: 10.1023/A:1008180020974

Xue L, Bajorath J (2000) Molecular descriptors for effective classification of biologically active compounds based on principal component analysis identified by a genetic algorithm. J Chem Inf Comput Sci 40: 801–809. doi: 10.1021/ci000322m

Kamphausen S, Höltge N, Wirsching F, Morys-Wortmann C, Riester D, Goetz R, Thürk M, Schwienhorst A (2002) Genetic algorithm for the design of molecules with desired properties. J Comput Aid Mol Des 16: 551–567. doi: 10.1023/A:1021928016359

Guo W, Cai W, Shao X, Pan Z (2005) Application of genetic stochastic resonance algorithm to quantitative structure–activity relationship study. Chemom Intell Lab 75: 181–188. doi: 10.1016/j.chemolab.2004.07.004

Teixido M, Belda I, Rosello X, Gonzalez S, Fabre M, Llora X, Bacardit J, Garrell JM, Vilaro S, Albericio F, Giralt E (2003) Development of a genetic algorithm to design and identify peptides that can cross the blood–brain barrier 1. Design and validation in silico. QSAR Comb Sci 22: 745–753. doi: 10.1002/qsar.200320004

So SS, Karplus M (1997) Three-dimensional quantitative structure–activity relationships from molecular similarity matrices and genetic neural networks: 1. Method and validations. J Med Chem 40: 4347–4359. doi: 10.1021/jm970487v

So SS, Karplus M (1997) Three-dimensional quantitative structure–activity relationships from molecular similarity matrices and genetic neural networks: 2. Applications. J Med Chem 40: 4360–4371. doi: 10.1021/jm970488n

Chiu TL, So SS (2003) Genetic neural networks for functional approximation. QSAR Comb Sci 22: 519–526. doi: 10.1002/qsar.200310004

Patankar SJ, Jurs PC (2000) Prediction of IC50 values for ACAT inhibitors from molecular structure. J Chem Inf Comput Sci 40: 706–723. doi: 10.1021/ci990125r

Kauffman GW, Jurs PC (2000) Prediction of inhibition of the sodium ion-proton antiporter by benzoylguanidine derivatives from molecular structure. J Chem Inf Comput Sci 40: 753–761. doi: 10.1021/ci9901237

Kauffman GW, Jurs PC (2001) QSAR and k-nearest neighbor classification analysis of selective cyclooxygenase-2 inhibitors using topologically-based numerical descriptors. J Chem Inf Comput Sci 41: 1553–1560. doi: 10.1021/ci010073h

Mattioni BE, Jurs PC (2002) Development of quantitative structure–activity relationship and classification models for a set of carbonic anhydrase inhibitors. J Chem Inf Comput Sci 42: 94–102. doi: 10.1021/ci0100696

Bakken GA, Jurs PC (2001) QSARs for 6-azasteroids as inhibitors of human type 1 5alpha-reductase: prediction of binding affinity and selectivity relative to 3-BHSD. J Chem Inf Comput Sci 41: 1255–1265. doi: 10.1021/ci010036q

Patankar SJ, Jurs PC (2002) Prediction of glycine/NMDA receptor antagonist inhibition from molecular structure. J Chem Inf Comput Sci 42: 1053–1068. doi: 10.1021/ci010114+

Burden FR, Winkler DA (1999) Robust QSAR models using Bayesian regularized neural networks. J Med Chem 42: 3183–3187. doi: 10.1021/jm980697n

Winkler DA, Burden FR (2004) Bayesian neural nets for modeling in drug discovery. Biosilico 2: 104–111. doi: 10.1016/S1741-8364(04)02393-5

MATLAB 7.0. Program (2004) MathWorks Inc., Natick. http://www.mathworks.com

The MathWorks Inc. (2004) Genetic algorithm and direct search toolbox user’s guide for use with MATLAB. The MathWorks Inc., Natick

The MathWorks Inc. (2004) Neural network toolbox user’s guide for use with MATLAB. The MathWorks Inc., Natick

MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Comput 4: 448–472. doi: 10.1162/neco.1992.4.3.448

MacKay DJC (1992) Bayesian interpolation. Neural Comput 4: 415–447

Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20: 273–297

Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2: 1–47. doi: 10.1023/A:1009715923555

Fröhlich H, Chapelle O, Schölkopf B (2003) Feature selection for support vector machines by means of genetic algorithms. In: Proceedings of 15th IEEE international conference on tools with AI, Sacramento, CA, USA, pp 142–148. doi: 10.1109/TAI.2003.1250182

Yang SY, Huang Q, Li LL, Ma CY, Zhang H, Bai R, Teng QZ, Xiang ML, Wei YQ (2009) An integrated scheme for feature selection and parameter setting in the support vector machine modeling and its application to the prediction of pharmacokinetic properties of drugs. Artif Intell Med 46: 155–163. doi: 10.1016/j.artmed.2008.07.001

Ma CY, Yang SY, Zhang H, Xiang ML, Huang Q, Wei YQ (2008) Prediction models of human plasma protein binding rate and oral bioavailability derived by using GA-CG-SVM method. J Pharmaceut Biomed 47: 677–682. doi: 10.1016/j.jpba.2008.03.023

Zhang H, Chen QY, Xiang ML, Ma CY, Huang Q, Yang SY (2009) In silico prediction of mitochondrial toxicity by using GA-CG-SVM approach. Toxicol in Vitro 23: 134–140. doi: 10.1016/j.tiv.2008.09.017

Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm

Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph Model 20: 269–276. doi: 10.1016/S1093-3263(01)00123-1

Afantitis A, Melagraki G, Sarimveis H, Igglessi-Markopoulou O, Kollias G (2009) A novel QSAR model for predicting the inhibition of CXCR3 receptor by 4-N-aryl-[1,4]diazepane ureas. Eur J Med Chem 44: 877–884. doi: 10.1016/j.ejmech.2008.05.028

Agrafiotis DK, Cedeño W, Lobanov VS (2002) On the use of neural network ensembles in QSAR and QSPR. J Chem Inf Comput Sci 42: 903–911. doi: 10.1021/ci0203702

Caballero J, Fernández L, Abreu JI, Fernández M (2006) Amino acid sequence autocorrelation vectors and ensembles of Bayesian-regularized genetic neural networks for prediction of conformational stability of human lysozyme mutants. J Chem Inf Model 46: 1255–1268. doi: 10.1021/ci050507z

Fernández L, Caballero J, Abreu JI, Fernández M (2007) Amino acid sequence autocorrelation vectors and bayesian-regularized genetic neural networks for modeling protein conformational stability: gene V protein mutants. Proteins 67: 834–852. doi: 10.1002/prot.21349

MOPAC 6.0. (1993) Frank J. Seiler Research Laboratory. US Air Force Academy, Colorado Springs, CO

Fernández M, Caballero J (2007) QSAR models for predicting the activity of non-peptide luteinizing hormone-releasing hormone (LHRH) antagonists derived from erythromycin A using quantum chemical properties. J Mol Model 13: 465–476. doi: 10.1007/s00894-006-0163-6

Fernández M, Caballero J (2007) QSAR modeling of matrix metalloproteinase inhibition by N-hydroxy-α-phenylsulfonylacetamide derivatives. Bioorg Med Chem 15: 6298–6310. doi: 10.1016/j.bmc.2007.06.014

Fatemi MH, Gharaghani S (2007) A novel QSAR model for prediction of apoptosis-inducing activity of 4-aryl-4-H-chromenes based on support vector machine. Bioorg Med Chem 15: 7746–7754. doi: 10.1016/j.bmc.2007.08.057

Todeschini R, Consonni V, Pavan M (2002) DRAGON, version 2.1. Talete SRL, Milan, Italy

Cerius2, Version 4.11, http://www.accelrys.com

VCCLAB, Virtual Computational Chemistry Laboratory (2005) http://www.vcclab.org

Fernandez M, Abreu JI (2006) PROTMETRICS; version 1.0. Molecular Modeling Group University of Matanzas, Matanzas, Cuba

Kawashima S, Kanehisa M (2000) AAindex: amino acid index database. Nucleic Acids Res 28: 374. doi: 10.1093/nar/28.1.374

Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43: 59–69. doi: 10.1007/BF00337288

Dykens JA, Will Y (2007) The significance of mitochondrial toxicity testing in drug development. Drug Discov Today 12: 777–785. doi: 10.1016/j.drudis.2007.07.013

Foye WO (1995) Cancer chemotherapeutic agents. American Chemical Society, Washington, DC

Ashkenazi A, Dixit VM (1998) Death receptors: signaling and modulation. Science 281: 1305–1308. doi: 10.1126/science.281.5381.1305

Nagata S (1997) Apoptosis by death factor. Cell 88: 355–365. doi: 10.1016/S0092-8674(00)81874-7

Bartus RT, Dean RL, Beer B, Lippa AS (1982) The cholinergic hypothesis of geriatric memory dysfunction. Science 217: 408–417. doi: 10.1126/science.7046051

Radic Z, Reiner E, Taylor P (1991) Role of the peripheral anionic site on acetylcholinesterase: inhibition by substrates and coumarin derivatives. Mol Pharmacol 39: 98–104

Pang YP, Quiram P, Jelacic T, Hong F, Brimijoin S (1996) Highly potent, selective, and low cost bis-tetrahydroaminacrine inhibitors of acetylcholinesterase: steps toward novel drugs for treating Alzheimer’s disease. J Biol Chem 271: 23646–23649. doi: 10.1074/jbc.271.39.23646

Katz RA, Skalka AM (1994) The retroviral enzymes. Annu Rev Biochem 63: 133–173. doi: 10.1146/annurev.bi.63.070194.001025

Kempf DJ, Marsh KC, Denissen JF, McDonald E, Vasavanonda S, Flentge CA, Green BE, Fino L, Park CH, Kong XP, Wideburg NE, Saldivar A, Ruiz L, Kati WM, Sham HL, Robins T, Stewart KD, Hsu A, Plattner JJ, Leonard JM, Norbeck DW (1995) ABT-538 is a potent inhibitor of human immunodeficiency virus protease and has high oral bioavailability in humans. Proc Natl Acad Sci USA 92: 2484–2488. doi: 10.1073/pnas.92.7.2484

Reddy P, Ross J (1999) Amprenavir: a protease inhibitor for the treatment of patients with HIV-1 infection. Formulary 34: 567–577

Vacca JP, Dorsey BD, Schleif WA, Levin RB, McDaniel SL, Darke PL, Zugay J, Quintero JC, Blahy OM, Roth E, Sardana VV, Schlabach AJ, Graham PI, Condra JH, Gotlib L, Holloway MK, Lin J, Chen IW, Vastag K, Ostovic D, Anderson PS, Emini EA, Huff JR (1994) L-735,524: an orally bioavailable human immunodeficiency virus type 1 protease inhibitor. Proc Natl Acad Sci USA 91: 4096–4100. doi: 10.1073/pnas.91.9.4096

Castle NA (1999) Recent advances in the biology of small conductance calcium-activated potassium channels. Perspect Drug Discov Des 15: 131–154. doi: 10.1023/A:1017095519863

Vergara C, LaTorre R, Marrion NV, Adelman JP (1998) Calcium-activated potassium channels. Curr Opin Neurobiol 8: 321–329. doi: 10.1016/S0959-4388(98)80056-1

Wulff H, Miller MJ, Hänsel W, Grissmer S, Cahalan MD, Chandy KG (2000) Design of a potent and selective inhibitor of the intermediate-conductance Ca2+-activated K+ channel, IKCa1: a potential immunosuppressant. Proc Natl Acad Sci USA 97: 8151–8156. doi: 10.1073/pnas.97.14.8151

Engel JC, Doyle PS, Palmer J, Hsieh I, Bainton DF, McKerrow JH (1998) Growth arrest of T. cruzi by cysteine protease inhibitors is accompanied by alterations in Golgi complex and ER ultrastructure. J Cell Sci 111: 597–606

Zhang H, Xiang ML, Ma CY, Huang Q, Li W, Xie Y, Wei YQ, Yang SY (2009) Three-class classification models of logS and logP derived by using GA-CG-SVM approach. Mol Divers 13: 261–268. doi: 10.1007/s11030-009-9108-1

Ramos de Armas R, González-Díaz H, Molina R, Uriarte E (2004) Markovian backbone negentropies: molecular descriptors for protein research. I. Predicting protein stability in Arc repressor mutants. Proteins 56: 715–723. doi: 10.1002/prot.20159

Gonzalez-Diaz H, Molina R, Uriarte E (2005) Recognition of stable protein mutants with 3D stochastic average electrostatic potentials. FEBS Lett 579: 4297–4301. doi: 10.1016/j.febslet.2005.06.065

González-Díaz H, Vilar S, Santana L, Uriarte E (2007) Medicinal chemistry and bioinformatics-current trends in drugs discovery with networks topological indices. Curr Top Med Chem 7: 1015–1029. doi: 10.2174/156802607780906771

Vilar S, Gonzalez-Diaz H, Santana L, Uriarte E (2008) QSAR model for alignment-free prediction of human breast cancer biomarkers based on electrostatic potentials of protein pseudofolding HP-lattice networks. J Comput Chem 29: 2613–2622. doi: 10.1002/jcc.21016

Munteanu CR, González-Díaz H, Magalhães AL (2008) Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices. J Theor Biol 254: 476–482. doi: 10.1016/j.jtbi.2008.06.003

Fernández M, Caballero J, Fernández L, Abreu JI, Acosta G (2008) Classification of conformational stability of protein mutants from 3D pseudo folding graph representation of protein sequences using support vector machines. Proteins 70: 167–175. doi: 10.1002/prot.21524

Li ZC, Zhou XB, Lin YR, Zou XY (2008) Prediction of protein structure class by coupling improved genetic algorithm and support vector machine. Amino Acids 35: 581–590. doi: 10.1007/s00726-008-0084-z

Huang WL, Tung CW, Huang HL, Hwang SF, Ho SY (2007) ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. BioSystems 90: 573–581. doi: 10.1016/j.biosystems.2007.01.001

Block P, Paern J, Hüllermeier E, Sanschagrin P, Sotriffer CA, Klebe G (2006) Physicochemical descriptors to discriminate protein–protein interactions in permanent and transient complexes selected by means of machine learning algorithms. Proteins 65: 607–622. doi: 10.1002/prot.21104

Kernytsky A, Rost B (2009) Using genetic algorithms to select most predictive protein features. Proteins 75: 75–88. doi: 10.1002/prot.22211