Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

BMC Bioinformatics - Volume 10 - Pages 1-24 - 2009
Marcin J Mizianty¹, Lukasz Kurgan¹
¹ Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada

Abstract

Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets) depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best-performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
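The modular idea, one classifier per class with its own feature subset and prediction restricted to whichever classes the user selects, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ClassModule` type, the `predict_class` function, the three class names, the two feature names and the toy scoring rules are hypothetical stand-ins, and taking the highest score among the selected modules is one plausible way to combine them, not necessarily the rule used by MODAS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class ClassModule:
    """One per-class module (hypothetical structure, for illustration only)."""
    feature_names: List[str]                # features selected for this class
    score: Callable[[List[float]], float]   # stand-in for a trained binary classifier


def predict_class(features: Dict[str, float],
                  modules: Dict[str, ClassModule],
                  classes: Iterable[str]) -> str:
    """Return the selected class whose module gives the highest score."""
    selected = list(classes)
    if not selected:
        raise ValueError("select at least one class")
    missing = [name for name in selected if name not in modules]
    if missing:
        raise ValueError(f"no module for: {missing}")

    def confidence(name: str) -> float:
        module = modules[name]
        # Each module sees only the features that were selected for its class.
        return module.score([features[f] for f in module.feature_names])

    return max(selected, key=confidence)


# Toy modules with hand-written scoring rules; a real system would plug in
# trained classifiers here instead.
MODULES = {
    "all-alpha": ClassModule(["helix_content", "strand_content"], lambda v: v[0] - v[1]),
    "all-beta": ClassModule(["helix_content", "strand_content"], lambda v: v[1] - v[0]),
    "alpha/beta": ClassModule(["helix_content", "strand_content"], lambda v: min(v)),
}

example = {"helix_content": 0.60, "strand_content": 0.05}
print(predict_class(example, MODULES, MODULES))                     # all modules
print(predict_class(example, MODULES, ["all-beta", "alpha/beta"]))  # a chosen subset
```

Because each module is independent, restricting the prediction to a subset of classes only changes which modules are consulted; nothing has to be retrained.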
The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes, which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
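To give a concrete sense of what "number and arrangement of helix/strand segments" can mean as numeric features, the sketch below summarises a three-state secondary structure string. It is a simplified illustration under stated assumptions: the input alphabet (H for helix, E for strand, C for coil), the `segment_features` function and the particular quantities it returns are chosen here for clarity and are not the MODAS feature set, which also draws on evolutionary profiles.

```python
from itertools import groupby
from typing import Dict


def segment_features(ss: str) -> Dict[str, float]:
    """Summarise a 3-state secondary structure string (H, E, C)."""
    if not ss:
        raise ValueError("empty secondary structure string")

    # Collapse the string into runs, e.g. "HHHCCEE" -> [("H", 3), ("C", 2), ("E", 2)].
    runs = [(state, len(list(group))) for state, group in groupby(ss)]
    helix = [length for state, length in runs if state == "H"]
    strand = [length for state, length in runs if state == "E"]

    # Order of helix and strand segments along the chain, ignoring coil.
    order = "".join(state for state, _ in runs if state in "HE")
    # Number of places where a helix segment is followed by a strand segment
    # or vice versa; a crude measure of how interleaved the two types are.
    switches = sum(a != b for a, b in zip(order, order[1:]))

    return {
        "helix_content": sum(helix) / len(ss),
        "strand_content": sum(strand) / len(ss),
        "helix_segments": len(helix),
        "strand_segments": len(strand),
        "longest_helix": max(helix, default=0),
        "longest_strand": max(strand, default=0),
        "type_switches": switches,
    }


# Made-up predicted secondary structure, not taken from any real protein.
print(segment_features("CCHHHHHHCCCEEEECCEEEECCCHHHHC"))
```

Content alone cannot separate classes that share similar helix and strand proportions but differ in how the segments alternate along the chain, which is why an arrangement measure such as `type_switches` is included alongside the counts.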
