Proposing a highly accurate protein structural class predictor using segmentation-based features

Springer Science and Business Media LLC - Volume 15 - Pages 1-13 - 2014
Abdollah Dehzangi1,2, Kuldip Paliwal3, James Lyons3, Alok Sharma1,4, Abdul Sattar1,2
1Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Brisbane, Australia
2National ICT Australia (NICTA), Brisbane, Australia
3School of Engineering, Griffith University, Brisbane, Australia
4School of Engineering, The University of the South Pacific, Suva, Fiji

Abstract

Prediction of the structural classes of proteins can provide important information about their functionalities as well as their major tertiary structures. It is also considered an important step towards solving the protein structure prediction problem. Despite all the efforts that have been made so far, finding a fast and accurate computational approach to protein structural class prediction remains a challenging problem in bioinformatics and computational biology. In this study we propose segmented distribution and segmented auto covariance feature extraction methods to capture local and global discriminatory information from the evolutionary profiles and predicted secondary structure of proteins. By applying a support vector machine (SVM) to the extracted features, we raise the protein structural class prediction accuracy, for the first time, to over 90% and 85% on two popular low-homology benchmarks that have been widely used in the literature. We report prediction accuracies of 92.2% and 86.3% for the 25PDB and 1189 benchmarks, respectively, which are up to 7.9% and 2.8% better than previously reported results for these two benchmarks. These results show that the proposed segmentation-based features significantly enhance protein structural class prediction performance.
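
The idea of computing auto covariance within segments of an evolutionary profile can be illustrated with a minimal sketch. It assumes the profile is given as a matrix with one row per residue and one column per amino acid type, that segments are contiguous and of nearly equal length, and that the lag range is small. The function names, the equal-length segmentation rule, and the default values of `n_segments` and `max_lag` are illustrative assumptions and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # rows = residues, columns = per-amino-acid scores


def split_segments(profile: Matrix, n_segments: int) -> List[Matrix]:
    """Split the profile row-wise into contiguous, nearly equal segments."""
    length = len(profile)
    bounds = [(i * length) // n_segments for i in range(n_segments + 1)]
    return [profile[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def auto_covariance(segment: Matrix, n_cols: int, max_lag: int) -> List[float]:
    """Auto covariance of each column for lags 1..max_lag inside one segment."""
    n_rows = len(segment)
    features: List[float] = []
    for col in range(n_cols):
        values = [row[col] for row in segment]
        mean = sum(values) / n_rows if n_rows else 0.0
        for lag in range(1, max_lag + 1):
            pairs = n_rows - lag
            if pairs <= 0:
                # Segment too short for this lag: emit a zero placeholder
                # so the feature vector keeps a fixed length.
                features.append(0.0)
                continue
            total = sum(
                (values[i] - mean) * (values[i + lag] - mean)
                for i in range(pairs)
            )
            features.append(total / pairs)
    return features


def segmented_auto_covariance(
    profile: Matrix, n_segments: int = 4, max_lag: int = 2
) -> List[float]:
    """Concatenate per-segment auto covariance features (illustrative only)."""
    n_cols = len(profile[0]) if profile else 0
    features: List[float] = []
    for segment in split_segments(profile, n_segments):
        features.extend(auto_covariance(segment, n_cols, max_lag))
    return features


# Toy profile: 8 residues, 2 columns (a real profile would have 20 columns).
toy_profile = [[float(i + 1), 1.0] for i in range(8)]
toy_features = segmented_auto_covariance(toy_profile, n_segments=2, max_lag=1)
```

The resulting vector has `n_segments * n_cols * max_lag` entries, so proteins of different lengths map to feature vectors of the same size, which is what a classifier such as an SVM requires. Computing the statistic per segment keeps information that is local to a region of the sequence, while concatenating the segments retains the global ordering.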
