Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms

Computational Biology and Chemistry - Volume 91 - Article 107456 - 2021
Xinyi Qin¹, Min Liu¹, Lu Zhang¹, Guangzhong Liu¹
¹College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
