Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms
References
Altschul, 1997, Gapped blast and PSI-blast: a new generation of protein database search programs, Nucleic Acids Res., 25, 3389, 10.1093/nar/25.17.3389
Andreeva, 2020, The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures, Nucleic Acids Res., D1
Baldwin, 1991, Crystal structure of interleukin 8: symbiosis of NMR and crystallography, Proc. Natl. Acad. Sci., 88, 502, 10.1073/pnas.88.2.502
Berman, 2000, The protein data bank, Int. Tables Crystallogr., 67, 675
Bragg, 1976, The development of X-ray analysis, Contemp. Phys., 17, 103, 10.1080/00107517608210844
Breiman, 2001, Random forests, Mach. Learn., 45, 5, 10.1023/A:1010933404324
Chandonia, 2019, SCOPe: classification of large macromolecular structures in the structural classification of proteins-extended database, Nucleic Acids Res., 10.1093/nar/gky1134
Chandonia, 2004, The ASTRAL compendium in 2004, Nucleic Acids Res., 32, D189, 10.1093/nar/gkh034
Chen, 1986, Polynomial regression, Springer Texts Stat., 235
Chen, 2016, ProFold: Protein fold classification with additional structural features and a novel ensemble classifier, BioMed Res. Int., 2016, 1
Chen, 2019, Classification of widely and rarely expressed genes with recurrent neural network, Comput. Struct. Biotechnol. J., 17, 49, 10.1016/j.csbj.2018.12.002
Chen, 2016, 785
Chou, 2001, Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins Struct. Funct. Bioinf., 43, 246, 10.1002/prot.1035
Cohen, 1987, Prediction of the three-dimensional structure of human growth hormone, Proteins Struct. Funct. Bioinf., 2, 162, 10.1002/prot.340020209
Ding, 2001, Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics, 17, 349, 10.1093/bioinformatics/17.4.349
Dubchak, 1995, Prediction of protein folding class using global description of amino acid sequence, Proc. Natl. Acad. Sci. U. S. A., 92, 8700, 10.1073/pnas.92.19.8700
Feng, 2016, The recognition of multi-class protein folds by adding average chemical shifts of secondary structure elements, Saudi J. Biol. Sci., 23, 189, 10.1016/j.sjbs.2015.10.008
Friedman, 2001, Greedy function approximation: a gradient boosting machine, Ann. Stat., 29, 1189, 10.1214/aos/1013203451
Graves, 2013, Speech recognition with deep recurrent neural networks, IEEE Int. Conf. Acoust. Speech Signal Process.
Hearst, 1998, Support vector machines, IEEE Intell. Syst., 13, 18, 10.1109/5254.708428
Heffernan, 2015, Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning, Sci. Rep., 5, 11476
Abdi, 2010, Principal component analysis, Wiley Interdiscip. Rev. Comput. Stat., 2, 433, 10.1002/wics.101
Ibrahim, 2018, Protein fold recognition using deep kernelized extreme learning machine and linear discriminant analysis, Neural Comput. Appl., 1
Kabsch, 1983, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers, 22, 2577, 10.1002/bip.360221211
Kavousi, 2011, A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM, Comput. Biol. Chem., 35, 1, 10.1016/j.compbiolchem.2010.12.001
Ke, 2017, LightGBM: a highly efficient gradient boosting decision tree, Conference on Neural Information Processing Systems
Keller, 1985, A fuzzy k-nearest neighbor algorithm, IEEE Trans. Syst., Man, Cybern., SMC-15, 580, 10.1109/TSMC.1985.6313426
Li, 2018, Identification of synthetic lethality based on a functional network by using machine learning algorithms, J. Cell. Biochem., 120, 405, 10.1002/jcb.27395
Liang, 2015, Prediction of protein structural classes for low-similarity sequences based on consensus sequence and segmented PSSM, Comput. Math. Methods Med., 2015, 1, 10.1155/2015/370756
Lin, 2013, Hierarchical classification of protein folds using a novel ensemble classifier, PLoS One, 8, e56499, 10.1371/journal.pone.0056499
Liu, 1998, Incremental feature selection, Appl. Intell., 9, 217, 10.1023/A:1008363719778
Lyons, 2016, Protein fold recognition using HMM–HMM alignment and dynamic programming, J. Theor. Biol., 393, 67, 10.1016/j.jtbi.2015.12.018
Mehta, 2019, Predicting structural class for protein sequences of random forest algorithm, Comput. Biol. Chem., 84, 107164
Murzin, 1995, SCOP: a structural classification of proteins database for the investigation of sequences and structures, J. Mol. Biol., 247, 10.1016/S0022-2836(05)80134-2
Orengo, 1997, CATH – a hierarchic classification of protein domain structures, Structure, 5, 1093, 10.1016/S0969-2126(97)00260-8
Paliwal, 2014, A tri-gram based feature extraction technique using linear probabilities of position specific scoring matrix for protein fold recognition, IEEE Trans. Nanobiosci., 13, 44, 10.1109/TNB.2013.2296050
Powers, 2011, Evaluation: from Precision, Recall and F-measure to ROC, informedness, markedness and correlation, J. Mach. Learn. Technol., 2, 37
Remmert, 2012, HHblits: lightning-fast iterative protein sequence searching by HMM–HMM alignment, Nat. Methods, 9, 173, 10.1038/nmeth.1818
Renaux, 2018, UniProt: the universal protein knowledgebase, Nucleic Acids Res., 45, D158
Riffenburgh, 2013, Linear discriminant analysis, Chicago, 3, 27
Sela, 1957, The correlation of ribonuclease activity with specific aspects of tertiary structure, Biochim. Biophys. Acta, 26, 502, 10.1016/0006-3002(57)90096-3
Snoek, 2012, Practical Bayesian optimization of machine learning algorithms, Adv. Neural Inf. Process. Syst., 4
Stuart, 1976
Touw, 2014, A series of PDB-related databanks for everyday needs, Nucleic Acids Res., 43, 10.1093/nar/gku1028
Wei, 2015, Enhanced protein fold prediction method through a novel feature extraction technique, IEEE Trans. Nanobiosci., 14, 649, 10.1109/TNB.2015.2450233
Wei, 2016, Recent progress in machine learning-based methods for protein fold recognition, Int. J. Mol. Sci., 17, 10.3390/ijms17122118
Yan, 2019, Protein fold recognition based on multi-view modeling, Bioinformatics, 35, 2982, 10.1093/bioinformatics/btz040
Yan, 2020, Protein fold recognition by combining support vector machines and pairwise sequence similarity scores, IEEE/ACM Trans. Comput. Biol. Bioinf., PP