SEMal: Accurate protein malonylation site predictor using structural and evolutionary information

Computers in Biology and Medicine - Volume 125 - Page 104022 - 2020
Shubhashis Roy Dipta1, Ghazaleh Taherzadeh2, MD. Wakil Ahmad1, MD. Easin Arafat3, Swakkhar Shatabda1, Abdollah Dehzangi4,5
1Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
2Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD, 20742, USA
3Institute of Information Technology, Jahangirnagar University, Savar, Dhaka, Bangladesh
4Department of Computer Science, Rutgers University, Camden, NJ 08102, USA
5Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
