Prediction of heme binding residues from protein sequences with integrative sequence profiles

Springer Science and Business Media LLC - Volume 10, Issue S1 - 2012
Yi Xiong¹, Juan Liu¹, Wen Zhang¹, Tao Zeng¹
¹School of Computer, Wuhan University, Wuhan, China

Abstract

Background: Heme-protein interactions are essential for various biological processes, such as electron transfer, catalysis, signal transduction, and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding these activities and can aid functional annotation. However, little work has been done on predicting heme binding residues from protein sequence information.

Methods: We propose a sequence-based approach for accurate prediction of heme binding residues using a novel integrative sequence profile that couples position-specific scoring matrices with heme-specific physicochemical properties. To select the informative physicochemical properties, we design an intuitive feature selection scheme that combines a greedy strategy with correlation analysis.

Results: Our integrative sequence profile approach for predicting heme binding residues outperforms conventional methods that use amino acid and evolutionary information, in both 5-fold cross-validation and independent tests.

Conclusions: The integrative sequence profile, a novel feature, achieves good performance using a reduced set of feature vector elements.
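The abstract does not spell out how the greedy strategy and the correlation analysis are combined, so the following is only a minimal sketch of one plausible reading: forward selection that, at each step, skips candidate properties highly correlated with an already selected one and then adds the candidate that most improves a score. The names `greedy_select`, `score`, and `max_corr`, and the 0.8 threshold, are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (sx * sy)


def greedy_select(properties, score, max_corr=0.8):
    """Greedy forward selection with a correlation filter (illustrative only).

    properties: dict mapping a property name to its per-amino-acid values.
    score: hypothetical callable, list of names -> number (for example a
           cross-validated performance measure supplied by the caller).
    max_corr: assumed threshold; candidates whose absolute correlation with
              any selected property reaches it are skipped.
    """
    selected = []
    best = float("-inf")
    remaining = set(properties)
    while remaining:
        # Correlation analysis: keep only candidates that are not redundant
        # with the properties chosen so far.
        candidates = [
            name for name in sorted(remaining)
            if all(abs(pearson(properties[name], properties[s])) < max_corr
                   for s in selected)
        ]
        if not candidates:
            break
        # Greedy step: evaluate each candidate added to the current set.
        scored = [(score(selected + [name]), name) for name in candidates]
        top_score, top_name = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves the score, so stop
        selected.append(top_name)
        remaining.discard(top_name)
        best = top_score
    return selected, best
```

In practice `properties` could hold indices from a resource such as AAindex and `score` could wrap a classifier evaluation. Both choices are left to the caller here because the abstract does not specify them.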
