A novel sequence-based method of predicting protein DNA-binding residues, using a machine learning approach

Elsevier BV - Volume 30 - Pages 99-105 - 2010

Yudong Cai¹,², ZhiSong He³, Xiaohe Shi⁴, Xiangying Kong⁴,⁵, Lei Gu⁶, Lu Xie⁷

¹Institute of System Biology, Shanghai University, Shanghai, People’s Republic of China

²Centre for Computational Systems Biology, Fudan University, Shanghai, People’s Republic of China

³Department of Bioinformatics, College of Life Sciences, Zhejiang University, Zhejiang, People’s Republic of China

⁴Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine, Shanghai, People’s Republic of China

⁵State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai, People’s Republic of China

⁶Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Bonn, Germany

⁷Shanghai Center for Bioinformation Technology, Shanghai, People’s Republic of China

Abstract

Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many other vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. To study many diseases, researchers need a better understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, a computational prediction approach is needed. Some in silico methods have been developed, but they still have considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein DNA-binding residues. As features, we used properties of the micro-environment of each amino acid, taken from the AAIndex, together with conservation scores. In cross-validation testing, the method achieved an overall accuracy of 94.89%. Our results indicate that the amino acid micro-environment is important for DNA binding and that it can be used to identify protein-DNA binding sites.
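To make the idea of a residue "micro-environment" concrete, the following is a minimal sketch of one way such features could be assembled. It is not the authors' implementation. It assumes a sliding window around each residue, uses the Kyte-Doolittle hydropathy scale as a stand-in for a single AAIndex property, and takes per-residue conservation scores as a precomputed input. The function name `window_features`, the window size, and the zero padding are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy values, used here only as a stand-in
# for one physicochemical property of the kind listed in the AAIndex.
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(
    sequence: str,
    conservation: Sequence[float],
    half_width: int = 2,
    pad: float = 0.0,
) -> List[List[float]]:
    """Build one feature vector per residue from its sequence neighbours.

    Each vector holds (property value, conservation score) pairs for the
    residues from `center - half_width` to `center + half_width`.
    Positions past either end of the sequence are filled with `pad`
    (an assumption; other padding schemes are possible).
    """
    if len(sequence) != len(conservation):
        raise ValueError("sequence and conservation must have equal length")

    features: List[List[float]] = []
    for center in range(len(sequence)):
        row: List[float] = []
        for pos in range(center - half_width, center + half_width + 1):
            if 0 <= pos < len(sequence):
                # Unknown residue letters fall back to the padding value.
                row.append(HYDROPATHY.get(sequence[pos], pad))
                row.append(float(conservation[pos]))
            else:
                row.extend([pad, pad])
        features.append(row)
    return features


# Hypothetical toy input: three residues with made-up conservation scores.
toy_features = window_features("AKR", [0.1, 0.9, 0.5], half_width=1)
```

Each resulting row could then be passed, together with a binding or non-binding label, to any standard classifier and evaluated by cross-validation. The abstract does not specify the classifier, the window size, or which AAIndex properties were used, so those remain open choices in this sketch.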

