Protein–ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment

Bioinformatics - Volume 29, Issue 20 - Pages 2588-2595 - 2013
Jianyi Yang1, Ambrish Roy1, Yang Zhang1
1Department of Computational Medicine and Bioinformatics and 2Department of Biological Chemistry, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA

Abstract

Motivation: Identification of protein–ligand binding sites is critical to protein function annotation and drug discovery. However, no single method generates optimal binding site predictions across different protein types. Combining complementary predictions is probably the most reliable solution to the problem.

Results: We develop two new methods for complementary binding site prediction, one based on binding-specific substructure comparison (TM-SITE) and another on sequence profile alignment (S-SITE). The methods are tested on a set of 500 non-redundant proteins harboring 814 natural, drug-like and metal ion molecules. Starting from low-resolution protein structure predictions, the methods successfully recognize >51% of binding residues, with an average Matthews correlation coefficient (MCC) significantly higher (P-value < 10^-9 in Student's t-test) than other state-of-the-art methods, including COFACTOR, FINDSITE and ConCavity. When TM-SITE and S-SITE are combined with other structure-based programs, a consensus approach (COACH) can increase MCC by 15% over the best individual predictions. COACH was examined in the recent community-wide CAMEO experiment and consistently ranked as the best method in the last 22 individual datasets, with an Area Under the Curve score 22.5% higher than that of the second-best method. These data demonstrate a new robust approach to protein–ligand binding site recognition, which is ready for genome-wide structure-based function annotations.

Availability: http://zhanglab.ccmb.med.umich.edu/COACH/

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
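For readers unfamiliar with the Matthews correlation coefficient (MCC) used as the accuracy measure above, the following is a minimal sketch of how it can be computed for a residue-level binding site prediction. It uses the standard MCC definition; the function name `binding_site_mcc`, its arguments, and the example residue numbers are illustrative assumptions and are not taken from the paper or from the COACH software.

```python
import math


def binding_site_mcc(predicted, actual, sequence_length):
    """Standard MCC for a per-residue binding site prediction.

    predicted, actual: iterables of residue indices labelled as binding
    (hypothetical inputs, chosen for illustration only).
    sequence_length: total number of residues in the protein.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)          # binding residues correctly predicted
    fp = len(predicted - actual)          # predicted binding, actually non-binding
    fn = len(actual - predicted)          # binding residues that were missed
    tn = sequence_length - tp - fp - fn   # non-binding residues correctly left out

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # MCC is undefined when any marginal count is zero; 0.0 is a common convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative values only: a 100-residue protein with four true binding residues.
example_mcc = binding_site_mcc({10, 11, 12, 40}, {10, 11, 12, 13}, 100)
print(round(example_mcc, 3))
```

MCC ranges from -1 to 1 and takes true negatives into account, so it stays informative when binding residues are a small fraction of the sequence, as is typical for ligand binding sites.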
