A collaborative environment for developing and validating predictive tools for protein biophysical characteristics

Michael A. Johnston1, Damien Farrell1, Jens Erik Nielsen1
1School of Biomolecular and Biomedical Science, Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland

Tóm tắt

The exchange of information between experimentalists and theoreticians is crucial to improving the predictive ability of theoretical methods and hence our understanding of the related biology. However many barriers exist which prevent the flow of information between the two disciplines. Enabling effective collaboration requires that experimentalists can easily apply computational tools to their data, share their data with theoreticians, and that both the experimental data and computational results are accessible to the wider community. We present a prototype collaborative environment for developing and validating predictive tools for protein biophysical characteristics. The environment is built on two central components; a new python-based integration module which allows theoreticians to provide and manage remote access to their programs; and PEATDB, a program for storing and sharing experimental data from protein biophysical characterisation studies. We demonstrate our approach by integrating PEATSA, a web-based service for predicting changes in protein biophysical characteristics, into PEATDB. Furthermore, we illustrate how the resulting environment aids method development using the Potapov dataset of experimentally measured ΔΔGfold values, previously employed to validate and train protein stability prediction algorithms.

Từ khóa


Tài liệu tham khảo

Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data Bank. Nat Struct Biol 10(12):980 Porter CT, Bartlett GJ, Thornton JM (2004) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 32(Database issue):D129–D133 Kumar MDS, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A (2006) ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res 34(Database issue):D204–D206 Barthelmes J, Ebeling C, Chang A, Schomburg I, Schomburg D (2007) BRENDA, AMENDA and FRENDA: the enzyme information system in 2007. Nucleic Acids Res 35(Database issue):D511–D514 Toseland CP, McSparron H, Davies MN, Flower DR (2006) PPD v1.0–an integrated, web-accessible database of experimentally determined protein pKa values. Nucleic Acids Res 34(Database issue):D199–D203 Farrell D, Miranda ES, Webb H, Georgi N, Crowley PB, McIntosh LP, Nielsen JE (2010) Titration_DB: storage and analysis of NMR-monitored protein pH titration curves. Proteins 78(4):843–857 Block P, Sotriffer CA, Dramburg I, Klebe G (2006) AffinDB: a freely accessible database of affinities for protein-ligand complexes from the PDB. Nucleic Acids Res 34(Database issue):D522–D526 Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK (2007) BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res 35(Database issue):D198–D201 Wang R, Fang X, Lu Y, Yang C-Y, Wang S (2005) The PDBbind database: methodologies and updates. J Med Chem 48(12):4111–4119 Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 37(Database issue):D412–D416 Rohl CA, Strauss CEM, Misura KMS, Baker D (2004) Protein Structure Prediction Using Rosetta. Methods Enzymol 383:66–93 Zhang Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9:40 Sham YY, Chu ZT, Tao H, Warshel A (2000) Examining methods for calculations of binding free energies: LRA, LIE, PDLD-LRA, and PDLD/S-LRA calculations of ligands binding to an HIV protease. Proteins 39(4):393–407 Wang R, Lu Y, Wang S (2003) Comparative evaluation of 11 scoring functions for molecular docking. J Med Chem 46(12):2287–2303 Li H, Robertson AD, Jensen JH (2005) Very fast empirical prediction and rationalization of protein pKa values. Proteins 61(4):704–721 Tynan-Connolly BM, Nielsen JE (2006) pKD: re-designing protein pKa values. Nucleic Acids Res 34(Web Server issue):W48–W51 Korkegian A, Black ME, Baker D, Stoddard BL (2005) Computational thermostabilization of an enzyme. Science 308(5723):857–860 Potapov V, Cohen M, Schreiber G (2009) Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel 22(9):553–560 Aloy P, Russell RB (2004) Ten thousand interactions for the molecular biologist. Nat Biotechnol 22(10):1317–1321 Aloy P, Böttcher B, Ceulemans H, Leutwein C, Mellwig C, Fischer S, Gavin A-C, Bork P, Superti-Furga G, Serrano L, Russell RB (2004) Structure-based assembly of protein complexes in yeast. Science 303(5666):2026–2029 Olsson MHM, Parson WW, Warshel A (2006) Dynamical contributions to enzyme catalysis: critical tests of a popular hypothesis. Chem Rev 106(5):1737–1756 Simonson T (2002) Gaussian fluctuations and linear response in an electron transfer protein. Proc Natl Acad Sci U S A, 99(10):6544–6549 Carstensen T, Farrell D, Huang Y, Baker NA, Nielsen JE (2011) On the development of protein pKa calculation algorithms. Proteins. doi:10.1002/prot.23091 Benson G (2010) Editorial. Nucleic Acids Res 38(suppl 2):W1–W2 Farrell D, O’Meara F, Johnston M, Bradley J, Søndergaard CR, Georgi N, Webb H, Tynan-Connolly BM, Bjarnadottir U, Carstensen T, Nielsen JE (2010) Capturing, sharing and analysing biophysical data from protein engineering and protein characterization studies. Nucleic Acids Res 38(20):e186 Johnston MA, Søndergaard CR, Nielsen JE (2011) Integrated prediction of the effect of mutations on multiple protein characteristics. Proteins 79(1):165–178 Tynan-Connolly BM, Nielsen JE (2007) Redesigning protein pKa values. Protein Sci 16(2):239–249 Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA (2007) PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res 35(Web Server issue):W522–W525 Guerois G, Nielsen JE, Serrano L (2002) Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 320(2):369–387 Bode B, Halstead DM, Kendall R, Lei Z, Jackson D (2000) The portable batch scheduler and the Maui scheduler on Linux clusters. In: ALS’00: Proceedings of the 4th Annual Linux Showcase & Conference. Berkeley, CA, USA: USENIX Association, pp 27–27 Johnston MA, Nielsen JE (2011) Constructing and evaluating predictive models for protein biophysical characteristics. Ann Rep Comput Chem 7:101–122. doi:10.1016/B978-0-444-53835-2.00012-2 Serrano L, Kellis JT Jr, Cann P, Matouschek A, Fersht AR (1992) The folding of an enzyme. II. Substructure of barnase and the contribution of different interactions to protein stability. J Mol Biol 224(3):783–804 Serrano L, Sancho J, Hirshberg M, Fersht AR (1992) Alpha-helix stability in proteins. I. Empirical correlations concerning substitution of side-chains at the N and C-caps and the replacement of alanine by glycine or serine at solvent-exposed surfaces. J Mol Biol 227(2):544–559 Horovitz A, Matthews JM, Fersht AR (1992) Alpha-helix stability in proteins. II. Factors that influence stability at an internal position. J Mol Biol 227(2):560–568 Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453 Farrell D, Webb H, Johnston MA, Poulsen TA, Christensen LB, Borchert TV, Nielsen JE (2012) Towards fast determination of protein stability maps: experimental and theoretical analysis of mutants of a Nocardiopsis prasina serine protease. Biochemistry, Accepted for publication.