Improving consensus structure by eliminating averaging artifacts
Abstract
Common structural biology methods (e.g., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging the 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtaining a consensus structure that is representative of the ensemble. However, averaging can introduce artifacts that result in unrealistic local geometries, including unphysical bond lengths and angles. Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo-energy function. To assess the performance of the algorithm, we applied our approach to Cα models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined model from the native structure for the set becomes worse by a mere 0.08 Å compared to the average RMSD of the averaged structures from the native structure (3.28 Å for refined structures and 3.36 Å for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures had a lower RMSD to the native protein than the corresponding averaged structure. Finally, compared to PULCHRA [1], our approach produces representative structures of similar RMSD quality, but with far fewer clashes. The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where averaging of 3D coordinates is performed. For example, structure averaging is also commonly performed in RNA secondary structure prediction [2], which could also benefit from our approach.
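The refinement idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract specifies only a Monte Carlo search driven by a harmonic pseudo-energy towards the averaged structure. The bond-length and clash penalty terms, the 3.8 Å and 3.6 Å constants, the force constants, the single-atom Gaussian move set, and all function names below are assumptions introduced for illustration. The models are also assumed to be superimposed before averaging.

```python
import math
import random

CA_BOND = 3.8       # assumed target distance (Å) between consecutive C-alpha atoms
CLASH_CUTOFF = 3.6  # assumed clash threshold (Å) for non-adjacent C-alpha atoms


def average_structure(ensemble):
    """Coordinate-wise mean of pre-superimposed models (the artifact-prone step)."""
    n_models = len(ensemble)
    return [tuple(sum(m[i][k] for m in ensemble) / n_models for k in range(3))
            for i in range(len(ensemble[0]))]


def clash_fraction(coords):
    """Fraction of atoms closer than CLASH_CUTOFF to an atom >= 2 positions away."""
    n = len(coords)
    clashing = set()
    for i in range(n):
        for j in range(i + 2, n):
            if math.dist(coords[i], coords[j]) < CLASH_CUTOFF:
                clashing.update((i, j))
    return len(clashing) / n


def pseudo_energy(coords, target, k_harm=1.0, k_bond=10.0, k_clash=10.0):
    """Harmonic pull towards the target plus assumed bond and clash penalties."""
    n = len(coords)
    harm = sum(math.dist(c, t) ** 2 for c, t in zip(coords, target))
    bond = sum((math.dist(coords[i], coords[i + 1]) - CA_BOND) ** 2
               for i in range(n - 1))
    clash = sum(max(0.0, CLASH_CUTOFF - math.dist(coords[i], coords[j])) ** 2
                for i in range(n) for j in range(i + 2, n))
    return k_harm * harm + k_bond * bond + k_clash * clash


def refine(start, target, n_steps=5000, step=0.3, temperature=0.5, seed=0):
    """Metropolis Monte Carlo; returns the lowest-energy structure visited."""
    rng = random.Random(seed)
    coords = list(start)
    energy = pseudo_energy(coords, target)
    best, best_energy = list(coords), energy
    for _ in range(n_steps):
        i = rng.randrange(len(coords))
        old = coords[i]
        # Trial move: Gaussian displacement of one randomly chosen atom.
        coords[i] = tuple(x + rng.gauss(0.0, step) for x in old)
        trial_energy = pseudo_energy(coords, target)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = trial_energy
            if energy < best_energy:
                best, best_energy = list(coords), energy
        else:
            coords[i] = old  # reject the move and restore the atom
    return best
```

In this sketch, `refine(start, average_structure(models))` would start from an extended or close-by chain and pull it towards the average, and `clash_fraction` could then be compared before and after refinement. Recomputing the full energy at every step keeps the code short; a practical implementation would update only the terms affected by the moved atom.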
References
Rotkiewicz P, Skolnick J: Fast procedure for reconstruction of full-atom protein models from reduced representations. Journal of Computational Chemistry 2008, 29: 1460–1465. 10.1002/jcc.20906
Ding Y, Chan CY, Lawrence CE: RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA 2005, 11: 1157–1166. 10.1261/rna.2500605
Furnham N, de Bakker PI, Gore S, Burke DF, Blundell TL: Comparative modelling by restraint-based conformational sampling. BMC structural biology 2008, 8: 7. 10.1186/1472-6807-8-7
Zagrovic B, Snow CD, Khaliq S, Shirts MR, Pande VS: Native-like mean structure in the unfolded ensemble of small proteins. Journal of molecular biology 2002, 323: 153–164. 10.1016/S0022-2836(02)00888-4
Huang ES, Samudrala R, Ponder JW: Distance geometry generates native-like folds for small helical proteins using the consensus distances of predicted protein structures. Protein Sci 1998, 7: 1998–2003. 10.1002/pro.5560070916
Zagrovic B, Pande VS: How does averaging affect protein structure comparison on the ensemble level? Biophys J 2004, 87: 2240–2246. 10.1529/biophysj.104.042184
Murshudov GN, Vagin AA, Dodson EJ: Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr 1997, 53(pt3):240–255. 10.1107/S0907444996012255
Betancourt MR, Skolnick J: Finding the needle in a haystack: Educing native folds from ambiguous ab initio protein structure. Journal of Computational Chemistry 2001, 22: 339–353. 10.1002/1096-987X(200102)22:3<339::AID-JCC1006>3.0.CO;2-R
Zhang Y, Skolnick J: SPICKER: a clustering approach to identify near-native protein folds. Journal of Computational Chemistry 2004, 25: 865–871. 10.1002/jcc.20011
Zhou H, Pandit SB, Lee SY, Borreguero J, Chen H, Wroblewska L, Skolnick J: Analysis of TASSER-based CASP7 protein structure prediction results. Proteins 2007, 69(Suppl 8):90–97. 10.1002/prot.21649
Zhang Y, Arakaki AK, Skolnick J: TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins 2005, 61(Suppl 7):91–98. 10.1002/prot.20724
Kolinski A, Bujnicki JM: Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models. Proteins 2005, 61(Suppl 7):84–90. 10.1002/prot.20723
Zhang Y, Devries ME, Skolnick J: Structure modeling of all identified G protein-coupled receptors in the human genome. PLoS Comput Biol 2006, 2: e13. 10.1371/journal.pcbi.0020013
Gront D, Kmiecik S, Kolinski A: Backbone building from quadrilaterals: a fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates. Journal of Computational Chemistry 2007, 28: 1593–1597. 10.1002/jcc.20624
Oldfield TJ, Hubbard RE: Analysis of C alpha geometry in protein structures. Proteins 1994, 18: 324–337. 10.1002/prot.340180404
James F: MINUIT Function Minimization and Error Analysis. CERN Program Library Long Writeup D506, 1998.
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equation-of-state calculations by fast computing machines. Journal of Chemical Physics 1953, 21: 1087–1092. 10.1063/1.1699114
Reva BA, Finkelstein AV, Skolnick J: What is the probability of a chance prediction of a protein structure with an rmsd of 6 Å? Fold Des 1998, 3: 141–147. 10.1016/S1359-0278(98)00019-4
Zhang Y, Skolnick J: Scoring function for automated assessment of protein structure template quality. Proteins 2004, 57: 702–710. 10.1002/prot.20264
Milik M, Kolinski A, Skolnick J: Algorithm for rapid reconstruction of protein backbone from alpha carbon coordinates. Journal of Computational Chemistry 1997, 18: 80–85. 10.1002/(SICI)1096-987X(19970115)18:1<80::AID-JCC8>3.0.CO;2-W
Dukka Bahadur KC, Tomita E, Suzuki J, Akutsu T: Protein side-chain packing problem: a maximum edge-weight clique algorithmic approach. J Bioinform Comput Biol 2005, 3: 103–126. 10.1142/S0219720005000904
Canutescu AA, Shelenkov AA, Dunbrack RL Jr: A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci 2003, 12: 2001–2014. 10.1110/ps.03154503