Multiple structure alignment and consensus identification for proteins

BMC Bioinformatics - Volume 11 - Pages 1-8 - 2010
Ivaylo Ilinkin (1), Jieping Ye (2), Ravi Janardan (3)
(1) Department of Computer Science, Gettysburg College, Gettysburg, USA
(2) Department of Computer Science and Engineering, Arizona State University, Tempe, USA
(3) Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA

Abstract

An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein that captures the common substructures present in the given proteins. The algorithm represents each protein as a sequence of coordinate triples, one for each alpha-carbon atom along the backbone. It then iteratively computes a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment, that is, the alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins.

Experimental results show that the algorithm converges quite rapidly and generates consensus structures that are visually similar to the input proteins. A comparison with other coordinate-based alignment algorithms (MAMMOTH and MATT) shows that the proposed algorithm is competitive in terms of speed and the sizes of the conserved regions discovered in an extensive benchmark dataset derived from the HOMSTRAD and SABmark databases. The algorithm has been implemented in C++ and can be downloaded from the project's web page. Alternatively, it can be used via a web server, which makes it possible to align protein structures by uploading files from local disk or by downloading protein data from the RCSB Protein Data Bank.

In summary, the algorithm computes a multiple structure alignment for a set of proteins together with their consensus structure, and the experimental results show its effectiveness in terms of both alignment quality and computational cost.
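The iterative superposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' C++ implementation, and it rests on several assumptions that the abstract does not confirm: all chains have the same length and residue i of each chain corresponds to residue i of the consensus, the objective is the sum of squared distances rather than plain distances, and each rotation is found with Horn's quaternion method (a standard technique swapped in here, solved with a small Jacobi eigen-solver). The function names `top_eigenvector`, `best_rotation`, `superpose`, and `align_to_consensus` are hypothetical.

```python
import math

def centroid(pts):
    n = len(pts)
    return [sum(p[k] for p in pts) / n for k in range(3)]

def centered(pts):
    c = centroid(pts)
    return [[p[k] - c[k] for k in range(3)] for p in pts]

def top_eigenvector(a, sweeps=30):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix (cyclic Jacobi)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = (0.5 * math.atan(2 * a[p][q] / diff) if diff != 0
                         else math.copysign(math.pi / 4, a[p][q]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    best = max(range(n), key=lambda i: a[i][i])
    return [v[k][best] for k in range(n)]

def best_rotation(P, Q):
    """Rotation R minimizing sum |R p_i - q_i|^2 for centered P, Q (Horn's quaternion method)."""
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    w, x, y, z = top_eigenvector(N)  # unit quaternion of the optimal rotation
    return [[w*w + x*x - y*y - z*z, 2*(x*y - w*z), 2*(x*z + w*y)],
            [2*(x*y + w*z), w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
            [2*(x*z - w*y), 2*(y*z + w*x), w*w - x*x - y*y + z*z]]

def superpose(P, Q):
    """Return a copy of P moved by a rotation and translation onto Q."""
    cq = centroid(Q)
    Pc, Qc = centered(P), centered(Q)
    R = best_rotation(Pc, Qc)
    return [[sum(R[r][k] * p[k] for k in range(3)) + cq[r] for r in range(3)] for p in Pc]

def align_to_consensus(proteins, max_iter=20, tol=1e-9):
    """Alternate between superposing every chain on the consensus and re-averaging it."""
    consensus = [p[:] for p in proteins[0]]  # assumption: start from the first chain
    prev = float("inf")
    for _ in range(max_iter):
        moved = [superpose(P, consensus) for P in proteins]
        consensus = [[sum(M[i][k] for M in moved) / len(moved) for k in range(3)]
                     for i in range(len(consensus))]
        cost = sum((a[k] - b[k]) ** 2 for M in moved
                   for a, b in zip(M, consensus) for k in range(3))
        if prev - cost < tol:  # stop once the objective no longer improves
            break
        prev = cost
    return moved, consensus, cost
```

Calling `align_to_consensus(chains)` with a list of equal-length chains, each a list of `[x, y, z]` alpha-carbon coordinates, returns the transformed chains, the consensus, and the final sum of squared distances. Each pass can only lower that sum, since the superposition is optimal for a fixed consensus and the per-position mean is optimal for fixed transformations; handling chains of different lengths would require an additional correspondence step that this sketch omits.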
