Multiple structure alignment and consensus identification for proteins

BMC Bioinformatics - Volume 11 - Pages 1-8 - 2010

Ivaylo Ilinkin¹, Jieping Ye², Ravi Janardan³

¹Department of Computer Science, Gettysburg College, Gettysburg, USA

²Department of Computer Science and Engineering, Arizona State University, Tempe, USA

³Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA

Abstract

An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein that captures the substructures common to the given proteins. The algorithm represents each protein as a sequence of coordinate triples, one for each alpha-carbon atom along the backbone. It then iteratively computes a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment, that is, the alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins.

Experimental results show that the algorithm converges quite rapidly and generates consensus structures that are visually similar to the input proteins. A comparison with other coordinate-based alignment algorithms (MAMMOTH and MATT) on an extensive benchmark dataset derived from the HOMSTRAD and SABmark databases shows that the proposed algorithm is competitive in terms of speed and the sizes of the conserved regions discovered. The algorithm has been implemented in C++ and can be downloaded from the project's web page. Alternatively, it can be used via a web server, which makes it possible to align protein structures by uploading files from local disk or by downloading protein data from the RCSB Protein Data Bank.

In summary, the algorithm computes a multiple structure alignment for a set of proteins, together with their consensus structure. Experimental results show its effectiveness in terms of the quality of the alignment and computational cost.
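The alternation described in the abstract (superimpose every protein onto the current consensus, then recompute the consensus) can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation and makes several simplifying assumptions that the abstract does not state: all proteins have the same number of alpha-carbon atoms with a known one-to-one correspondence (no gaps), each rigid superposition is computed with the standard SVD-based Kabsch method, the consensus is the coordinate-wise mean, and the cost is the sum of squared distances. The function names `kabsch` and `align_to_consensus` are hypothetical.

```python
import numpy as np


def kabsch(P, Q):
    """Rigid transform (R, t) minimizing sum_i ||R p_i + t - q_i||^2.

    P and Q are (n, 3) arrays of corresponding alpha-carbon coordinates.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def align_to_consensus(proteins, max_iter=50, tol=1e-9):
    """Alternate between superposition and consensus averaging.

    Assumption (not from the paper): equal-length proteins whose i-th
    atoms correspond, and a squared-distance cost.
    """
    consensus = proteins[0].copy()               # arbitrary starting consensus
    prev_cost = np.inf
    for _ in range(max_iter):
        # Step 1: rigidly move each protein onto the current consensus.
        moved = []
        for P in proteins:
            R, t = kabsch(P, consensus)
            moved.append(P @ R.T + t)
        # Step 2: the mean structure minimizes the squared-distance cost.
        consensus = np.mean(moved, axis=0)
        cost = sum(float(np.sum((M - consensus) ** 2)) for M in moved)
        if prev_cost - cost < tol:               # stop once the cost stalls
            break
        prev_cost = cost
    return consensus, moved, cost
```

Calling `align_to_consensus` on a list of `(n, 3)` NumPy arrays returns the consensus coordinates, the transformed proteins, and the final cost. Each of the two steps can only lower the cost or leave it unchanged, which is why this kind of alternation settles quickly; like the heuristic in the abstract, it yields an approximation rather than a guaranteed optimum.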

