Secondary structure spatial conformation footprint: a novel method for fast protein structure comparison and classification

Springer Science and Business Media LLC - Volume 6 - Pages 1-12 - 2006
Elena Zotenko (1, 2), Dianne P O'Leary (1, 3), Teresa M Przytycka (2)
(1) Department of Computer Science, University of Maryland, College Park, USA
(2) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
(3) Institute for Advanced Computer Studies, University of Maryland, College Park, USA

Abstract

Recently a new class of methods for fast protein structure comparison has emerged. We call the methods in this class projection methods because they rely on mapping a protein structure into a high-dimensional vector space. Once the mapping is done, structure comparison reduces to computing the distance between the corresponding vectors. Because structural similarity is approximated by the distance between projections, the success of any projection method depends on how well its mapping function captures the salient features of protein structure. There is no consensus on what constitutes a good projection technique, and the three currently known projection methods take very different approaches to constructing the mapping, both in which structural elements are included and in how this information is integrated into a vector representation. In this paper we propose a novel projection method that uses secondary structure information to produce the mapping. First, a diverse set of spatial arrangements of triplets of secondary structure elements (a set of structural models) is selected automatically. Then each protein structure is mapped into a high-dimensional vector of "counts", or footprint, where each count corresponds to the number of times a given structural model is observed in the structure, weighted by the precision with which the model is reproduced. We perform the first comprehensive evaluation of our method together with all other currently known projection methods. The results of our evaluation suggest that the type of structural information used by a projection method affects its ability to detect structural similarity. In particular, our method, which uses the spatial conformations of triplets of secondary structure elements, outperforms the other methods in most of the tests.
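The abstract does not specify how a triplet conformation is encoded, how a match is scored, or which distance compares footprints, so the sketch below only illustrates the general footprint idea under loudly hypothetical choices. Each secondary structure element (SSE) is reduced to a midpoint and a unit axis vector; a triplet's conformation is described by its three pairwise axis-angle cosines plus its three pairwise midpoint distances divided by an assumed 10 Å scale; a triplet counts toward a structural model when its descriptor lies within a tolerance `tol` of that model, with a linear weight standing in for "the precision with which the model is reproduced"; and footprints are compared by Euclidean distance after length normalization. The model set is passed in by hand rather than selected automatically. None of the names or parameters here (`triplet_descriptor`, `footprint`, `footprint_distance`, `DIST_SCALE`, `tol`) come from the paper.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
# Hypothetical SSE representation: (midpoint, unit vector along the SSE axis).
SSE = Tuple[Vec, Vec]

DIST_SCALE = 10.0  # hypothetical: Angstroms per unit, so distances and cosines are comparable


def triplet_descriptor(triplet: Sequence[SSE]) -> List[float]:
    """Hypothetical conformation descriptor for three SSEs (in sequence order):
    three pairwise axis-angle cosines followed by three scaled midpoint distances."""
    pairs = list(combinations(triplet, 2))
    cosines = [sum(a * b for a, b in zip(d1, d2)) for (_, d1), (_, d2) in pairs]
    dists = [math.dist(p1, p2) / DIST_SCALE for (p1, _), (p2, _) in pairs]
    return cosines + dists


def footprint(sses: Sequence[SSE], models: Sequence[Sequence[float]],
              tol: float = 1.0) -> List[float]:
    """For each structural model, sum a precision weight over every SSE triplet
    that reproduces it (weight 1 for an exact match, falling to 0 at tol)."""
    counts = [0.0] * len(models)
    # combinations() keeps input order; assumes sses are listed in sequence order.
    for triplet in combinations(sses, 3):
        desc = triplet_descriptor(triplet)
        for i, model in enumerate(models):
            dev = math.dist(desc, model)
            if dev < tol:
                counts[i] += 1.0 - dev / tol  # hypothetical linear precision weight
    return counts


def footprint_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Euclidean distance between length-normalized footprints (a hypothetical choice)."""
    def unit(v: Sequence[float]) -> List[float]:
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n > 0 else list(v)
    return math.dist(unit(f1), unit(f2))


if __name__ == "__main__":
    # Toy structure: three mutually perpendicular SSEs.
    toy = [((0, 0, 0), (1, 0, 0)), ((5, 0, 0), (0, 1, 0)), ((0, 5, 0), (0, 0, 1))]
    models = [triplet_descriptor(toy), [1, 1, 1, 0, 0, 0]]
    fp = footprint(toy, models)
    print(fp)                          # [1.0, 0.0]
    print(footprint_distance(fp, fp))  # 0.0
```

In this sketch, every triplet is scored against every model, so building a footprint costs O(n^3 · m) for n SSEs and m models. Once footprints are computed, however, comparing two structures is a single vector-distance evaluation, which is what makes projection methods fast for database-scale comparison.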
