MSAT

Springer Science and Business Media LLC - Volume 3 - Pages 149-158 - 2012
Te Ren¹, Mallika Veeramalai¹, Aik Choon Tan¹, David Gilbert¹
¹Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow, UK

Abstract

This article describes a new method for multiple sequence alignment based on fold-level protein structure alignments, which improves accuracy compared with the most commonly used sequence-only techniques. The method integrates the widely used progressive multiple sequence alignment program ClustalW with the Topology of Protein Structure (TOPS) topology-based alignment algorithm. The TOPS approach produces a structural alignment for the input protein set using a topology-based pattern discovery program, yielding a set of matched sequence regions that can be used to guide a sequence alignment with ClustalW. The resulting alignments are more reliable than sequence-only alignments, as determined by 20-fold cross-validation on a set of 106 proteins from the CATH database, distributed across seven superfold families. The method is particularly effective for sets of proteins that have similar structures at the fold level but low sequence identity. The aim of this research is to help bridge the gap between protein sequence analysis and structure analysis, in the hope that this will assist in understanding the relationship between sequence, structure and function. The tool is available at http://balabio.dcs.gla.ac.uk/msat/.
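
The idea of using structurally matched regions to guide a sequence alignment can be illustrated with a small pairwise sketch. This is not the MSAT implementation and it does not call ClustalW or TOPS; it assumes, purely for illustration, that the structural step has already produced a list of matched regions given as hypothetical (start_a, start_b, length) tuples. The matched regions are placed in the same columns, and only the stretches between them are aligned by a basic global alignment (Needleman-Wunsch with arbitrary toy scores).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Basic global alignment of two strings; returns the two gapped strings.
    The scoring values are toy defaults, not those of any published tool."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a, b, anchors):
    """Align a and b so that each anchor region occupies the same columns.
    anchors: hypothetical (start_a, start_b, length) tuples, sorted and
    non-overlapping, standing in for structurally matched regions."""
    out_a, out_b = [], []
    pos_a = pos_b = 0
    for start_a, start_b, length in anchors:
        # Align the unanchored stretch before this anchor.
        seg_a, seg_b = needleman_wunsch(a[pos_a:start_a], b[pos_b:start_b])
        # Copy the anchored region residue-for-residue, with no gaps.
        out_a += [seg_a, a[start_a:start_a + length]]
        out_b += [seg_b, b[start_b:start_b + length]]
        pos_a, pos_b = start_a + length, start_b + length
    tail_a, tail_b = needleman_wunsch(a[pos_a:], b[pos_b:])
    out_a.append(tail_a); out_b.append(tail_b)
    return "".join(out_a), "".join(out_b)


# Made-up sequences and anchors, for demonstration only.
seq_a = "MKTAYIAKQR"
seq_b = "MKAYIGGKQR"
example_anchors = [(3, 2, 3), (7, 7, 3)]
aligned_a, aligned_b = anchored_align(seq_a, seq_b, example_anchors)
print(aligned_a)
print(aligned_b)
```

Fixing the anchored columns first means that a poor sequence-level signal between two proteins cannot pull structurally equivalent regions apart, which is one plausible reason such guidance would help most when sequence identity is low.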
