Using Genome-Wide Protein Sequence Data to Predict Amino Acid Conservation

The Protein Journal - Volume 27 - Pages 401-407 - 2008
Peter Palenchar¹, Mathew Mount¹, Douglas Cusato¹, Jeffery Dougherty¹
¹Department of Chemistry, Rutgers University, Camden, USA

Abstract

For most proteins, multiple sequence alignments are a viable method for identifying functionally and structurally important amino acids. However, most organisms have a subset of proteins that are unique or found in only a few closely related organisms, and for these proteins it is not possible to produce sequence alignments that are useful for identifying such amino acids. We have investigated the relationship between amino acid conservation and five factors in Escherichia coli proteins: the amino acid's identity, its N-terminal neighbor, its C-terminal neighbor, the local hydropathy of the surrounding amino acids, and the local expected net charge of the surrounding amino acids based on the primary sequence. For four of these factors (all except the amino acid's identity), there is a significant relationship with conservation for some of the 20 standard amino acids. Using all five factors in combination, we show that a score calculated from the primary sequences of a subset of E. coli proteins has statistically significant value for predicting conserved amino acids in other E. coli proteins and in Saccharomyces cerevisiae proteins. Because these five variables show significant relationships with conservation, we have termed them conservation factors.
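The abstract names the five conservation factors but does not state how the local values are computed or how the factors are combined into a score. The following is a minimal, hypothetical sketch of one way to do both; it is not the authors' method. Every detail is an assumption made for illustration: local hydropathy is the mean Kyte-Doolittle value of residues within ±3 positions (excluding the residue itself), local expected net charge is the Henderson-Hasselbalch side-chain charge at pH 7 summed over the same window using approximate textbook pKa values, both are binned to 0.5 units, and the factors are combined by summing smoothed per-factor log-odds of conservation (a naive-Bayes-style combination swapped in here). The names `conservation_factors`, `train`, and `score` are illustrative only.

```python
import math
from collections import defaultdict

# Kyte-Doolittle hydropathy scale (an assumed choice of scale for this sketch).
# Only the 20 standard residues are handled; others raise KeyError.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate side-chain pKa values (assumed; published tables differ slightly).
ACIDIC_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
BASIC_PKA = {"H": 6.0, "K": 10.5, "R": 12.5}


def residue_charge(aa, ph=7.0):
    """Expected side-chain charge at `ph` from the Henderson-Hasselbalch equation."""
    if aa in ACIDIC_PKA:
        return -1.0 / (1.0 + 10 ** (ACIDIC_PKA[aa] - ph))
    if aa in BASIC_PKA:
        return 1.0 / (1.0 + 10 ** (ph - BASIC_PKA[aa]))
    return 0.0


def conservation_factors(seq, i, window=3):
    """The five factors for position i; local values are binned to 0.5 units."""
    lo, hi = max(0, i - window), min(len(seq), i + window + 1)
    near = [seq[j] for j in range(lo, hi) if j != i]  # surrounding residues only
    hydropathy = sum(KD[a] for a in near) / max(len(near), 1)
    charge = sum(residue_charge(a) for a in near)
    return (
        ("identity", seq[i]),
        ("n_neighbor", seq[i - 1] if i > 0 else "-"),  # "-" marks a chain terminus
        ("c_neighbor", seq[i + 1] if i + 1 < len(seq) else "-"),
        ("hydropathy", round(hydropathy * 2) / 2),
        ("charge", round(charge * 2) / 2),
    )


def train(examples, pseudocount=1.0):
    """Smoothed log-odds of conservation for every observed factor value.

    `examples` holds (sequence, flags) pairs, where flags[i] is 1 if residue i
    is conserved (e.g. judged from an alignment) and 0 otherwise. At least one
    conserved residue is required.
    """
    hits, totals = defaultdict(float), defaultdict(float)
    n_conserved = n_total = 0
    for seq, flags in examples:
        for i, conserved in enumerate(flags):
            for key in conservation_factors(seq, i):
                hits[key] += conserved
                totals[key] += 1
            n_conserved += conserved
            n_total += 1
    base = n_conserved / n_total
    # Shrink each per-value rate toward the overall rate, then take log(rate / base).
    return {
        key: math.log((hits[key] + pseudocount * base) / (totals[key] + pseudocount) / base)
        for key in totals
    }


def score(table, seq, i):
    """Sum of per-factor log-odds; factor values unseen in training contribute 0."""
    return sum(table.get(key, 0.0) for key in conservation_factors(seq, i))


if __name__ == "__main__":
    # Toy training data, made up for illustration (not from the paper).
    table = train([("MKTAYIAKQR", [0, 1, 0, 0, 1, 0, 0, 1, 0, 0])])
    query = "MKTWYIAK"
    print([round(score(table, query, i), 2) for i in range(len(query))])
```

Summing log-odds treats the five factors as independent, which is a simplification of this sketch: correlated factors (for example, a neighbor's identity and the local hydropathy it contributes to) are effectively counted more than once.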
