Information Content of Protein Sequences

Journal of Theoretical Biology - Volume 206 - Pages 379-386 - 2000
OLAF WEISS (1), MIGUEL A. JIMÉNEZ-MONTAÑO (2), HANSPETER HERZEL (1)
(1) Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstr. 43, D-10115, Berlin, Germany
(2) Universidad de las Américas/Puebla, Sta. Catarina Mártir, 72820, Puebla, México
