How representative are the known structures of the proteins in a complete genome? A comprehensive structural census

Folding and Design - Volume 3 - Pages 497-512 - 1998
Mark Gerstein (1)
(1) Department of Molecular Biophysics & Biochemistry, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, USA
