How representative are the known structures of the proteins in a complete genome? A comprehensive structural census
References
Garnier, 1996, GOR method for predicting protein secondary structure from amino acid sequence, Methods Enzymol, 266, 540, 10.1016/S0076-6879(96)66034-0
Gibrat, 1987, Further developments of protein secondary structure prediction using information theory, J. Mol. Biol, 198, 425, 10.1016/0022-2836(87)90292-0
Rost, 1996, PHD: predicting one-dimensional protein secondary structure by profile-based neural networks, Methods Enzymol, 266, 525, 10.1016/S0076-6879(96)66033-9
Rost, 1992, Jury returns on structure prediction, Nature, 360, 540, 10.1038/360540b0
Rost, 1993, Prediction of protein secondary structure at better than 70% accuracy, J. Mol. Biol, 232, 584, 10.1006/jmbi.1993.1413
Benner, 1992, Correct structure prediction?, Nature, 359, 781, 10.1038/359781a0
Benner, 1994, Predicting protein crystal structures, Science, 265, 1642, 10.1126/science.8085149
Benner, 1993, Predicting the conformation of proteins. Man versus machine, FEBS Lett, 325, 29, 10.1016/0014-5793(93)81408-R
Scharf, 1994, GeneQuiz: a workbench for sequence analysis, 348
Casari, 1995, Challenging times for bioinformatics, Nature, 376, 647, 10.1038/376647a0
Ouzounis, 1995, New protein functions in yeast chromosome VIII, Protein Sci, 4, 2424, 10.1002/pro.5560041121
Arkin, 1997, Are there dominant membrane protein families with a given number of helices?, Proteins, 28, 465, 10.1002/(SICI)1097-0134(199708)28:4<465::AID-PROT1>3.0.CO;2-9
Goffeau, 1993, How many yeast genes code for membrane-spanning proteins?, Yeast, 9, 691, 10.1002/yea.320090703
Rost, 1995, Prediction of helical transmembrane segments at 95% accuracy, Protein Sci, 4, 521, 10.1002/pro.5560040318
Rost, 1996, Topology prediction for helical transmembrane proteins at 86% accuracy, Protein Sci, 5, 1704, 10.1002/pro.5560050824
Boyd, 1998, How many membrane proteins are there?, Protein Sci, 7, 201, 10.1002/pro.5560070121
Blaisdell, 1996, Similarities and dissimilarities of phage genomes, Proc. Natl Acad. Sci. USA, 93, 5854, 10.1073/pnas.93.12.5854
Karlin, 1995, Dinucleotide relative abundance extremes: a genomic signature, Trends Genet, 11, 283, 10.1016/S0168-9525(00)89076-9
Karlin, 1992, Statistical analyses of counts and distributions of restriction sites in DNA sequences, Nucleic Acids Res, 20, 1363, 10.1093/nar/20.6.1363
Karlin, 1996, Frequent oligonucleotides and peptides of the Haemophilus influenzae genome, Nucleic Acids Res, 24, 4263, 10.1093/nar/24.21.4263
Gerstein, 1997, A structural census of the current population of protein sequences, Proc. Natl Acad. Sci. USA, 94, 11911, 10.1073/pnas.94.22.11911
Gerstein, 1997, A structural census of genomes: comparing eukaryotic, bacterial and archaeal genomes in terms of protein structure, J. Mol. Biol, 274, 562, 10.1006/jmbi.1997.1412
Gerstein, 1998, Comparing microbial genomes in terms of protein structure: surveys of a finite parts list, FEMS Microbiol. Rev, 10.1111/j.1574-6976.1998.tb00371.x
Gerstein, 1998, Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census, Proteins, 33, 518, 10.1002/(SICI)1097-0134(19981201)33:4<518::AID-PROT5>3.0.CO;2-J
Bryant, 1995, Statistics of sequence-structure threading, Curr. Opin. Struct. Biol, 5, 236, 10.1016/0959-440X(95)80082-4
Altschul, 1994, Issues in searching molecular sequence databases, Nat. Genet, 6, 119, 10.1038/ng0294-119
Levitt, 1998, A unified statistical framework for sequence comparison and structure comparison, Proc. Natl Acad. Sci. USA, 95, 5913, 10.1073/pnas.95.11.5913
Brenner, 1997, Population statistics of protein structures: lessons from structural classifications, Curr. Opin. Struct. Biol, 7, 369, 10.1016/S0959-440X(97)80054-1
Levitt, 1976, Structural patterns in globular proteins, Nature, 261, 552, 10.1038/261552a0
Panchenko, 1996, Foldons, protein structural modules, and exons, Proc. Natl Acad. Sci. USA, 93, 2008, 10.1073/pnas.93.5.2008
Netzer, 1997, Recombination of protein domains facilitated by co-translational folding in eukaryotes, Nature, 388, 343, 10.1038/41024
Das, 1997, Biology's new Rosetta stone, Nature, 385, 29, 10.1038/385029a0
Argos, 1988, An investigation of protein subunit and domain interfaces, Protein Eng, 2, 101, 10.1093/protein/2.2.101
Olsen, 1994, The winds of (evolutionary) change: breathing new life into microbiology, J. Bacteriol, 176, 1, 10.1128/jb.176.1.1-6.1994
Koonin, 1996, Sequencing and analysis of bacterial genomes, Curr. Biol, 6, 404, 10.1016/S0960-9822(02)00508-0
Lansing, 1996
Tomb, 1997, The complete genome sequence of the gastric pathogen Helicobacter pylori, Nature, 388, 539, 10.1038/41483
Doolittle, 1997, A bug with excess gastric acidity, Nature, 388, 515, 10.1038/41418
Wootton, 1993, Statistics of local complexity in amino acid sequences and sequence databases, Comput. Chem, 17, 149, 10.1016/0097-8485(93)85006-X
Wootton, 1996, Analysis of compositionally biased regions in sequence databases, Methods Enzymol, 266, 554, 10.1016/S0076-6879(96)66035-2
Gerstein, 1993, Domain closure in lactoferrin: two hinges produce a see-saw motion between alternative close-packed interfaces, J. Mol. Biol, 234, 357, 10.1006/jmbi.1993.1592
Gerstein, 1998, A database of macromolecular movements, Nucleic Acids Res, 26, 4280, 10.1093/nar/26.18.4280
Weiss, 1991, Molecular architecture and electrostatic properties of a bacterial porin, Science, 254, 1627, 10.1126/science.1721242
Efron, 1991, Statistical data analysis in the computer age, Science, 253, 390, 10.1126/science.253.5018.390
von Heijne, 1992, Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule, J. Mol. Biol, 225, 487, 10.1016/0022-2836(92)90934-C
von Heijne, 1994, Membrane proteins: from sequence to structure, Annu. Rev. Biophys. Biomol. Struct, 23, 167, 10.1146/annurev.bb.23.060194.001123
Wallin, 1998, Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms, Protein Sci, 7, 1029, 10.1002/pro.5560070420
Abola, 1997, Protein Data Bank archives of three-dimensional macromolecular structures, Methods Enzymol, 277, 556, 10.1016/S0076-6879(97)77031-9
Stampf, 1995, PDBbrowse – a graphics interface to the Brookhaven Protein Data Bank, Nature, 374, 572, 10.1038/374572a0
Murzin, 1995, SCOP: a structural classification of proteins for the investigation of sequences and structures, J. Mol. Biol, 247, 536, 10.1016/S0022-2836(05)80134-2
Brenner, 1995, Gene duplication in H. influenzae, Nature, 378, 140, 10.1038/378140a0
Hubbard, 1997, SCOP: a structural classification of proteins database, Nucleic Acids Res, 25, 236, 10.1093/nar/25.1.236
Altman, 1994, Finding an average core structure: application to the globins, 19
Gerstein, 1995, Average core structures and variability measures for protein families: application to the immunoglobulins, J. Mol. Biol, 251, 161, 10.1006/jmbi.1995.0423
Gerstein, 1996, Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures, 59
Gerstein, 1998, Comprehensive assessment of automatic structural alignment against a manual standard, the Scop classification of proteins, Protein Sci, 7, 445, 10.1002/pro.5560070226
Wall, 1996
Gaasterland, 1996, Fully automated genome analysis that reflects user needs and preferences. A detailed introduction to the MAGPIE system architecture, Biochimie, 78, 302, 10.1016/0300-9084(96)84761-4
Medigue, 1995, Analysis of a Bacillus subtilis genome fragment using a co-operative computer system prototype, Gene, 165, GC37, 10.1016/0378-1119(95)00636-K
Lipman, 1985, Rapid and sensitive protein similarity searches, Science, 227, 1435, 10.1126/science.2983426
Pearson, 1988, Improved tools for biological sequence analysis, Proc. Natl Acad. Sci. USA, 85, 2444, 10.1073/pnas.85.8.2444
Brenner, 1998, Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships, Proc. Natl Acad. Sci. USA, 95, 6073, 10.1073/pnas.95.11.6073
Pearson, 1998, Empirical statistical estimates for sequence similarity searches, J. Mol. Biol, 276, 71, 10.1006/jmbi.1997.1525
Pearson, 1996, Effective protein sequence comparison, Methods Enzymol, 266, 227, 10.1016/S0076-6879(96)66017-0
Pearson, 1997, Identifying distantly related protein sequences, Comput. Appl. Biosci, 13, 325
Gerstein, 1998, Measurement of the effectiveness of transitive sequence comparison, through a third ‘intermediate’ sequence, Bioinformatics, 14, 707, 10.1093/bioinformatics/14.8.707
Smith, 1981, Identification of common molecular subsequences, J. Mol. Biol, 147, 195, 10.1016/0022-2836(81)90087-5
Bowie, 1993, Inverted protein structure prediction, Curr. Opin. Struct. Biol, 3, 437, 10.1016/S0959-440X(05)80118-6
Tatusov, 1994, Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks, Proc. Natl Acad. Sci. USA, 91, 12091, 10.1073/pnas.91.25.12091
Eddy, 1996, Hidden Markov models, Curr. Opin. Struct. Biol, 6, 361, 10.1016/S0959-440X(96)80056-X
Dubchak, 1995, Prediction of protein folding class using global description of amino acid sequence, Proc. Natl Acad. Sci. USA, 92, 8700, 10.1073/pnas.92.19.8700
Altschul, 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res, 25, 3389, 10.1093/nar/25.17.3389
Teichmann, 1998, Structural assignments to the proteins of Mycoplasma genitalium show that they have been formed by extensive gene duplications and domain rearrangements, Proc. Natl Acad. Sci. USA, 95, 14658, 10.1073/pnas.95.25.14658
Hobohm, 1992, Selection of a representative set of structures from the Brookhaven Protein Data Bank, Protein Sci, 1, 409, 10.1002/pro.5560010313
Boberg, 1992, Selection of a representative set of structures from Brookhaven Protein Data Bank, Proteins, 14, 265, 10.1002/prot.340140212
Kaufman, 1990
Felsenstein, 1993
Engelman, 1986, Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins, Annu. Rev. Biophys. Biophys. Chem, 15, 321, 10.1146/annurev.bb.15.060186.001541
Wootton, 1994, Non-globular domains in protein sequences: automated segmentation using complexity measures, Comput. Chem, 18, 269, 10.1016/0097-8485(94)85023-2
Frishman, 1997, PEDANTic genome analysis, Trends Genet, 13, 415, 10.1016/S0168-9525(97)01224-9
Garnier, 1978, Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins, J. Mol. Biol, 120, 97, 10.1016/0022-2836(78)90297-8
King, 1996, Identification and application of the concepts important for accurate and reliable protein secondary structure prediction, Protein Sci, 5, 2298, 10.1002/pro.5560051116
Rost, 1996, Topology prediction for helical transmembrane proteins at 86% accuracy, Protein Sci, 5, 1704, 10.1002/pro.5560050824
Salamov, 1995, Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments, J. Mol. Biol, 247, 11, 10.1006/jmbi.1994.0116
Dubchak, 1993, Prediction of protein folding class from amino acid composition, Proteins, 16, 79, 10.1002/prot.340160109
Metfessel, 1993, Cross-validation of protein structural class prediction using statistical clustering and neural networks, Protein Sci, 2, 1171, 10.1002/pro.5560020712
Fleischmann, 1995, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science, 269, 496, 10.1126/science.7542800
Fraser, 1995, The minimal gene complement of Mycoplasma genitalium, Science, 270, 397, 10.1126/science.270.5235.397
Bult, 1996, Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii, Science, 273, 1058, 10.1126/science.273.5278.1058
Kaneko, 1996, Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions, DNA Res, 3, 109, 10.1093/dnares/3.3.109
Himmelreich, 1996, Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae, Nucleic Acids Res, 24, 4420, 10.1093/nar/24.22.4420
Goffeau, 1997, The yeast genome directory, Nature, 387, 5, 10.1038/387s005
Blattner, 1997, The complete genome sequence of Escherichia coli K-12, Science, 277, 1453, 10.1126/science.277.5331.1453
Chakrabartty, 1994, Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions, Protein Sci, 3, 843, 10.1002/pro.5560030514
Smith, 1994, A thermodynamic scale for the beta-sheet forming tendencies of the amino acids, Biochemistry, 33, 5510, 10.1021/bi00184a020
Press, 1992
Amari, 1982, Differential geometry of curved exponential families – curvatures and information loss, Ann. Stat, 10, 357, 10.1214/aos/1176345779
Efron, 1986, Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy, Stat. Sci, 1, 54
Simon, 1993
