What's in a Likelihood? Simple Models of Protein Evolution and the Contribution of Structurally Viable Reconstructions to the Likelihood

Systematic Biology - Volume 60, Issue 2 - Pages 161-174 - 2011
Clemens Lakner (1, 2), Mark T. Holder (3), Nick Goldman (4), Gavin J. P. Naylor (2)
(1) Department of Biological Science, Section of Ecology and Evolution
(2) Department of Scientific Computing, Florida State University, Tallahassee, FL 32306-4120, USA
(3) Department of Ecology and Evolution, University of Kansas, 6031 Haworth, 1200 Sunnyside Avenue, Lawrence, KS 66045
(4) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

References

Adachi, 1996, Model of amino acid substitution in proteins encoded by mitochondrial DNA, J. Mol. Evol., 42, 459

Bastolla, 2001, How to guarantee optimal stability for most representative structures in the protein data bank, Proteins, 44, 79, 10.1002/prot.1075

Berman, 2000, The protein data bank, Nucleic Acids Res., 28, 235, 10.1093/nar/28.1.235

Bishop, 1987, Tetrapod relationships: the molecular evidence, Molecules and morphology in evolution: conflict or compromise?, 123

Bowie, 1991, A method to identify protein sequences that fold into a known three-dimensional structure, Science, 253, 164, 10.1126/science.1853201

Cao, 1998, Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders, J. Mol. Evol., 47, 307, 10.1007/PL00006389

Chang, 2002, Recreating a functional ancestral archosaur visual pigment, Mol. Biol. Evol., 19, 1483, 10.1093/oxfordjournals.molbev.a004211

Chiu, 1998, Optimizing potentials for the inverse protein folding problem, Protein Eng., 11, 749, 10.1093/protein/11.9.749

Choi, 2008, Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences, Philos. Trans. R. Soc. Lond. B Biol. Sci., 363, 3931, 10.1098/rstb.2008.0167

Collins, 1994, Compositional bias, character state bias, and character state reconstruction using parsimony, Syst. Biol., 43, 482, 10.1093/sysbio/43.4.482

Das, 2008, Macromolecular modeling with Rosetta, Annu. Rev. Biochem., 77, 363, 10.1146/annurev.biochem.77.062906.171838

Dayhoff, 1978, A model for evolutionary change in proteins, Atlas of protein sequence and structure

Henikoff, 1992, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. U.S.A., 89, 10915, 10.1073/pnas.89.22.10915

Hillis, 1998, Taxonomic sampling, phylogenetic accuracy, and investigator bias, Syst. Biol., 47, 3, 10.1080/106351598260987

Hillis, 2003, Is sparse taxon sampling a problem for phylogenetic inference?, Syst. Biol., 52, 124, 10.1080/10635150390132911

Huelsenbeck, 2008, Bayesian analysis of amino acid substitution models, Philos. Trans. R. Soc. Lond. B Biol. Sci., 363, 3941, 10.1098/rstb.2008.0175

Jensen, 2000, Probabilistic models of DNA sequence evolution with context dependent rates of substitution, Adv. Appl. Probab., 32, 499, 10.1239/aap/1013540176

Jones, 1998, THREADER: protein sequence threading by double dynamic programming, Computational methods in molecular biology, 10.1016/S0167-7306(08)60470-6

Jones, 1999, GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences, J. Mol. Biol., 287, 797, 10.1006/jmbi.1999.2583

Jones, 1992, A new approach to protein fold recognition, Nature, 358, 86, 10.1038/358086a0

Jones, 1992, The rapid generation of mutation data matrices from protein sequences, Comput. Appl. Biosci., 8, 275

Jukes, 1969, Evolution of protein molecules, Mammalian protein metabolism, 21, 10.1016/B978-1-4832-3211-9.50009-7

Kim, 2004, Protein structure prediction and analysis using the Robetta server, Nucleic Acids Res., 32, W526, 10.1093/nar/gkh468

Koehl, 1999, De novo protein design. I. In search of stability and specificity, J. Mol. Biol., 293, 1161, 10.1006/jmbi.1999.3211

Koehl, 1999, De novo protein design. II. Plasticity in sequence space, J. Mol. Biol., 293, 1183, 10.1006/jmbi.1999.3212

Koshi, 1998, Models of natural mutations including site heterogeneity, Proteins, 32, 289, 10.1002/(SICI)1097-0134(19980815)32:3<289::AID-PROT4>3.0.CO;2-D

Koshi, 1997, Beyond mutation matrices: physical-chemistry based evolutionary models, Genome Inform. Ser. Workshop Genome Inform., 8, 80

Krishnan, 2004, Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference, Mol. Biol. Evol., 21, 1871, 10.1093/molbev/msh198

Le, 2008, An improved general amino acid replacement matrix, Mol. Biol. Evol., 25, 1307, 10.1093/molbev/msn067

Le, 2010, Accounting for solvent accessibility and secondary structure in protein phylogenetics is clearly beneficial, Syst. Biol., 59, 277, 10.1093/sysbio/syq002

Lio, 1999, Using protein structural information in evolutionary inference: transmembrane proteins, Mol. Biol. Evol., 16, 1696, 10.1093/oxfordjournals.molbev.a026083

Lio, 1998, PASSML: combining evolutionary inference and protein secondary structure prediction, Bioinformatics, 14, 726, 10.1093/bioinformatics/14.8.726

Mateiu, 2006, Inferring complex DNA substitution processes on phylogenies using uniformization and data augmentation, Syst. Biol., 55, 259, 10.1080/10635150500541599

Meller, 2001, Linear programming optimization and a double statistical filter for protein threading protocols, Proteins, 45, 241, 10.1002/prot.1145

Misura, 2005, Progress and challenges in high-resolution refinement of protein structure models, Proteins, 59, 15, 10.1002/prot.20376

Nielsen, 2001, Mutations as missing data: inferences on the ages and distributions of nonsynonymous and synonymous mutations, Genetics, 159, 401, 10.1093/genetics/159.1.401

Pedersen, 2001, A dependent-rates model and an MCMC-based methodology for the maximum-likelihood analysis of sequences with overlapping reading frames, Mol. Biol. Evol., 18, 763, 10.1093/oxfordjournals.molbev.a003859

Pollock, 1999, Coevolving protein residues: maximum likelihood identification and relationship to structure, J. Mol. Biol., 287, 187, 10.1006/jmbi.1998.2601

Pollock, 2002, Increased taxon sampling is advantageous for phylogenetic inference, Syst. Biol., 51, 664, 10.1080/10635150290102357

Pollock, 2007, Dealing with uncertainty in ancestral reconstruction: sampling from the posterior distribution, Ancestral sequence reconstruction, 10.1093/acprof:oso/9780199299188.003.0008

Pollock, 1997, Effectiveness of correlation analysis in identifying protein residues undergoing correlated evolution, Protein Eng., 10, 647, 10.1093/protein/10.6.647

Rivas, 2008, Probabilistic phylogenetic inference with insertions and deletions, PLoS Comput. Biol., 4, e1000172, 10.1371/journal.pcbi.1000172

Robinson, 2003, Protein evolution with dependence among codons due to tertiary structure, Mol. Biol. Evol., 20, 1692, 10.1093/molbev/msg184

Rodrigue, 2005, Site interdependence attributed to tertiary structure in amino acid sequence evolution, Gene, 347, 207, 10.1016/j.gene.2004.12.011

Rodrigue, 2006, Assessing site-interdependent phylogenetic models of sequence evolution, Mol. Biol. Evol., 23, 1762, 10.1093/molbev/msl041

Rodrigue, 2007, Exploring fast computational strategies for probabilistic phylogenetic analysis, Syst. Biol., 56, 711, 10.1080/10635150701611258

Rodrigue, 2008, Uniformization for sampling realizations of Markov processes: applications to Bayesian implementations of codon substitution models, Bioinformatics, 24, 56, 10.1093/bioinformatics/btm532

Rohl, 2004, Protein structure prediction using Rosetta, Methods Enzymol., 383, 66, 10.1016/S0076-6879(04)83004-0

Sanderson, 1994, TreeBASE: a prototype database of phylogenetic analyses and an interactive tool for browsing the phylogeny of life, Am. J. Bot., 81, 183

Simons, 1997, Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions, J. Mol. Biol., 268, 209, 10.1006/jmbi.1997.0959

Thorne, 2007, Protein evolution constraints and model-based techniques to study them, Curr. Opin. Struct. Biol., 17, 337, 10.1016/j.sbi.2007.05.006

Wang, 2005, Context dependence and coevolution among amino acid residues in proteins, Methods Enzymol., 395, 779, 10.1016/S0076-6879(05)95040-4

Whelan, 2001, A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach, Mol. Biol. Evol., 18, 691, 10.1093/oxfordjournals.molbev.a003851

Williams, 2006, Assessing the accuracy of ancestral protein reconstruction methods, PLoS Comput. Biol., 2, e69, 10.1371/journal.pcbi.0020069

Yang, 1993, Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites, Mol. Biol. Evol., 10, 1396

Yang, 1994, Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods, J. Mol. Evol., 39, 306, 10.1007/BF00160154

Yang, 2007, PAML 4: phylogenetic analysis by maximum likelihood, Mol. Biol. Evol., 24, 1586, 10.1093/molbev/msm088

Yang, 1995, Mixed model analysis of DNA sequence evolution, Biometrics, 51, 552, 10.2307/2532943

Yang, 1998, Models of amino acid substitution and applications to mitochondrial protein evolution, Mol. Biol. Evol., 15, 1600, 10.1093/oxfordjournals.molbev.a025888

Zwickl, 2002, Increased taxon sampling greatly reduces phylogenetic error, Syst. Biol., 51, 588, 10.1080/10635150290102339