Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests
References
Adachi, 1996, MOLPHY version 2.3: Programs for molecular phylogenetics based on maximum likelihood, Comput. Sci. Monogr., 28, 1
Agresti, 1990, Categorical data analysis, 2nd edition
Akaike, 1973, Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, 267
Akaike, 1974, A new look at the statistical model identification, IEEE Trans. Aut. Control, 19, 716, 10.1109/TAC.1974.1100705
Akaike, 1981, Likelihood of a model and information criteria, J. Econometrics, 16, 3, 10.1016/0304-4076(81)90071-3
Akaike, 1983, Information measures and model selection, Int. Stat. Inst., 22, 277
Anderson, 2000, Null hypothesis testing: Problems, prevalence, and an alternative, J. Wildl. Manage, 64, 912, 10.2307/3803199
Aris-Brosou, 2002, Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny, Syst. Biol., 51, 703, 10.1080/10635150290102375
Bartlett, 1957, A comment on D. V. Lindley's statistical paradox, Biometrika, 44, 533
Berger, 1987, Testing a point null hypothesis: The irreconcilability of P values and evidence, J. Am. Stat. Assoc., 82, 112
Bollback, 2002, Bayesian model adequacy and choice in phylogenetics, Mol. Biol. Evol., 19, 1171, 10.1093/oxfordjournals.molbev.a004175
Bozdogan, 1987, Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions, Psychometrika, 52, 345, 10.1007/BF02294361
Bruno, 1999, Topological bias and inconsistency of maximum likelihood using wrong models, Mol. Biol. Evol., 16, 564, 10.1093/oxfordjournals.molbev.a026137
Buckley, 2002, Model misspecification and probabilistic tests of topology: Evidence from empirical data sets, Syst. Biol., 51, 509, 10.1080/10635150290069922
Buckley, 2002, Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera, Syst. Biol., 51, 4, 10.1080/106351502753475844
Buckland, 1997, Model selection uncertainty: An integral part of inference, Biometrics, 53, 603, 10.2307/2533961
Buckley, 2002, The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support, Mol. Biol. Evol., 19, 394, 10.1093/oxfordjournals.molbev.a004094
Buckley, 2001, Exploring among-site rate variation models in a maximum likelihood framework using empirical data: The effects of model assumptions on estimates of topology, branch lengths, and bootstrap support, Syst. Biol., 50, 67, 10.1080/10635150116786
Burnham, 1998, Model selection and inference: A practical information-theoretic approach, 1st ed
Burnham, 2003, Model selection and multimodel inference: A practical information-theoretic approach, 2nd ed
Burnham, 1994, Evaluation of the Kullback-Leibler discrepancy for model selection in open population capture-recapture models, Biometrical J., 36, 299, 10.1002/bimj.4710360308
Cavanaugh, 1999, Generalizing the derivation of the Schwarz information criterion, Commun. Stat. Theory Methods, 28, 49, 10.1080/03610929908832282
Chamberlain, 1890, The method of multiple working hypotheses, Science, 15, 93
Chatfield, 1995, Model uncertainty, data mining and statistical inference, J. R. Stat. Soc. A, 158, 419, 10.2307/2983440
Churchill, 1992, Sample size for a phylogenetic inference, Mol. Biol. Evol., 9, 753
Deleeuw, 1992, Introduction to Akaike 1973 information theory and an extension of the maximum likelihood principle, Breakthroughs in statistics, 599, 10.1007/978-1-4612-0919-5_37
Edwards, 1972, Likelihood
Felsenstein, 1978, Cases in which parsimony or compatibility methods will be positively misleading, Syst. Zool., 27, 401, 10.2307/2412923
Felsenstein, 1981, Evolutionary trees from DNA sequences: A maximum likelihood approach, J. Mol. Evol., 17, 368, 10.1007/BF01734359
Felsenstein, 1981, A likelihood approach to character weighting and what it tells us about parsimony and compatibility, Biol. J. Linnean Soc., 16, 183, 10.1111/j.1095-8312.1981.tb01847.x
Findley, 1991, Counterexamples to parsimony and BIC, Ann. Inst. Stat. Math., 43, 505, 10.1007/BF00053369
Fisher, 1921, On the ‘probable error’ of a coefficient of correlation deduced from a small sample, Metron, 1 (part 4), 3
Forster, 2000, Key concepts in model selection: Performance and generalizability, J. Math. Psychol., 44, 205, 10.1006/jmps.1999.1284
Forster, 2001, The new science of simplicity, Simplicity, inference and modeling, 83
Forster, 2002, Predictive accuracy as an achievable goal of science, Phil. Sci., 69, S124, 10.1086/341840
Forster, 1994, How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions, Br. J. Phil. Sci., 45, 1, 10.1093/bjps/45.1.1
Foulds, 1979, A graph theoretic approach to the development of minimal phylogenetic trees, J. Mol. Evol., 13, 127, 10.1007/BF01732868
Foutz, 1977, The performance of the likelihood ratio test when the model is incorrect, Ann. Stat., 5, 1183, 10.1214/aos/1176344003
Frati, 1997, Gene evolution and phylogeny of the mitochondrial cytochrome oxidase gene in Collembola, J. Mol. Evol., 44, 145, 10.1007/PL00006131
Gelfand, 1996, Model determination using sampling-based methods, Markov chain Monte Carlo in practice, 145
Gilks, 1996, Markov chain Monte Carlo in practice
Golden, 1995, Making correct statistical inferences using a wrong probability model, J. Math. Psychol., 38, 3, 10.1006/jmps.1995.1002
Goldman, 1990, Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses, Syst. Zool., 39, 345, 10.2307/2992355
Goldman, 1993, Statistical tests of models of DNA substitution, J. Mol. Evol., 36, 182, 10.1007/BF00166252
Goldman, 1998, Phylogenetic information and experimental design in molecular systematics, Proc. R. Soc. Lond. B Biol. Sci., 265, 1779, 10.1098/rspb.1998.0502
Goldman, 2000, Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics, Mol. Biol. Evol., 17, 975, 10.1093/oxfordjournals.molbev.a026378
Green, 1995, Reversible jump MCMC computation and Bayesian model determination, Biometrika, 82, 711, 10.1093/biomet/82.4.711
Hasegawa, 1990, Mitochondrial DNA evolution in primates: Transition rate has been extremely low in the lemur, J. Mol. Evol., 31, 113, 10.1007/BF02109480
Hasegawa, 1990, Phylogeny and molecular evolution in primates, Jpn. J. Genet., 65, 243, 10.1266/jjg.65.243
Hasegawa, 1985, Dating the human-ape splitting by a molecular clock of mitochondrial DNA, J. Mol. Evol., 22, 160, 10.1007/BF02101694
Hastings, 1970, Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57, 97, 10.1093/biomet/57.1.97
Hochberg, 1988, A sharper Bonferroni procedure for multiple tests of significance, Biometrika, 75, 800, 10.1093/biomet/75.4.800
Hoeting, 1999, Bayesian model averaging: A tutorial, Stat. Sci., 14, 382
Holder, 2003, Phylogeny estimation: Traditional and Bayesian approaches, Nat. Rev. Genet., 4, 275, 10.1038/nrg1044
Hsiao, 1997, Approximate Bayes factors when a mode occurs on the boundary, J. Am. Stat. Assoc., 92, 656, 10.1080/01621459.1997.10474017
Huelsenbeck, 1997, Phylogeny estimation and hypothesis testing using maximum likelihood, Annu. Rev. Ecol. Syst., 28, 437, 10.1146/annurev.ecolsys.28.1.437
Huelsenbeck, 1993, Success of phylogenetic methods in the four-taxon case, Syst. Biol., 42, 247, 10.1093/sysbio/42.3.247
Huelsenbeck, 2002, Geographic origin of human mitochondrial DNA: Accommodating phylogenetic uncertainty and model comparison, Syst. Biol., 51, 155, 10.1080/106351502753475934
Huelsenbeck, 2004, Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo, Mol. Biol. Evol., 21, 1123, 10.1093/molbev/msh123
Huelsenbeck, 2002, Potential applications and pitfalls of Bayesian inference of phylogeny, Syst. Biol., 51, 673, 10.1080/10635150290102366
Huelsenbeck, 2000, A Bayesian framework for the analysis of cospeciation, Evol. Int. J. Org. Evol., 54, 352, 10.1111/j.0014-3820.2000.tb00039.x
Huelsenbeck, 2001, Bayesian inference of phylogeny and its impact on evolutionary biology, Science, 294, 2310, 10.1126/science.1065889
Hurvich, 1989, Regression and time series model selection in small samples, Biometrika, 76, 297, 10.1093/biomet/76.2.297
Jeffreys, 1939, Theory of probability
Jermiin, 1997, Majority-rule consensus of phylogenetic trees obtained by maximum-likelihood analysis, Mol. Biol. Evol., 14, 1296, 10.1093/oxfordjournals.molbev.a025739
Johnson, 2003, Model selection in ecology and evolution, Trends Ecol. Evol., 19, 101, 10.1016/j.tree.2003.10.013
Jukes, 1969, Evolution of protein molecules, Mammalian protein metabolism, 21, 10.1016/B978-1-4832-3211-9.50009-7
Kadane, 1998, Experiences in elicitation, J. R. Stat. Soc. D, 47 (part 1), 3, 10.1111/1467-9884.00113
Kass, 1995, A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion, J. Am. Stat. Assoc., 90, 928, 10.1080/01621459.1995.10476592
Kelsey, 1999, Different models, different trees: The geographic origin of PTLV-I, Mol. Phylogenet. Evol., 13, 336, 10.1006/mpev.1999.0663
Kendall, 1979, The advanced theory of statistics, 4th edition
Kent, 1982, Robust properties of likelihood ratio tests, Biometrika, 69, 19
Keuzenkamp, 1995, Simplicity, scientific inference and economic modeling, Econ. J., 105, 1, 10.2307/2235317
Kimura, 1980, A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences, J. Mol. Evol., 16, 111, 10.1007/BF01731581
Kimura, 1981, Estimation of evolutionary distances between homologous nucleotide sequences, Proc. Nat. Acad. Sci. USA, 78, 454, 10.1073/pnas.78.1.454
Kishino, 1989, Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea, J. Mol. Evol., 29, 170, 10.1007/BF02100115
Kuha, 2003, AIC and BIC: Comparisons of assumptions and performance, Sociol. Methods Res.
Larget, 1999, Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees, Mol. Biol. Evol., 16, 750, 10.1093/oxfordjournals.molbev.a026160
Linhart, 1988, A test whether two AIC's differ significantly, S. Afr. Stat. J., 22, 153
Linhart, 1986, Model selection
Madigan, 1995, Eliciting prior information to enhance the predictive performance of Bayesian graphical models, Commun. Stat. Theory Methods, 24, 2271, 10.1080/03610929508831616
Madigan, 1994, Model selection and accounting for model uncertainty in graphical models using Occam's Window, J. Am. Stat. Assoc., 89, 1335, 10.1080/01621459.1994.10476894
Mau, 1997, Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo, J. Comp. Graph. Stat.
Mau, 1999, Bayesian phylogenetic inference via Markov chain Monte Carlo methods, Biometrics, 55, 1, 10.1111/j.0006-341X.1999.00001.x
Metropolis, 1953, Equations of state calculations by fast computing machines, J. Chem. Phys., 21, 1087, 10.1063/1.1699114
Miller, 2002, Subset selection in regression, 2nd edition
Minin, 2003, Performance-based selection of likelihood models for phylogeny estimation, Syst. Biol., 52, 674, 10.1080/10635150390235494
Morozov, 2000, A new method for characterizing replacement rate variation in molecular sequences: Application of the Fourier and Wavelet models to Drosophila and mammalian proteins, Genetics, 154, 381, 10.1093/genetics/154.1.381
Myrvold, 2002, Model selection, simplicity, and scientific inference, Phil. Sci., 69, S135, 10.1086/341841
Nishii, 1984, Asymptotic properties of criteria for selection of variables in multiple regression, Ann. Stat., 12, 758, 10.1214/aos/1176346522
Nishii, 1988, Maximum likelihood principle and model selection when the true model is unspecified, J. Multivar. Anal., 27
Nylander, 2004, Bayesian Phylogenetics and the Evolution of Gall Wasps, Acta Universitatis Upsaliensis, 43
Nylander, 2004, Bayesian phylogenetic analysis of combined data, Syst. Biol., 53, 47, 10.1080/10635150490264699
Occam, c. 1320, Scriptum in Librum Primum Sententiarum, Opera Theologica, I
Ogishima, 2000, Efficiencies of information criteria for topology selection in reconstructing molecular phylogenetic tree, Proceedings of International Symposium on Artificial Life and Robotics, 745
Ota, 2000, Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters, Mol. Biol. Evol., 17, 798, 10.1093/oxfordjournals.molbev.a026358
Penny, 1994, The role of models in reconstructing evolutionary trees, Models in Phylogenetic Reconstruction, 211, 10.1093/oso/9780198548249.003.0012
Pol, Empirical problems of the hierarchical likelihood ratio test for model selection, Syst. Biol.
Posada, 2001, The effect of branch length variation on the selection of models of molecular evolution, J. Mol. Evol., 52, 434, 10.1007/s002390010173
Posada, 2003, Using Modeltest and PAUP* to select a model of nucleotide substitution, Current Protocols in Bioinformatics, 6.5.1, 10.1002/0471250953.bi0605s00
Posada, 1998, Modeltest: Testing the model of DNA substitution, Bioinformatics, 14, 817, 10.1093/bioinformatics/14.9.817
Posada, 2001, Selecting models of nucleotide substitution: An application to human immunodeficiency virus 1 (HIV-1), Mol. Biol. Evol., 18, 897, 10.1093/oxfordjournals.molbev.a003890
Posada, 2001, Selecting the best-fit model of nucleotide substitution, Syst. Biol., 50, 580, 10.1080/10635150118469
Posada, 2001, Simple (wrong) models for complex trees: Empirical Bias, Mol. Biol. Evol., 18, 271, 10.1093/oxfordjournals.molbev.a003802
Pupko, 2002, Combining multiple data sets in a likelihood analysis: Which models are the best?, Mol. Biol. Evol., 19, 2294, 10.1093/oxfordjournals.molbev.a004053
Raftery, 1996, Hypothesis testing and model selection, Markov chain Monte Carlo in practice, 163
Raftery, 1999, Bayes factors and BIC: Comment on “A critique of the Bayesian information criterion for model selection”, Sociol. Methods Res., 27, 411, 10.1177/0049124199027003005
Robinson, 1981, Comparison of phylogenetic trees, Math. Biosci., 53, 131, 10.1016/0025-5564(81)90043-2
Rzhetsky, 1995, Tests of applicability of several substitution models for DNA sequence data, Mol. Biol. Evol., 12, 131, 10.1093/oxfordjournals.molbev.a040182
Sakamoto, 1986, Akaike information criterion statistics
Sanderson, 2000, Parametric phylogenetics?, Syst. Biol., 49, 817
Shafer, 1982, Lindley's paradox (with discussion), J. Am. Stat. Assoc., 77, 325, 10.1080/01621459.1982.10477809
Shibata, 1986, Consistency of model selection and parameter estimation, J. Appl. Prob., 23A, 127, 10.2307/3214348
Shimodaira, 1997, Assessing the error probability of the model selection test, Ann. Inst. Stat. Math., 49, 395, 10.1023/A:1003140609666
Shimodaira, 1998, An application of multiple comparison techniques to model selection, Ann. Inst. Stat. Math., 50, 1, 10.1023/A:1003483128844
Shimodaira, 2001, Multiple comparisons of log-likelihoods and combining nonnested models with applications to phylogenetic tree selection, Commun. Stat. Theory Methods, 30, 1751, 10.1081/STA-100105696
Shimodaira, 1999, Multiple comparisons of log-likelihoods with applications to phylogenetic inference, Mol. Biol. Evol., 16, 1114, 10.1093/oxfordjournals.molbev.a026201
Sober, 2002, Bayesianism—its scope and limits, Bayes's Theorem, 21
Sober, 2002, Instrumentalism, parsimony, and the Akaike framework, Phil. Sci., 69, S112, 10.1086/341839
Sober, 2002, Testing the hypothesis of common ancestry, J. Theoret. Biol., 218, 395, 10.1016/S0022-5193(02)93086-9
Sota, 2001, Incongruence of mitochondrial and nuclear gene trees in the Carabid beetles Ohomopterus, Syst. Biol., 50, 39, 10.1093/sysbio/50.1.39
Steel, 2000, Parsimony, likelihood, and the role of models in molecular phylogenetics, Mol. Biol. Evol., 17, 839, 10.1093/oxfordjournals.molbev.a026364
Stone, 1977, An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion, J. R. Stat. Soc., 39, 44
Strimmer, 2001, Model selection using expected likelihood weights: A Bayes-frequentist compromise
Strimmer, 2001, Inferring confidence sets of possibly misspecified gene trees, Proc. R. Soc. Lond. B Biol. Sci., 269, 137, 10.1098/rspb.2001.1862
Suchard, 2003, Hierarchical phylogenetic models for analyzing multipartite sequence data, Syst. Biol., 52, 649, 10.1080/10635150390238879
Suchard, 2002, Oh brother, where art thou? A Bayes factor test for recombination with uncertain heritage, Syst. Biol., 51, 715, 10.1080/10635150290102384
Suchard, 2001, Bayesian selection of continuous-time Markov chain evolutionary models, Mol. Biol. Evol., 18, 1001, 10.1093/oxfordjournals.molbev.a003872
Suchard, 2003, Testing a molecular clock without an outgroup: Derivations of induced priors on branch-length restrictions in a Bayesian framework, Syst. Biol., 52, 48, 10.1080/10635150390132713
Sugiura, 1978, Further analysis of the data by Akaike's information criterion and the finite corrections, Commun. Stat. Theory Methods A, 7, 13, 10.1080/03610927808827599
Sullivan, 1997, Are guinea pigs rodents? The importance of adequate models in molecular phylogenies, J. Mamm. Evol., 4, 77, 10.1023/A:1027314112438
Sullivan, 2001, Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated?, Syst. Biol., 50, 723
Suzuki, 2002, Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics, Proc. Natl. Acad. Sci. USA, 99, 16138, 10.1073/pnas.212646199
Swofford, 1998, PAUP* Phylogenetic analysis using parsimony and other methods, version 4.0. beta
Swofford, 2000, PAUP* Phylogenetic analysis using parsimony (*and other methods), version 4
Tamura, 1994, Model selection in the estimation of the number of nucleotide substitutions, Mol. Biol. Evol., 11, 154
Tamura, 1993, Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees, Mol. Biol. Evol., 10, 512
Tanaka, 1999, Topology selection in unrooted molecular phylogenetic tree by minimum model-based complexity method, Pac. Symp. Biocomput., 4, 326
Tavaré, 1986, Some probabilistic and statistical problems in the analysis of DNA sequences, Some mathematical questions in biology—DNA sequence analysis, 57
Van Den Bussche, 1998, Base compositional bias and phylogenetic analyses: A test of the “flying DNA” hypothesis, Mol. Phylogenet. Evol., 10, 408, 10.1006/mpev.1998.0531
Verdinelli, 1995, Computing Bayes factors using a generalization of the Savage-Dickey density ratio, J. Am. Stat. Assoc., 90, 614, 10.1080/01621459.1995.10476554
Vuong, 1989, Likelihood ratio tests for model selection and non-nested hypotheses, Econometrica, 57, 307, 10.2307/1912557
Wasserman, 2000, Bayesian model selection and model averaging, J. Math. Psychol., 44, 92, 10.1006/jmps.1999.1278
Weakliem, 1999, A critique of the Bayesian information criterion for model selection, Sociol. Methods Res., 27, 359, 10.1177/0049124199027003002
Whelan, 1999, Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics, Mol. Biol. Evol., 16, 1292, 10.1093/oxfordjournals.molbev.a026219
Woodroofe, 1982, On the model selection and the arc sine laws, Ann. Stat., 10, 1182, 10.1214/aos/1176345983
Yang, 1996, Among-site rate variation and its impact on phylogenetic analysis, Trends Ecol. Evol., 11, 367, 10.1016/0169-5347(96)10041-0
Yang, 1996, Maximum-likelihood models for combined analyses of multiple sequence data, J. Mol. Evol., 42, 587, 10.1007/BF02352289
Yang, 1995, Maximum likelihood trees from DNA sequences: A peculiar statistical estimation problem, Syst. Biol., 44, 384, 10.1093/sysbio/44.3.384
Yang, 2000, Codon-substitution models for heterogeneous selection pressure at amino acid sites, Genetics, 155, 431, 10.1093/genetics/155.1.431
Yang, 1997, Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method, Mol. Biol. Evol., 14, 717, 10.1093/oxfordjournals.molbev.a025811
Zhang, 1999, Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models, Mol. Biol. Evol., 16, 868, 10.1093/oxfordjournals.molbev.a026171
Zharkikh, 1994, Estimation of evolutionary distances between nucleotide sequences, J. Mol. Evol., 39, 315, 10.1007/BF00160155