Bayesian random local clocks, or one rate to rule them all

BMC Biology - Volume 8 - Pages 1-12 - 2010
Alexei J Drummond [1,2], Marc A Suchard [3,4]
[1] Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand
[2] Computational Evolution Group, University of Auckland, Auckland, New Zealand
[3] Departments of Biomathematics and Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
[4] Department of Biostatistics, UCLA School of Public Health, Los Angeles, USA

Abstract

Relaxed molecular clock models allow divergence time dating and "relaxed phylogenetic" inference, in which a time tree is estimated in the face of unequal rates across lineages. We present a new method for relaxing the assumption of a strict molecular clock that uses Markov chain Monte Carlo to implement Bayesian model averaging over random local molecular clocks. The new method approaches the problem of rate variation among lineages by proposing a series of local molecular clocks, each extending over a subregion of the full phylogeny. Each branch in a phylogeny (subtending a clade) is a possible location for a change of rate from one local clock to a new one. Thus, for a rooted tree of n taxa, which has 2n - 2 branches, and counting both the global molecular clock and the unconstrained model, there are a total of $2^{2n-2}$ possible rate models available for averaging, with 1, 2, ..., 2n - 2 different rate categories. We propose an efficient method to sample this model space while simultaneously estimating the phylogeny. The new method conveniently allows a direct test of the strict molecular clock, in which one rate rules them all, against a large array of alternative local molecular clock models. We illustrate the method's utility on three example data sets involving mammal, primate and influenza evolution. Finally, we explore methods to visualize the complex posterior distribution that results from inference under such models. The examples suggest that large sequence data sets may require only a small number of local molecular clocks to reconcile their branch lengths with a time scale. All of the analyses described here are implemented in the open access software package BEAST 1.5.4 ( http://beast-mcmc.googlecode.com/ ).
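To make the size of the model space concrete, here is a minimal sketch, not the BEAST implementation, under a simplified reading of the abstract: each branch of a rooted binary tree carries a 0/1 indicator, and a 1 starts a new local clock that is inherited by all descendant branches until the next change. The example tree ((A,B),C), the names `PARENT`, `branch_clocks`, `branch_rates` and `all_models`, and the exact inheritance rule are illustrative assumptions. Enumerating every indicator configuration for n = 3 taxa reproduces the count of 2^(2n-2) models and shows that the number of distinct clocks actually used on branches ranges from 1 (the strict clock) to 2n - 2.

```python
from collections import Counter
from itertools import product

# Hypothetical example tree ((A,B),C) with n = 3 tips. Each non-root node
# stands for the branch above it, so there are 2n - 2 = 4 branches.
PARENT = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
ROOT = "root"


def preorder(parent):
    """Order branches so that every parent branch comes before its children."""
    order, placed = [], {ROOT}
    while len(order) < len(parent):
        for node, par in parent.items():
            if node not in placed and par in placed:
                order.append(node)
                placed.add(node)
    return order


def branch_clocks(parent, indicators):
    """Map each branch to a local-clock id (0 = the clock starting at the root).

    Assumed rule: indicators[b] == 1 starts a new local clock on branch b;
    otherwise b inherits the clock of the branch (or root) directly above it.
    """
    clock = {ROOT: 0}
    next_id = 1
    for node in preorder(parent):
        if indicators[node]:
            clock[node] = next_id
            next_id += 1
        else:
            clock[node] = clock[parent[node]]
    del clock[ROOT]
    return clock


def branch_rates(parent, indicators, clock_rates):
    """Rate of each branch = rate of the local clock it belongs to."""
    return {b: clock_rates[c] for b, c in branch_clocks(parent, indicators).items()}


def all_models(parent):
    """Enumerate every on/off configuration of the 2n - 2 branch indicators."""
    branches = sorted(parent)
    for bits in product((0, 1), repeat=len(branches)):
        yield dict(zip(branches, bits))


models = list(all_models(PARENT))
# How many distinct local clocks are actually applied to branches in each model.
n_clocks = Counter(len(set(branch_clocks(PARENT, m).values())) for m in models)
print(len(models), sorted(n_clocks))  # prints "16 [1, 2, 3, 4]"
```

Exhaustive enumeration like this is only feasible for toy trees, since the count doubles with every added branch; the abstract's point is that the method instead samples this space with MCMC while estimating the phylogeny.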
