Invariant based quartet puzzling

Springer Science and Business Media LLC - Volume 7 - Pages 1-9 - 2012
Joseph P Rusinko¹, Brian Hipp²
¹Department of Mathematics, Winthrop University, Rock Hill, USA
²Department of Mathematics, Winthrop University, 142 Bancroft Hall, Rock Hill, SC 29733, USA

Abstract

First proposed by Cavender and Felsenstein, and by Lake, invariant-based algorithms for phylogenetic reconstruction were widely dismissed by practicing biologists because invariants were perceived to have limited accuracy when constructing trees from DNA sequences of realistic length. Recent work by algebraic geometers has produced lists of invariants that have been shown to be more accurate on short sequences, but these could only be used for trees with few taxa. We have developed and tested an invariant-based quartet puzzling algorithm that is accurate and efficient on biologically realistic data sets. We found that our algorithm outperforms maximum likelihood based quartet puzzling on data sets simulated with low to medium rates of evolution. For faster rates of evolution, invariant-based quartet puzzling remains reasonable but is less effective than maximum likelihood based puzzling. This is a proof-of-concept algorithm and is not intended to replace existing reconstruction algorithms. Rather, our conclusion is that invariant-based methods should be considered when seeking solutions to a new wave of phylogenetic problems (supertree algorithms, gene tree vs. species tree, mixture models). This article demonstrates that invariants are a practical, reasonable and flexible source of reconstruction techniques.
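The abstract gives no formulas, so the following is only a minimal sketch of the general idea behind invariant-based quartet inference, not the authors' method or invariant set. It assumes the two-state symmetric model studied by Cavender and Felsenstein rather than a DNA model, and uses that model's two quadratic quartet invariants written in Hadamard (Fourier) coordinates. For each of the three possible quartet topologies, the invariants are evaluated on empirical site-pattern statistics, and the topology whose invariants come closest to zero is chosen. In an invariant-based quartet puzzling scheme, a step of this kind would stand in for the maximum likelihood evaluation of each quartet; the puzzling and consensus steps are not shown. All function names, the sum-of-squares score, and the toy simulator are hypothetical choices made for illustration.

```python
import random
from itertools import combinations

# Hypothetical labels for the three unrooted quartet topologies on leaves 0..3.
TOPOLOGIES = {
    "12|34": ((0, 1), (2, 3)),
    "13|24": ((0, 2), (1, 3)),
    "14|23": ((0, 3), (1, 2)),
}
FULL = (0, 1, 2, 3)


def fourier_coords(seqs):
    """Empirical Hadamard (Fourier) coordinates q_S of a binary quartet.

    seqs holds four equal-length 0/1 sequences. For every even-size subset S
    of the leaves, q_S is the average over sites of (-1)**(sum of S's states).
    """
    n = len(seqs[0])
    q = {(): 1.0}
    for s in list(combinations(range(4), 2)) + [FULL]:
        total = 0
        for site in range(n):
            parity = sum(seqs[i][site] for i in s) % 2
            total += -1 if parity else 1
        q[s] = total / n
    return q


def invariant_score(q, topology):
    """Sum of squares of the two quadratic invariants of the split ab|cd.

    Both expressions vanish on the model's expected coordinates for the true
    topology, so small empirical values indicate a good fit.
    """
    (a, b), (c, d) = TOPOLOGIES[topology]

    def pair(i, j):
        return q[tuple(sorted((i, j)))]

    f1 = pair(a, c) * pair(b, d) - pair(a, d) * pair(b, c)
    f2 = q[FULL] - pair(a, b) * pair(c, d)
    return f1 * f1 + f2 * f2


def best_quartet(seqs):
    """Pick the topology whose invariants are closest to vanishing."""
    q = fourier_coords(seqs)
    return min(TOPOLOGIES, key=lambda t: invariant_score(q, t))


def simulate_cfn_quartet(n_sites, leaf_p, internal_p, seed=0):
    """Toy simulator for the two-state model on the quartet tree 12|34.

    leaf_p is the flip probability on each pendant edge, internal_p the flip
    probability on the internal edge; the root state is uniform.
    """
    rng = random.Random(seed)
    seqs = [[], [], [], []]
    for _ in range(n_sites):
        u = rng.randint(0, 1)                     # node joining leaves 1 and 2
        v = u ^ int(rng.random() < internal_p)    # node joining leaves 3 and 4
        for leaf, parent in ((0, u), (1, u), (2, v), (3, v)):
            seqs[leaf].append(parent ^ int(rng.random() < leaf_p))
    return seqs


if __name__ == "__main__":
    data = simulate_cfn_quartet(20000, leaf_p=0.1, internal_p=0.2, seed=1)
    coords = fourier_coords(data)
    for t in TOPOLOGIES:
        print(t, round(invariant_score(coords, t), 4))
    print("chosen:", best_quartet(data))
```

Running the script simulates 20,000 sites on the tree 12|34 and prints the score of each topology; in expectation the true split scores zero while the other two do not. Squaring and summing is simply the most direct way to reduce several invariants to one number; weighting or normalizing the invariants would be natural refinements.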

References

Coughlan S, Connell J, Cohen B, Jin L, Hall W: Suboptimal measles-mumps-rubella vaccination coverage facilitates an imported measles outbreak in Ireland. Clin Infect Dis. 2002, 35: 84-86. 10.1086/340708
Vazquez D, Gittleman J: Biodiversity conservation: does phylogeny matter? Curr Biol. 1998, 8 (11): 379-381. 10.1016/S0960-9822(98)70242-8
Maddison D, Schulz K: The Tree of Life Web Project. 2007, [http://tolweb.org].
Cavender J, Felsenstein J: Invariants of phylogenies in a simple case with discrete states. J Classification. 1987, 4: 57-71. 10.1007/BF01890075
Lake J: A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 1987, 4: 167-191.
Evans S, Speed T: Invariants of some probability models used in phylogenetic inference. Ann Stat. 1993, 21: 355-377. 10.1214/aos/1176349030
Huelsenbeck J: Performance of phylogenetic methods in simulations. Syst Biol. 1995, 44: 17-48.
Jin L, Nei M: Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol Biol Evol. 1990, 7: 82-102.
Sturmfels B, Sullivant S: Toric ideals of phylogenetic invariants. J Comput Biol. 2005, 12: 204-228. 10.1089/cmb.2005.12.204
Eriksson N: Using invariants for phylogenetic tree construction. Emerging Applications of Algebraic Geometry, IMA Vol. Math. Appl. 149. Springer, New York. 2009, 89-108. [http://dx.doi.org/10.1007/978-0-387-09686-5_4]
Casanellas M, Fernandez-Sanchez J: Performance of a new invariants method on homogeneous and nonhomogeneous quartet trees. Mol Biol Evol. 2007, 24: 288-293.
Allman E, Rhodes J: Phylogenetic invariants for the general Markov model of sequence mutation. Math Biosci. 2003, 186 (2): 113-144.
Casanellas M, Garcia LD, Sullivant S: Catalog of small trees. Algebraic Statistics for Computational Biology. Cambridge Univ. Press, New York. 2005, 291-304. [http://dx.doi.org/10.1017/CBO9780511610684.019]
Allman E, Petrović S, Rhodes J, Sullivant S: Identifiability of two-tree mixtures for group-based models. IEEE/ACM Trans Comput Biol Bioinform. 2011, 8 (3): 710-722.
Donten-Bury M, Michalek M: Phylogenetic invariants for group-based models. 2011, [http://front.math.ucdavis.edu/1011.3236].
Strimmer K, von Haeseler A: Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol. 1996, 13 (7): 964-969. 10.1093/oxfordjournals.molbev.a025664
Schmidt H, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502
Ranwez V, Gascuel O: Quartet-based phylogenetic inference: improvements and limits. Mol Biol Evol. 2001, 18: 1103-1116. 10.1093/oxfordjournals.molbev.a003881
Snir S, Warnow T, Rao S: Short quartet puzzling: a new quartet-based phylogeny reconstruction algorithm. J Comput Biol. 2008, 15: 91-103. 10.1089/cmb.2007.0103
Berry V, Jiang T, Kearney P: Quartet cleaning: improved algorithms and simulations. Eur Symp Algorithms. 1999, 313-324.
Casanellas M, Fernandez-Sanchez J: Relevant phylogenetic invariants of evolutionary models. J Math Pures Appl. 2011, 96: 207-229. 10.1016/j.matpur.2010.11.002
Tamura K, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28 (10): 2731-2739. 10.1093/molbev/msr121
Sumner J, Charleston M: Markov invariants, plethysms, and phylogenetics. J Theor Biol. 2009, 258: 302-310. 10.1016/j.jtbi.2009.01.021
Sumner J, Jarvis P: Markov invariants and the isotropy subgroup of a quartet tree. J Theor Biol. 2008, 253: 601-615. 10.1016/j.jtbi.2008.04.001
Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author, Department of Genome Sciences, University of Washington, Seattle. 2005, [http://evolution.genetics.washington.edu/phylip/faq.html#citation].
Rambaut A, Grassly N: Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci. 1997, 13: 235-238.
Robinson D, Foulds LR: Comparison of phylogenetic trees. Math Biosci. 1981, 53: 131-147. 10.1016/0025-5564(81)90043-2
Steel M, Penny D: Distributions of tree comparison metrics: some new results. Syst Biol. 1993, 42 (2): 126-141.
Snir S, Rao S: Quartet MaxCut: a fast algorithm for amalgamating quartet trees. Mol Phylogenet Evol. 2012, 62: 1-8.