Accuracy of phylogeny reconstruction methods combining overlapping gene data sets

Anne Kupczok1, Heiko A. Schmidt1, Arndt von Haeseler1
1Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9, A-1030, Vienna, Austria

Abstract

Background

The availability of many gene alignments with overlapping taxon sets raises the question of which strategy is best for inferring species phylogenies from multi-gene information. Methods and programs abound that use the gene alignments in different ways to reconstruct the species tree. In particular, different methods combine the original data at different points along the way from the underlying sequences to the final tree. Accordingly, they are classified into superalignment, supertree, and medium-level approaches. Here, we present a simulation study that compares different methods from each of these three approaches.
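As an illustration of the earliest combination point, the superalignment approach, the following minimal Python sketch concatenates gene alignments whose taxon sets only partly overlap. It is not the pipeline used in the study: the function name `build_superalignment`, the toy sequences, and the use of `?` as the missing-data symbol are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical representation: one gene alignment maps taxon name -> aligned sequence.
Alignment = Dict[str, str]


def build_superalignment(genes: List[Alignment], missing: str = "?") -> Alignment:
    """Concatenate gene alignments; taxa absent from a gene are padded with `missing`."""
    # The superalignment covers the union of all taxa seen in any gene.
    taxa = sorted({taxon for gene in genes for taxon in gene})
    columns = {taxon: [] for taxon in taxa}
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Use the real sequence if the taxon was sampled, else a block of missing data.
            columns[taxon].append(gene.get(taxon, missing * length))
    return {taxon: "".join(parts) for taxon, parts in columns.items()}


# Toy data (invented for illustration): two genes with overlapping taxon sets.
gene_a = {"human": "ACGT", "chimp": "ACGA", "mouse": "ATGA"}
gene_b = {"human": "GGC", "mouse": "GAC", "rat": "GAT"}

supermatrix = build_superalignment([gene_a, gene_b])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The resulting matrix would then be passed to a single tree reconstruction. Supertree methods, by contrast, first infer one tree per gene and combine the trees, while medium-level methods combine intermediate quantities derived from each gene.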

Results

We observe that superalignment methods usually outperform the other approaches over a wide range of parameters, including sparse data and gene-specific evolutionary parameters. In the presence of high incongruence among gene trees, however, other combination methods perform better than the superalignment approach. Surprisingly, some supertree and medium-level methods yield, on average, worse results than a single-gene phylogeny with complete taxon information.

Conclusions

For some methods, using a reconstructed gene tree as an estimate of the species tree is superior to combining incomplete information. Superalignment usually performs best because it is less susceptible to stochastic error. Supertree methods can outperform superalignment in the presence of gene-tree conflict.
