Alignment-free phylogeny of whole genomes using underlying subwords

Matteo Comin1, Davide Verzotto1
1Department of Information Engineering, University of Padova, Padova, Italy

Tóm tắt

Từ khóa


Tài liệu tham khảo

Wildman D, Uddin M, Opazo JC, Liu G, Lefort V, Guindon S, Gascuel O, Grossman LI, Romero R, Goodman M: Genomics, biogeography, and the diversification of placental mammals. Proc Natl Acad Sci USA. 2007, 104: 14395-14400. 10.1073/pnas.0704342104

Huynen M, Bork P: Measuring genome evolution. Proc Natl Acad Sci USA. 1998, 95: 5849-5856. 10.1073/pnas.95.11.5849

Chor B, Horn D, Goldman N, Levy Y, Massingham T: Genomic DNA k-mer spectra: models and modalities. Genome Biol. 2009, 10 (10): R108. 10.1186/gb-2009-10-10-r108

Sims GE, Jun SRR, Wu GA, Kim SH: Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc Nat Acad Sci USA. 2009, 106 (8): 2677-2682. 10.1073/pnas.0813249106

Venter C: The sequence of the human genome. Science. 2001, 291: 1305-1350.

Delsuc F, Brinkmann H, Philippe H: Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005, 6: 361-375.

Ulitsky I, Burstein D, Tuller T, Chor B: The average common substring approach to phylogenomic reconstruction. J Comput Biol. 2006, 13 (2): 336-350. 10.1089/cmb.2006.13.336

Sims GE, Jun SR, Wu GA, Kim SH: Whole-genome phylogeny of mammals: Evolutionary information in genic and nongenic regions. Proc Nat Acad Sci USA. 2009, 106 (40): 17077-17082. 10.1073/pnas.0909377106

Lin J: Divergence measures based on the Shannon entropy. IEEE T Inform Theory. 1991, 37: 145-151. 10.1109/18.61115

Apostolico A, Comin M, Parida L: Mining, compressing and classifying with extensible motifs. Algorithms Mol Biol. 2006, 1: 4. 10.1186/1748-7188-1-4

Apostolico A, Comin M, Parida L: Motifs in Ziv-Lempel-Welch Clef. Proceedings of IEEE DCC Data Compression Conference. IEEE Computer Society, 2004, 72-81.

Giancarlo R, Scaturro D, Utro F: Textual data compression in computational biology: a synopsis. Bioinformatics. 2009, 25 (13): 1575-1586. 10.1093/bioinformatics/btp117

Iliopoulos C, Mchugh J, Peterlongo P, Pisanti N, Rytter W, Sagot MF: A first approach to finding common motifs with gaps. Int J Foundations Comput Sci. 2005, 16 (6): 1145-1154. 10.1142/S0129054105003716

Apostolico A, Comin M, Parida L: Conservative extraction of over-represented extensible motifs. Bioinformatics. 2005, 21 (Suppl 1): i9-i18. 10.1093/bioinformatics/bti1051

Apostolico A, Comin M, Parida L: VARUN: discovering extensible motifs under saturation constraints. IEEE/ACM Trans Comput Biol Bioinformatics. 2010, 7 (4): 752-762.

Kong SG, Fan WL, Chen HD, Hsu ZT, Zhou N, Zheng B, Lee HC: Inverse symmetry in complete genomes and whole-genome inverse duplication. PLoS ONE. 2009, 4 (11): e7553. 10.1371/journal.pone.0007553

Gusfield D: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. New York, USA: Cambridge University Press, 1997.

Apostolico A: The myriad virtues of subword trees. Combinatorial Algorithms on Words, A. Apostolico, Z. Galil (Eds.). 1985, 12: 85-96.

Apostolico A: Maximal words in sequence comparisons based on subword composition. Algorithms and Applications, Volume 6060 of Lecture Notes in Computer Science. Edited by: Elomaa T, Mannila H, Orponen P. Berlin: Springer-Verlag, 2010, 34-44.

Apostolico A, Parida L: Incremental paradigms of motif discovery. J Comput Biol. 2004, 11: 15-25. 10.1089/106652704773416867

Comin M, Verzotto D: Classification of protein sequences by means of irredundant patterns. BMC Bioinformatics. 2010, 11 (Suppl. 1): S16.

Comin M, Verzotto D: The Irredundant Class method for remote homology detection of protein sequences. J Comput Biol. 2011, 18 (12): 1819-1829. [ http://dx.doi.org/10.1089/cmb.2010.0171 ].] 10.1089/cmb.2010.0171

Apostolico A, Comin M, Parida L: Bridging lossy and lossless compression by motif pattern discovery. Lect Notes Comput Sci. 2006, 4123: 793-813. 10.1007/11889342_51

Comin M, Parida L: Detection of subtle variations as consensus motifs. Theor Comput Sci. 2008, 395 (2-3): 158-170. 10.1016/j.tcs.2008.01.017

Ukkonen E: Maximal and minimal representations of gapped and non-gapped motifs of a string. Theor Comput Sci. 2009, 410 (43): 4341-4349. 10.1016/j.tcs.2009.07.015

Comin M, Verzotto D: Comparing, ranking and filtering motifs with character classes: application to biological sequences analysis. Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data. Edited by: Elloumi M, Zomaya AY. 2013, chapter 13-chapter 13. Wiley.

Cormen TH, Leiserson CE, Rivest RL: Introduction To Algorithms, chap. 9. MIT Press, 1990, 178–180.

Kopelowitz T, Lewenstein M: Dynamic weighted ancestors. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2007). Society for Industrial and Applied Mathematics Philadelphia SIAM, 2007, 565-574.

Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, Peiris JSM, Guan Y, Rambaut A: Origins and evolutionary genomics of the 2009 swine-origin H1N1 Influenza A epidemic. Nature. Nature Publishing Group. 2009, 459 (7250): 1122-1125.

Shiino T, Okabe N, Yasui Y, Sunagawa T, Ujike M, Obuchi M, Kishida N, Xu H, Takashita E, Anraku A, Ito R, Doi T, Ejima M, Sugawara H, Horikawa H, Yamazaki S, Kato Y, Oguchi A, Fujita N, Odagiri T, Tashiro M, Watanabe H: Molecular Evolutionary Analysis of the Influenza A(H1N1)pdm, May–September, 2009: Temporal and Spatial Spreading Profile of the Viruses in Japan. PLoS ONE. 2010, 5 (6): e11057. 10.1371/journal.pone.0011057

Thompson J, Higgins D, Gibson T: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673

Felsenstein J: PHYLIP – Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.

Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM: The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009, 37: D141-D145. 10.1093/nar/gkn879

Martinsen ES, Perkins SL, Schall JJ: A three-genome phylogeny of malaria parasites (Plasmodium and closely related genera): Evolution of life-history traits and host switches. Mol Phylogenet Evol. 2008, 47: 261-273. 10.1016/j.ympev.2007.11.012

Critchlow D, Pearl D, Qian C: The triples distance for rooted bifurcating phylogenetic trees. Syst Biol. 1996, 45 (3): 323-334. 10.1093/sysbio/45.3.323