FAMSA: Fast and accurate multiple sequence alignment of huge protein families
Abstract
Keywords
References
Chatzou, M. et al. Multiple sequence alignment modeling: methods and applications. Brief. Bioinform., doi:10.1093/bib/bbv099 (2015).
Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22), 4673–4680 (1994).
Do, Ch. B., Mahabhashyam, M. S. P., Brudno, M. & Batzoglou, S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15(2), 330–340 (2005).
Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).
Notredame, C., Higgins, D. G. & Heringa, J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302(1), 205–217 (2000).
Lassmann, T. & Sonnhammer, E. L. L. Kalign—an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6, 298 (2005).
Lassmann, T., Frings, O. & Sonnhammer, E. L. L. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res. 37, 858–865 (2009).
Muth, R. & Manber, U. Approximate multiple string search. In Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching, 75–86 (1996).
Deorowicz, S., Debudaj-Grabysz, A. & Gudyś, A. Kalign-LCS—A More Accurate and Faster Variant of Kalign2 Algorithm for the Multiple Sequence Alignment Problem. In Man-Machine Interactions 3, AISC 242 (eds Gruca, A. et al.) 495–502 (Springer-Verlag, 2014).
Katoh, K. & Toh, H. PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23, 372–374 (2007).
Katoh, K. & Toh, H. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9(4), 286–298 (2008).
Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
Blackshields, G., Sievers, F., Shi, W., Wilm, A. & Higgins, D. G. Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithm. Mol. Biol. 5(1), 21 (2010).
Nguyen, Np. D., Mirarab, S., Kumar, K. & Warnow, T. Ultra-large alignments using phylogeny-aware profiles. Genome Biol. 16, 124 (2015).
Intel Corporation, Intel 64 and IA-32 Architectures Software Developer’s Manual. Combined Volumes: 1, 2A, 2B, 2C, 3A, 3B, 3C and 3D. http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html. (Accessed: 30th June 2016).
Sibson, R. SLINK: An optimally efficient algorithm for the single-link cluster method. Comput. J. 16, 30–34 (1973).
Yamada, K. & Tomii, K. Revisiting amino acid substitution matrices for identifying distantly related proteins. Bioinformatics 30, 317–325 (2014).
Gudyś, A. & Deorowicz, S. QuickProbs 2: towards rapid construction of high-quality alignments of large protein families. Preprint available at: http://arxiv.org/abs/1512.07437 (2015).
Mizuguchi, K., Deane, C. M., Blundell, T. L. & Overington, J. P. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 7(11), 2469–2471 (1998).
Plyusnin, I. & Holm, L. Comprehensive comparison of graph based multiple protein sequence alignment strategies. BMC Bioinformatics 13, 64 (2012).
Hyyrö, H. Bit-parallel LCS-length computation revisited. In Proceedings of the 15th Australian Workshop on Combinatorial Algorithms, 16–27 (2004).
Khronos Group, The open standard for parallel programming of heterogeneous systems. https://www.khronos.org/opencl. (Accessed: 30th June 2016).
Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4(4), 406–425 (1987).
Sokal, R. R. & Michener, C. D. A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 38, 1409–1438 (1958).
Florek, K., Łukaszewicz, J., Perkal, J., Steinhaus, H. & Zubrzycki, S. Sur la liaison et la division des points d’un ensemble fini. Colloq. Math. 2, 282–285 (1951).
Wheeler, T. J. & Kececioglu, J. D. Multiple alignment by aligning alignments. Bioinformatics 23(13), i559–i568 (2007).
Edgar, R. C. Optimizing substitution matrix choice and gap parameters for sequence alignment. BMC Bioinformatics 10, 396 (2009).
Chakrabarti, S. et al. Refining multiple sequence alignments with conserved core regions. Nucleic Acids Res. 34(9), 2598–2606 (2006).
Liu, Y., Schmidt, B. & Maskell, D. L. MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26, 1958–1964 (2010).
Edgar, R. C. QSCORE multiple alignment scoring software. http://www.drive5.com/qscore. (Accessed: 30th June 2016).
Thompson, J. D., Plewniak, F. & Poch, O. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15(1), 87–88 (1999).
Raghava, G., Searle, G., Audley, P., Barber, J. & Barton, G. OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics 4(1), 47 (2003).
Van Walle, I., Lasters, I. & Wyns, L. SABmark—a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21(7), 1267–1268 (2005).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Sievers, F., Dinnen, D., Wilm, A. & Higgins, D. G. Making automated multiple alignments of very large numbers of protein sequences. Bioinformatics 29, 989–995 (2013).
Gudyś, A. & Deorowicz, S. QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors. PLoS One 9(7), e103051 (2014).
Ye, Y. et al. GLProbs: Aligning Multiple Sequences Adaptively. IEEE/ACM Trans. Comput. Biol. Bioinf. 12, 67–78 (2015).
Boyce, K., Sievers, F. & Higgins, D. G. Simple chained guide trees give high-quality protein multiple sequence alignments. Proc. Nat. Acad. Sci. USA 111(29), 10556–10561 (2014).
Boyce, K., Sievers, F. & Higgins, D. G. Reply to Tan et al.: Differences between real and simulated proteins in multiple sequence alignments. Proc. Nat. Acad. Sci. USA 112(2), E101 (2015).
Tan, G., Gil, M., Löytynoja, A. P., Goldman, N. & Dessimoz, C. Simple chained guide trees give poorer multiple sequence alignments than inferred trees in simulation and phylogenetic benchmarks. Proc. Nat. Acad. Sci. USA 112, E99–E100 (2015).