MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization
Abstract
This article describes several features of the MAFFT online service for multiple sequence alignment (MSA). Recent advances in sequencing technologies have made huge numbers of biological sequences available, and the need for MSAs containing large numbers of sequences is increasing. To extract biologically relevant information from such data, sophisticated algorithms are necessary but not sufficient: intuitive, interactive tools that let experimental biologists handle large data semiautomatically are becoming equally important. We are developing MAFFT in both of these directions. Here, we explain (i) the Web interface to recently developed options for large data and (ii) interactive usage for refining sequence data sets and MSAs.