MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

Briefings in Bioinformatics, Volume 20, Issue 4, Pages 1160-1166, 2019
Kazutaka Katoh [1,2], John Rozewicki [1], Kazunori Yamada [1,3]
[1] 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan.
[2] Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University.
[3] Research Institute for Microbial Diseases, Osaka University.

Abstract

This article describes several features of the MAFFT online service for multiple sequence alignment (MSA). Recent advances in sequencing technologies have made huge numbers of biological sequences available, and the need for MSAs of large numbers of sequences is increasing. To extract biologically relevant information from such data, more sophisticated algorithms are necessary but not sufficient: intuitive, interactive tools that let experimental biologists handle large data semiautomatically are also becoming important. We are developing MAFFT in both of these directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.
