trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses

Bioinformatics, Volume 25, Issue 15, Pages 1972-1973, 2009
Salvador Capella-Gutiérrez¹, José M. Silla-Martínez¹, Toni Gabaldón¹
¹ Comparative Genomics group, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88, 08003 Barcelona, Spain

Abstract

Summary: Multiple sequence alignments are central to many areas of bioinformatics. It has been shown that the removal of poorly aligned regions from an alignment increases the quality of subsequent analyses. Such an alignment trimming phase is complicated in large-scale phylogenetic analyses that deal with thousands of alignments. Here, we present trimAl, a tool for automated alignment trimming, which is especially suited for large-scale phylogenetic analyses. trimAl can consider several parameters, alone or in multiple combinations, for selecting the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of amino acid similarity and, if several alignments for the same set of sequences are provided, the level of consistency across different alignments. Moreover, trimAl can automatically select the parameters to be used in each specific alignment so that the signal-to-noise ratio is optimized.
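To make the idea of position selection concrete, here is a minimal sketch of column filtering on an aligned set of sequences. It is not trimAl's algorithm. It combines two of the criteria named above: the proportion of sequences with a gap, and a deliberately crude stand-in for amino acid similarity (the fraction of identical non-gap residue pairs). Their combination is a simple AND. The function names (`gap_fraction`, `identity_score`, `trim_alignment`), the thresholds `max_gap` and `min_identity`, and the gap characters are hypothetical choices for illustration. Consistency across alternative alignments and automatic threshold selection are not modelled.

```python
from typing import List, Sequence


def gap_fraction(column: Sequence[str], gap_chars: str = "-.") -> float:
    """Proportion of sequences that have a gap character in this column."""
    return sum(1 for c in column if c in gap_chars) / len(column)


def identity_score(column: Sequence[str], gap_chars: str = "-.") -> float:
    """Fraction of pairs of non-gap residues that are identical.

    Hypothetical, simplified stand-in for an amino acid similarity score;
    columns with fewer than two residues score 0.0.
    """
    residues = [c.upper() for c in column if c not in gap_chars]
    n = len(residues)
    if n < 2:
        return 0.0
    same = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if residues[i] == residues[j]
    )
    return same / (n * (n - 1) // 2)


def trim_alignment(
    seqs: List[str], max_gap: float = 0.5, min_identity: float = 0.0
) -> List[str]:
    """Keep only columns that pass both the gap and the identity threshold."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        k
        for k in range(length)
        if gap_fraction([s[k] for s in seqs]) <= max_gap
        and identity_score([s[k] for s in seqs]) >= min_identity
    ]
    # Rebuild each sequence from the retained column indices only.
    return ["".join(s[k] for k in keep) for s in seqs]


if __name__ == "__main__":
    aln = ["MK-LV", "MKALV", "M--LI", "MK-LV"]
    print(trim_alignment(aln, max_gap=0.5))
    print(trim_alignment(aln, max_gap=0.5, min_identity=0.6))
```

With `max_gap=0.5`, the third column (gapped in three of four sequences) is removed, giving `['MKLV', 'MKLV', 'M-LI', 'MKLV']`. Raising `min_identity` to 0.6 also drops the last column, where only half of the residue pairs are identical. Applying the filters independently per column keeps the sketch simple. A real tool would score similarity with a substitution matrix and choose thresholds per alignment.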

Availability: trimAl is written in C++ and is portable to all platforms. It is freely available for download (http://trimal.cgenomics.org) and can be used online through the Phylemon web server (http://phylemon2.bioinfo.cipf.es/). Supplementary material is available at http://trimal.cgenomics.org/publications.

