STAR: ultrafast universal RNA-seq aligner

Bioinformatics - Volume 29, Issue 1 - Pages 15-21 - 2013

Alexander Dobin¹, Carrie Davis¹, Felix Schlesinger¹, Jorg Drenkow¹, Chris Zaleski¹, Sonali Jha¹, Philippe Batut¹, Mark Chaisson², Thomas R. Gingeras¹

¹Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; ²Pacific Biosciences, Menlo Park, CA, USA

Abstract

Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and still unsolved problem. The difficulty comes from the non-contiguous transcript structure, relatively short read lengths and the constantly increasing throughput of sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read-length limitations and mapping biases.

Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software. STAR is based on a previously undescribed RNA-seq alignment algorithm that uses a sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed. It aligns 550 million 2 × 76 bp paired-end reads per hour to the human genome on a modest 12-core server, while also improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and it can also map full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction (RT-PCR) amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80–90% success rate, corroborating the high precision of the STAR mapping strategy.

Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open-source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
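The abstract describes STAR's core step as a sequential maximum mappable seed search in uncompressed suffix arrays. The Python sketch below illustrates that idea on a toy genome. It is not STAR's C++ implementation, and every function name in it (`build_suffix_array`, `maximal_mappable_prefix`, `sequential_mmp_seeds`, `infer_junction`) is hypothetical.

Each search finds the longest prefix of the remaining read that occurs exactly in the genome, then restarts at the first read base that prefix did not cover. The sketch rests on several simplifying assumptions:

- The suffix array is built naively.
- Only exact matches count; there is no mismatch extension, soft-clipping, scoring or clustering window.
- The "stitching" step is reduced to reading an intron off the genomic gap between two consecutive unambiguous seeds.

```python
from typing import List, Optional, Tuple

# A seed is (start in read, length, sorted genome loci). All names are hypothetical.
Seed = Tuple[int, int, List[int]]


def build_suffix_array(genome: str) -> List[int]:
    # Uncompressed suffix array: every suffix start position, sorted lexicographically.
    # Naive O(n^2 log n) construction, adequate only for a toy genome.
    return sorted(range(len(genome)), key=lambda p: genome[p:])


def lcp(a: str, b: str) -> int:
    # Length of the longest common prefix of a and b.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def maximal_mappable_prefix(read: str, genome: str, sa: List[int]) -> Tuple[int, List[int]]:
    # Binary search for the insertion point of `read` among the sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < read:
            lo = mid + 1
        else:
            hi = mid
    # In sorted order, the suffix sharing the longest prefix with `read`
    # sits immediately before or at the insertion point.
    best = max((lcp(read, genome[sa[k]:]) for k in (lo - 1, lo) if 0 <= k < len(sa)), default=0)
    if best == 0:
        return 0, []
    # Suffixes beginning with read[:best] form one contiguous block that touches
    # the insertion point; widen around it to collect every genomic locus.
    prefix = read[:best]
    start = end = lo
    while start > 0 and genome[sa[start - 1]:].startswith(prefix):
        start -= 1
    while end < len(sa) and genome[sa[end]:].startswith(prefix):
        end += 1
    return best, sorted(sa[start:end])


def sequential_mmp_seeds(read: str, genome: str, sa: List[int]) -> List[Seed]:
    # Map the longest prefix, then restart at the first read base it did not cover.
    seeds: List[Seed] = []
    pos = 0
    while pos < len(read):
        length, loci = maximal_mappable_prefix(read[pos:], genome, sa)
        if length == 0:
            pos += 1  # base absent from the genome (e.g. 'N'): skip it
            continue
        seeds.append((pos, length, loci))
        pos += length
    return seeds


def infer_junction(a: Seed, b: Seed, genome: str) -> Optional[Tuple[int, int, bool]]:
    # Toy stitching: two single-locus seeds that are contiguous in the read but
    # separated in the genome imply an intron genome[start:end].
    read_a, len_a, loci_a = a
    read_b, _, loci_b = b
    if len(loci_a) != 1 or len(loci_b) != 1 or read_a + len_a != read_b:
        return None
    start, end = loci_a[0] + len_a, loci_b[0]
    if end <= start:
        return None
    intron = genome[start:end]
    # Third field: does the intron carry the canonical GT...AG motif (forward strand)?
    return start, end, intron.startswith("GT") and intron.endswith("AG")


if __name__ == "__main__":
    genome = "GATTACA" + "GTCCCCCCAG" + "TGCATGC"  # exon 1 + intron + exon 2
    sa = build_suffix_array(genome)
    seeds = sequential_mmp_seeds("GATTACATGCATGC", genome, sa)
    print(seeds)                                       # [(0, 7, [0]), (7, 7, [17])]
    print(infer_junction(seeds[0], seeds[1], genome))  # (7, 17, True)
```

The demo uses a spliced read made of exon 1 (`GATTACA`) joined to exon 2 (`TGCATGC`). The first search stops exactly at the exon boundary, and the second search maps the rest of the read to exon 2. The genomic gap between the two seeds begins with GT and ends with AG.

Sorting whole suffix strings is only practical for tiny inputs. Indexing a mammalian genome would need a proper suffix-array construction algorithm.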
