Development of TBSPG Pipelines for Refining Unique Mapping and Repetitive Sequence Detection Using the Two Halves of Each Illumina Sequence Read

Plant Molecular Biology Reporter - 2015

Heng Xiang¹,², Xiu-Qing Li²

¹College of Animal Science and Technology, Southwest University, Beibei, China

²Potato Research Centre, Agriculture and Agri-Food Canada, Fredericton, Canada

Abstract

We developed six pipelines (TBSPG) for mapping Illumina sequence reads to reference genomes, refining unique mapping, and computing mapped read numbers and coverage. The pipelines offer options for multi-mapping or unique mapping, for input of paired-end read files or a single-end read file, for removing or retaining nucleus-organelle shared sequences, and for mapping either the full-length reads or the two halves of each read to refine the detection of unique and non-unique sequences. The TBSPG pipelines are built on, and named after, publicly available tools: Trimmomatic, the Burrows–Wheeler Aligner (BWA), SAMtools, Picard, and the Genome Analysis Toolkit (GATK). We wrote several Perl scripts to fill the gaps between these tools, connect them, recognize half-length reads, select uniquely mapped reads, and compute the read number and coverage per chromosome and per organellar genome, writing the results in a format that Microsoft Excel can open. In a potato 100-bp paired-end sequence file (Illumina TruSeq), approximately 6.75% of the full-length reads that mapped uniquely were found to contain non-unique sequences at the half-length-read level. The freely available TBSPG pipelines can be used for many read-based applications, including repetitive sequence analysis and estimation of organellar genome copy number.
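The half-read refinement can be illustrated with a small sketch: map both the full-length reads and their two halves, then treat a uniquely mapped full-length read as truly unique only if both of its halves also map uniquely. The Python below is a hypothetical illustration working on SAM text, not the TBSPG Perl scripts. Assumptions: uniqueness is judged by a MAPQ threshold (BWA typically reports MAPQ 0 for reads placed equally well at several loci); half-reads carry hypothetical `_h1`/`_h2` name suffixes; and "coverage" is taken as mean depth (aligned bases divided by chromosome length), which may differ from the paper's definition. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches one CIGAR operation, e.g. "50M" -> ("50", "M").
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_read(read_id, seq):
    """Split a read into two halves named <id>_h1 and <id>_h2 (hypothetical naming).

    For odd lengths the extra base goes to the second half.
    """
    mid = len(seq) // 2
    return [(f"{read_id}_h1", seq[:mid]), (f"{read_id}_h2", seq[mid:])]


def parse_sam(lines):
    """Yield (qname, rname, pos, mapq, aligned_bases) for primary mapped records."""
    for line in lines:
        if line.startswith("@"):  # SAM header line
            continue
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        if flag & (0x4 | 0x100 | 0x800):  # unmapped, secondary, supplementary
            continue
        # Count only bases aligned to the reference (M, =, X operations).
        aligned = sum(int(n) for n, op in CIGAR_OP.findall(f[5]) if op in "M=X")
        yield f[0], f[2], int(f[3]), int(f[4]), aligned


def refine_unique(full_recs, half_recs, min_mapq=1):
    """Return (full-length unique reads, reads whose two halves are also unique).

    Assumption: MAPQ >= min_mapq is used as the uniqueness test.
    """
    full_unique = {q for q, _, _, mapq, _ in full_recs if mapq >= min_mapq}
    unique_halves = defaultdict(int)
    for q, _, _, mapq, _ in half_recs:
        if mapq >= min_mapq:
            unique_halves[q.rsplit("_", 1)[0]] += 1  # strip the _h1/_h2 suffix
    refined = {q for q in full_unique if unique_halves[q] == 2}
    return full_unique, refined


def chrom_summary(records, keep, chrom_lengths):
    """Tab-delimited read count and mean depth per chromosome for reads in `keep`."""
    counts = defaultdict(int)
    bases = defaultdict(int)
    for q, rname, _, _, aligned in records:
        if q in keep:
            counts[rname] += 1
            bases[rname] += aligned
    rows = ["chrom\treads\tmean_depth"]
    for chrom, length in chrom_lengths.items():
        rows.append(f"{chrom}\t{counts[chrom]}\t{bases[chrom] / length:.4f}")
    return "\n".join(rows)
```

With `full_unique, refined = refine_unique(...)`, the share `1 - len(refined) / len(full_unique)` is the kind of quantity the abstract reports for the potato dataset. The summary is written as tab-delimited text because spreadsheet programs such as Excel open it directly.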
