Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

Nature Biotechnology - Tập 37 Số 8 - Trang 907-915 - 2019
Daehwan Kim1, Joseph M. Paggi2, Chanhee Park1, Christopher Bennett1, Steven L. Salzberg3
1Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
2Department of Computer Science, Stanford University, Stanford, CA, USA
3Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD, USA

Tóm tắt

Từ khóa


Tài liệu tham khảo

1000 Genomes Project Consortium A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

1000 Genomes Project Consortium An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

GTEx Consortium The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

t Hoen, P. A. et al. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–1022 (2013).

Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241 (2012).

Krumm, N. et al. Excess of rare, inherited truncating mutations in autism. Nat. Genet. 47, 582–588 (2015).

Leinonen, R., Sugawara, H., Shumway, M. & International Nucleotide Sequence Database Consortium. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2011).

Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

Lappalainen, I. et al. DbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 41, D936–D941 (2013).

Burrows, M. & Wheeler, D. J. A block sorting lossless data compression algorithm. SRC Research Report 124 (Digital Equipment Corporation, 1994).

Ferragina, P. & Manzini, G. in Proceedings 41st Annual Symposium on Foundations of Computer Science, IEEE Computer Society 390–398 (2000).

Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).

Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).

Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–879 (2018).

Rakocevic, G. et al. Fast and accurate genomic analyses using genome graphs. Nat. Genet. 51, 354–362 (2019).

Siren, J., Valimaki, N. & Makinen, V. Indexing graphs for path queries with applications in genome research. IEEE-ACM Trans. Comput. Biol. Bioinform. 11, 375–388 (2014).

Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

Robinson, J. et al. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res. 43, D423–D431 (2015).

Hares, D. R. Expanding the CODIS core loci in the United States. Forensic Sci. Int. Genet. 6, e52–e54 (2012).

Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).

Compeau, P. E., Pevzner, P. A. & Tesler, G. How to apply de Bruijn graphs to genome assembly. Nat. Biotechnol. 29, 987–991 (2011).

Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie2. Nat. Methods 9, 357–359 (2012).

Eberle, M. A. et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).

Erlich, R. L. et al. Next-generation sequencing for HLA typing of class I loci. BMC Genomics 12, 42 (2011).

Lee, H. & Kingsford, C. Kourami: graph-guided assembly for novel human leukocyte antigen allele discovery. Genome Biol. 19, 16 (2018).

Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110 (2016).

Pachter, L. Models for transcript quantification from RNA-Seq. Preprint at https://arxiv.org/abs/1104.3889 (2011).

Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).