Statistical Binning for Barcoded Reads Improves Downstream Analyses

Cell Systems - Volume 7 - Pages 219-226.e5 - 2018
Ariya Shajii [1], Ibrahim Numanagić [1,2], Christopher Whelan [3,4,5,6], Bonnie Berger [1,2]
[1] Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
[2] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
[3] Data Sciences Platform, Broad Institute, Cambridge, MA, USA
[4] Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
[5] Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
[6] Department of Genetics, Harvard Medical School, Boston, MA, USA
