Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing

Genome Biology - Volume 20 - Pages 1-18 - 2019
Shunichi Kosugi1,2, Yukihide Momozawa3, Xiaoxi Liu3, Chikashi Terao1,2, Michiaki Kubo4, Yoichiro Kamatani1,2
1Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
2Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
3Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
4RIKEN Center for Integrative Medical Sciences, Yokohama, Japan

Abstract

Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SV with both high precision and high recall. We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs, depending on the specific types and size ranges of the SVs, and that accurately determine the breakpoints, sizes, and genotypes of the SVs. We enumerate potentially good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham perform better in the deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall of overlapping calls vary depending on the specific combination of algorithms rather than on the combination of methods used in the algorithms. These results suggest that accurate SV calling requires careful selection of algorithms for each type and size range of SVs. Selecting specific pairs of algorithms for overlapping calls promises to effectively improve SV detection accuracy.
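To make the idea of overlapping calls concrete, the following minimal Python sketch intersects the calls of two hypothetical callers and scores the result against a toy reference set. It is an illustration only, not the evaluation pipeline used in the study: the `SV` record, the half-open coordinates, the 50% reciprocal-overlap threshold (`min_ro`), and all example intervals are assumptions introduced here.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """A simplified SV call; half-open coordinates are an assumption."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if chrom or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def overlapping_calls(calls_a: List[SV], calls_b: List[SV],
                      min_ro: float = 0.5) -> List[SV]:
    """Keep calls from caller A that are supported by some call from caller B."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)]


def precision_recall(calls: List[SV], truth: List[SV],
                     min_ro: float = 0.5) -> Tuple[float, float]:
    """Precision = matched calls / all calls; recall = matched truth / all truth."""
    matched_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_ro for t in truth))
    matched_truth = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_ro for c in calls))
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy data (entirely made up for illustration).
truth = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 6000, "DEL"),
         SV("chr2", 100, 1100, "DUP"), SV("chr2", 3000, 4000, "DEL")]
caller_a = [SV("chr1", 1010, 1990, "DEL"), SV("chr1", 5000, 6000, "DEL"),
            SV("chr1", 8000, 9000, "DEL"), SV("chr2", 100, 1100, "DUP")]
caller_b = [SV("chr1", 1000, 2050, "DEL"), SV("chr1", 5100, 6100, "DEL"),
            SV("chr2", 7000, 8000, "DUP")]

shared_calls = overlapping_calls(caller_a, caller_b)
print("caller A alone:", precision_recall(caller_a, truth))
print("A and B overlap:", precision_recall(shared_calls, truth))
```

On this toy data, requiring support from both callers raises precision from 0.75 to 1.0 while lowering recall from 0.75 to 0.5, which is the kind of trade-off that makes the choice of algorithm pair matter.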
