Searching for Sequencing Signal Anomalies Associated with Genomic Structural Variations
Biophysics - 2024
Abstract
Genomic structural variations (SVs) are among the main sources of genetic diversity. Structural variants as mutagens may significantly affect human health, causing hereditary diseases and cancers. Existing methods find structural variants by analyzing high-throughput sequencing data. Despite substantial progress in their development, these methods still do not detect structural variations accurately enough for use in diagnosis. Analysis of the sequencing coverage signal (i.e., the number of aligned sequencing reads at every position of a genome) holds new potential for designing approaches to structural variation detection, since the signal can be analyzed as a time series. A method to detect repetitive patterns in the coverage signal was developed based on two time-series algorithms, KNN (k-nearest neighbors) and SAX (Symbolic Aggregate approXimation). Using a rich dataset from the Human Genome Diversity Project that encompasses the full genomes of 911 individuals with different ethnic backgrounds, generalized patterns of the coverage signal were constructed for regions in the vicinity of breakpoints corresponding to various structural variant types. The patterns were used to develop a software package for fast detection of anomalies in the coverage signal.
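To illustrate how a coverage signal can be turned into a symbolic word with SAX, here is a minimal Python sketch. It is not the authors' implementation: the function names (`znorm`, `paa`, `sax_word`), the four-letter alphabet, the segment count, and the toy coverage values are all assumptions made for the example. It uses the usual SAX steps: z-normalization, piecewise aggregate approximation (PAA), and discretization at equiprobable cut points of the standard normal distribution.

```python
from statistics import NormalDist, mean, pstdev


def znorm(signal):
    """Z-normalize a signal; a flat signal maps to all zeros."""
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]


def paa(signal, n_segments):
    """Piecewise aggregate approximation: mean of each of n_segments chunks.

    Assumes n_segments <= len(signal).
    """
    n = len(signal)
    out = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        out.append(mean(signal[start:end]))
    return out


def sax_word(signal, n_segments=8, alphabet="abcd"):
    """Convert a numeric signal into a SAX word over the given alphabet."""
    k = len(alphabet)
    # Cut points that split the standard normal into k equiprobable bins.
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    symbols = []
    for value in paa(znorm(signal), n_segments):
        idx = sum(value > c for c in cuts)  # number of cut points below value
        symbols.append(alphabet[idx])
    return "".join(symbols)


# Hypothetical coverage track: depth 30 with a drop to 15 in the middle,
# the kind of dip a heterozygous deletion might produce.
coverage = [30] * 20 + [15] * 20 + [30] * 20
word = sax_word(coverage, n_segments=6, alphabet="abcd")
print(word)  # ddaadd
```

Once windows of coverage are reduced to short words like this, comparing a window against a set of reference patterns (for example with a nearest-neighbor search) becomes a comparison of short strings instead of long numeric vectors.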