Hệ thống phát hiện giả mạo dựa trên các đặc trưng tĩnh-dynamic và các mô hình học sâu lai cho xác minh người nói tự động (ASV)
Tóm tắt
Từ khóa
Tài liệu tham khảo
Beranek B (2013) Voice biometrics: success stories, success factors and what’s next. Biometr Technol Today 2013(7):9–11
Indumathi A, Chandra E (2012) Survey on speech synthesis. Signal Process Int J (SPIJ) 6(5):140
Lim R, Kwan E (2011) Voice conversion application (VOCAL). In: 2011 international conference on uncertainty reasoning and knowledge engineering, vol 1. IEEE, pp 259–262
Patil HA, Kamble MR (2018) A survey on replay attack detection for automatic speaker verification (ASV) system. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1047–1053
Wu Z, Evans N, Kinnunen T, Yamagishi J, Alegre F, Li H (2015) Spoofing and countermeasures for speaker verification: a survey. Speech Commun 66:130–153
Hautamäki RG, Kinnunen T, Hautamäki V, Leino T, Laukkanen AM (2013) I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry. In: Interspeech, pp 930–934
Hautamäki RG, Kinnunen T, Hautamäki V, Laukkanen AM (2014) Comparison of human listeners and speaker verification systems using voice mimicry data. Target 4000:5000
Lindberg J, Blomberg M (1999) Vulnerability in speaker verification-a study of technical impostor techniques. In: Sixth European conference on speech communication and technology
Chettri B, Stoller D, Morfi V, Ramírez MAM, Benetos E, Sturm BL (2019) Ensemble models for spoofing detection in automatic speaker verification. arXiv:1904.04589. arXiv preprint
Sahidullah M, Delgado H, Todisco M, Yu H, Kinnunen T, Evans N, Tan ZH (2016) Integrated spoofing countermeasures and automatic speaker verification: an evaluation on ASVspoof 2015
Lavrentyeva G, Novoselov S, Malykh E, Kozlov A, Kudashev O, Shchemelinin V (2017) Audio replay attack detection with deep learning frameworks. In: Interspeech, pp 82–86
Campbell JP (1995) Testing with the YOHO CD-ROM voice verification corpus. In: 1995 international conference on acoustics, speech, and signal processing, vol 1. IEEE, pp 341–344
Chakroborty S, Saha G (2009) Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter. Int J Signal Process 5(1):11–19
Cai W, Wu H, Cai D, Li M (2019) The DKU replay detection system for the ASVspoof 2019 challenge: on data augmentation, feature representation, classification, and fusion. arXiv:1907.02663. arXiv preprint
Balamurali BT, Lin KE, Lui S, Chen JM, Herremans D (2019) Toward robust audio spoofing detection: a detailed comparison of traditional and learned features. IEEE Access 7:84229–84241
Dua M, Aggarwal RK, Biswas M (2017) Discriminative training using heterogeneous feature vector for Hindi automatic speech recognition system. In: International conference on computer and applications (ICCA), pp 158–162
Sahidullah M, Kinnunen T, Hanilçi C (2015) A comparison of features for synthetic speech detection. In: 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp 2087–2091
Pal M, Paul D, Saha G (2018) Synthetic speech detection using fundamental frequency variation and spectral features. Comput Speech Lang 48:31–50
Todisco M, Delgado H, Evans NW (2016) Articulation rate filtering of CQCC features for automatic speaker verification. In: Interspeech, pp 3628–3632
Jelil S, Das RK, Prasanna SM, Sinha R (2017) Spoof detection using source, instantaneous frequency and cepstral features. In: Interspeech, pp 22–26
Dua M, Aggarwal R, Kadyan V, Dua S (2012) Punjabi Speech to text system for connected words, pp 206–209
Dua M, Aggarwal RK, Biswas M (2018) Discriminative training using noise robust integrated features and refined HMM modeling. J Intell Syst 29(1):327–344
Dua M, Aggarwal RK, Biswas M (2019) GFCC based discriminatively trained noise robust continuous ASR system for Hindi language. J Ambient Intell Hum Comput 10(2)
Dua M, Aggarwal RK, Biswas M (2019) Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling. Neural Comput Appl 31(10):6747–6755
Kumar MG, Kumar SR, Saranya MS, Bharathi B, Murthy HA (2019) Spoof detection using time-delay shallow neural network and feature switching. In: 2019 IEEE automatic speech recognition and understanding workshop (ASRU). IEEE, pp 1011–1017
ASVspoof 2019: automatic speaker verification spoofing and countermeasures challenge evaluation plan*. http://www.asvspoof.org/
Huang L, Pun CM (2019) Audio replay spoof attack detection using segment-based hybrid feature and Dense Net-LSTM network. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2567–2571
Mobiny A, Najarian M (2018) Text-independent speaker verification using long short-term memory networks. arXiv:1805.00604. arXiv preprint
Dua M, Jain C, Kumar S (2021) LSTM and CNN based ensemble approach for spoof detection task in automatic speaker verification systems. J Ambient Intell Human Comput
Mittal A, Dua M (2021) Automatic speaker verification system using three dimensional static and contextual variation-based features with two dimensional convolutional neural network. International J Swarm Intell
Mittal A, Dua M (2021) Constant Q cepstral coefficients and long short-term memory model-based automatic speaker verification system. In: Proceedings of international conference on intelligent computing, information and control systems, pp 895–904
Chettri B, Mishra S, Sturm BL, Benetos E (2018) Analysing the predictions of a cnn-based replay spoofing detection system. In: 2018 IEEE spoken language technology workshop (SLT). IEEE, pp 92–97
Valenti G, Delgado H, Todisco M, Evans NW, Pilati L (2018) An end-to-end spoofing countermeasure for automatic speaker verification using evolving recurrent neural networks. In: Odyssey, pp 288–295
Kamble MR, Sailor HB, Patil HA, Li H (2019) Advances in anti-spoofing: from the perspective of ASVspoof challenges. APSIPA Trans Signal Inf Process 9
Lai CI, Abad A, Richmond K, Yamagishi J, Dehak N, King S (2019) Attentive filtering networks for audio replay attack detection. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6316–6320
Edinburgh Data Share https://datashare.is.ed.ac.uk/handle/10283/3336
Brown JC, Puckette MS (1992) An efficient algorithm for the calculation of a constant Q transform. J Acoust Soc Am 92(5):2698–2701
Yang J, Das RK, Li H (2018) Extended constant-Q cepstral coefficients for detection of spoofing attacks. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1024–1029
Glover JC, Lazzarini V, Timoney J (2011) Python for audio signal processing. In: Linux Audio Conference 2011, May 6-8 2011, Maynooth, Ireland
Cheuk KW, Anderson H, Agres K, Herremans D (2019) nnAudio: an on-the-fly GPU audio to spectrogram conversion toolbox using 1D convolution neural networks. arXiv:1912.12055. arXiv preprint
Dinkel H, Qian Y, Yu K (2018) Investigating raw wave deep neural networks for end-to-end speaker spoofing detection. IEEE/ACM Trans Audio Speech Lang Process 26(11):2002–2014
Kingma D, Ba J (2014) Adam: a method for stochastic optimization. In: Proc. Int. Conf. Learn. Representations, pp 1–13
Brownlee J (2021) https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/. Machine Learning Mastery Pty. Ltd