Morse wavelet transform-based features for voice liveness detection

Computer Speech & Language - Tập 84 - Trang 101571 - 2024
Priyanka Gupta1, Hemant A. Patil1
1Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar 382007, India.

Tài liệu tham khảo

Akimoto, K., Liew, S.P., Mishima, S., Mizushima, R., Lee, K.A., 2020. POCO: A Voice Spoofing and Liveness Detection Corpus based on Pop Noise. In: INTERSPEECH. Shanghai, China, pp. 1081–1085. Anon, 2017, HSBC reports high trust levels in biometric tech as twins spoof its voice ID system, Biometr. Technol. Today, 12 Anon, 2020 Campbell, 2005, A Matlab simulation of “shoebox” room acoustics for use in research and teaching, Comput. Inf. Syst., 9, 48 Daubechies, 1996, Where do wavelets come from? A personal point of view, Proc. IEEE, 84, 510, 10.1109/5.488696 Delprat, 1992, Asymptotic wavelet and Gabor analysis: Extraction of instantaneous frequencies, IEEE Trans. Inform. Theory, 38, 644, 10.1109/18.119728 Elko, G.W., Meyer, J., Backer, S., Peissig, J., 2007. Electronic pop protection for microphones. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. New York, USA, pp. 46–49. Eyring, 1930, Reverberation time in ”dead” rooms, J. Acoust. Soc. Am., 1, 217, 10.1121/1.1915175 Flanagan, 2013 Gabor, 1946, Theory of communication-Part 1: The analysis of information, J. Inst. Electr. Eng. III Radio Commun. Eng., 93, 429 Goupillaud, 1984, Cycle-octave and related transforms in seismic signal analysis, Geoexploration, 23, 85, 10.1016/0016-7142(84)90025-5 Goupillaud, 1984, A simplified view of the cycle-octave and voice representations of seismic signals, 379 Grossmann, 1990, Reading and understanding continuous wavelet transforms, 2 Grossmann, 1984, Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM J. Math. Anal., 15, 723, 10.1137/0515056 Hansen, 2015, Speaker recognition by machines and humans: A tutorial review, IEEE Signal Process. Mag., 32, 74, 10.1109/MSP.2015.2462851 He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. CVPR, Las Vegas, NV, USA, pp. 770–778. Hsu, 1995, Spectrum analysis of base-line-popping noise in MR heads, IEEE Trans. Magn., 31, 2636, 10.1109/20.490077 Khoria, 2023, On significance of constant-Q transform for pop noise detection, Comput. Speech Lang., 77, 10.1016/j.csl.2022.101421 Kinnunen, T., Sahidullah, M., Delgado, H., Todisco, M., Evans, N., Yamagishi, J., Lee, K.A., 2017. The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection. In: INTERSPEECH. Stockholm, Sweden, pp. 2–6. Lau, Y.W., Wagner, M., Tran, D., 2004. Vulnerability of speaker verification to voice mimicking. In: International Symposium on Intelligent Multimedia, Video, and Speech Processing. Hong Kong, pp. 145–148. Lavrentyeva, G., Novoselov, S., Malykh, E., Kozlov, A., Kudashev, O., Shchemelinin, V., 2017. Audio replay attack detection with deep learning frameworks. In: INTERSPEECH. San Francisco, USA, pp. 82–86. Licklider, 1948, Effects of differentiation, integration, and infinite peak clipping upon the intelligibility of speech, J. Acoust. Soc. Am. (JASA), 20, 42, 10.1121/1.1906346 Lilly, 2008, Higher-order properties of analytic wavelets, IEEE Trans. Signal Process., 57, 146, 10.1109/TSP.2008.2007607 Lilly, 2010, On the analytic wavelet transform, IEEE Trans. Inform. Theory, 56, 4135, 10.1109/TIT.2010.2050935 Lilly, 2012, Generalized Morse wavelets as a superfamily of analytic wavelets, IEEE Trans. Signal Process., 60, 6036, 10.1109/TSP.2012.2210890 Lin, 2000, Feature extraction based on Morlet wavelet and its application for mechanical fault diagnosis, J. Sound Vib., 234, 135, 10.1006/jsvi.2000.2864 Łobos, 2002, Wavelet transforms for real-time estimation of transmission line impedance under transient conditions, Electr. Eng., 84, 63, 10.1007/s002020100104 Mallat, 1999 Mallat, 1993, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process., 41, 3397, 10.1109/78.258082 Mallat, 1992, Characterization of signals from multiscale edges, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 14, 710, 10.1109/34.142909 McAulay, 1986, Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoust. Speech Signal Process., 34, 744, 10.1109/TASSP.1986.1164910 McClanahan, R.D., Stewart, B., De Leon, P.L., 2014. Performance of I-vector speaker verification and the detection of synthetic speech. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP, Florence, Italy, pp. 3779–3783. Mochizuki, S., Shiota, S., Kiya, H., 2018. Voice liveness detection using phoneme-based pop-noise detector for speaker verification. In: Odyssey 2018 the Speaker and Language Recognition Workshop. ISCA, Les Sables d’Olonne, France, pp. 233–239. Nishida, Y., Hori, T., Suehiro, T., Hirai, S., 2000. Monitoring of breath sound under daily environment by ceiling dome microphone. In: International Conference on Systems, Man and Cybernetics. Vol. 3. Nashville, USA, pp. 1822–1829. Pike, 1994, Analysis of high resolution marine seismic data using the wavelet transform, 183, 10.1016/B978-0-08-052087-2.50014-1 Priyanka Gupta, Hemant A. Patil, 2022a. Effect of Speaker-Microphone Proximity on Pop Noise: Continuous Wavelet Transform-Based Approach. In: 13th International Symposium on Chinese Spoken Language Processing. ISCSLP, Singapore. Priyanka Gupta, Hemant A. Patil, 2022b. Significance of Distance on Pop Noise for Voice Liveness Detection. In: 24th International Conference on Speech and Computer. SPECOM, Gurugram, India, pp. 226–237. Priyanka Gupta, Piyushkumar K. Chodingala, Hemant A. Patil, 2022a. Morlet Wavelet-Based Voice Liveness Detection Using Convolutional Neural Network. In: European Signal Processing Conference. EUSIPCO, Belgrade, Serbia, 29 Aug - 02 Sep. Priyanka Gupta, Piyushkumar K. Chodingala, Hemant A. Patil, 2022b. Morse Wavelet Features For Pop Noise Detection. In: IEEE International Conference on Signal Processing and Communication. SPCOM, IISc Bengaluru, India, pp. 1–5. Priyanka Gupta, 2021, Voice liveness detection using bump wavelet with cnn Qin, Y., Yu, C., Li, Z., Zhong, M., Yan, Y., Shi, Y., 2021. ProxiMic: Convenient Voice Activation via Close-to-Mic Speech Detected by a Single Microphone. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Yokohama, Japan, pp. 1–12. Quatieri, 2006 Rosenberg, 1976, Automatic speaker verification: A review, Proc. IEEE, 64, 475, 10.1109/PROC.1976.10156 Shiota, S., Villavicencio, F., Yamagishi, J., Ono, N., Echizen, I., Matsui, T., 2015. Voice Liveness Detection algorithms based on pop noise caused by human breath for Automatic Speaker Verification. In: INTERSPEECH. Dresden, Germany, pp. 2047–2051. Shiota Sayaka, 2016, Voice liveness detection for speaker verification based on a tandem single/double-channel pop noise detector, 259 Siddhant Gupta, Kuldeep Khoria, Ankur T. Patil, Hemant A. Patil, 2021b. Deep convolutional neural network for voice liveness detection. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. APSIPA-ASC, Tokyo, Japan. Singh, 2021, Modified group delay function using different spectral smoothing techniques for voice liveness detection, 649 Sizov, 2015, Joint speaker verification and antispoofing in the i-vector space, IEEE Trans. Inf. Forensics Secur., 10, 821, 10.1109/TIFS.2015.2407362 Stylianou, Y., 2009. Voice transformation: A survey. In: International Conference on Acoustics, Speech, and Signal Processing. ICASSP, Taipei, Taiwan, pp. 3585–3588. Tak, 2021, End-to-end anti-spoofing with rawnet2, 6369 Tak, H., Todisco, M., Wang, X., Jung, J.-w., Yamagishi, J., Evans, N., 2022. Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation. In: The Speaker and Language Recognition Workshop. Speaker Odyssey, Beijing, China, June 28 - July 01. Teolis, 1998 Todisco, M., Wang, X., Vestman, V., Sahidullah, M., Delgado, H., Nautsch, A., Evans, N., Kinnunen, T., Lee, K.A., 2019. ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection. In: INTERSPEECH. Graz, Austria, pp. 1008–1012. Tu, 2005, Analysis of singularities from modulus maxima of complex wavelets, IEEE Trans. Inform. Theory, 51, 1049, 10.1109/TIT.2004.842706 Vakman, 1977, Amplitude, phase, frequency- fundamental concepts of oscillation theory, Sov. Phys. Uspekhi, 20, 1002, 10.1070/PU1977v020n12ABEH005479 Vara Prasad Naraharisetti, K., 2010. Enhancement of breathing signal using delayless subband adaptive filter with HPF. In: The 10th IEEE International Symposium on Signal Processing and Information Technology. Luxor, Egypt, pp. 177–181. Vásquez-Correa, J.C., Orozco-Arroyave, J.R., Nöth, E., 2017. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease. In: INTERSPEECH. Stockholm, Sweden, pp. 314–318. Vincent, 2022 Wang, Q., Lin, X., Zhou, M., Chen, Y., Wang, C., Li, Q., Luo, X., 2019. VoicePop: A pop noise based anti-spoofing system for voice authentication on smartphones. In: IEEE Conference on Computer Communications. Paris, France, pp. 2062–2070. Wu, 2018, A light CNN for deep face representation with noisy labels, IEEE Trans. Inf. Forensics Secur., 13, 2884, 10.1109/TIFS.2018.2833032 Yang, 1992, Auditory representations of acoustic signals, IEEE Trans. Inform. Theory, 38, 824, 10.1109/18.119739 Zen, 2009, Statistical parametric speech synthesis, Speech Commun., 51, 1039, 10.1016/j.specom.2009.04.004 Zuidwijk, 1996