Morse wavelet transform-based features for voice liveness detection
References
Akimoto, K., Liew, S.P., Mishima, S., Mizushima, R., Lee, K.A., 2020. POCO: A Voice Spoofing and Liveness Detection Corpus based on Pop Noise. In: INTERSPEECH. Shanghai, China, pp. 1081–1085.
Anon, 2017. HSBC reports high trust levels in biometric tech as twins spoof its voice ID system. Biometr. Technol. Today, 12.
Anon, 2020
Campbell, 2005. A Matlab simulation of “shoebox” room acoustics for use in research and teaching. Comput. Inf. Syst. 9, 48.
Daubechies, 1996. Where do wavelets come from? A personal point of view. Proc. IEEE 84, 510. https://doi.org/10.1109/5.488696
Delprat, 1992. Asymptotic wavelet and Gabor analysis: Extraction of instantaneous frequencies. IEEE Trans. Inform. Theory 38, 644. https://doi.org/10.1109/18.119728
Elko, G.W., Meyer, J., Backer, S., Peissig, J., 2007. Electronic pop protection for microphones. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. New York, USA, pp. 46–49.
Eyring, 1930. Reverberation time in “dead” rooms. J. Acoust. Soc. Am. 1, 217. https://doi.org/10.1121/1.1915175
Flanagan, 2013
Gabor, 1946. Theory of communication. Part 1: The analysis of information. J. Inst. Electr. Eng. III Radio Commun. Eng. 93, 429.
Goupillaud, 1984. Cycle-octave and related transforms in seismic signal analysis. Geoexploration 23, 85. https://doi.org/10.1016/0016-7142(84)90025-5
Goupillaud, 1984. A simplified view of the cycle-octave and voice representations of seismic signals, p. 379.
Grossmann, 1990. Reading and understanding continuous wavelet transforms, p. 2.
Grossmann, 1984. Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. 15, 723. https://doi.org/10.1137/0515056
Hansen, 2015. Speaker recognition by machines and humans: A tutorial review. IEEE Signal Process. Mag. 32, 74. https://doi.org/10.1109/MSP.2015.2462851
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. CVPR, Las Vegas, NV, USA, pp. 770–778.
Hsu, 1995. Spectrum analysis of base-line-popping noise in MR heads. IEEE Trans. Magn. 31, 2636. https://doi.org/10.1109/20.490077
Khoria, 2023. On significance of constant-Q transform for pop noise detection. Comput. Speech Lang. 77. https://doi.org/10.1016/j.csl.2022.101421
Kinnunen, T., Sahidullah, M., Delgado, H., Todisco, M., Evans, N., Yamagishi, J., Lee, K.A., 2017. The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection. In: INTERSPEECH. Stockholm, Sweden, pp. 2–6.
Lau, Y.W., Wagner, M., Tran, D., 2004. Vulnerability of speaker verification to voice mimicking. In: International Symposium on Intelligent Multimedia, Video, and Speech Processing. Hong Kong, pp. 145–148.
Lavrentyeva, G., Novoselov, S., Malykh, E., Kozlov, A., Kudashev, O., Shchemelinin, V., 2017. Audio replay attack detection with deep learning frameworks. In: INTERSPEECH. San Francisco, USA, pp. 82–86.
Licklider, 1948. Effects of differentiation, integration, and infinite peak clipping upon the intelligibility of speech. J. Acoust. Soc. Am. 20, 42. https://doi.org/10.1121/1.1906346
Lilly, 2008. Higher-order properties of analytic wavelets. IEEE Trans. Signal Process. 57, 146. https://doi.org/10.1109/TSP.2008.2007607
Lilly, 2010. On the analytic wavelet transform. IEEE Trans. Inform. Theory 56, 4135. https://doi.org/10.1109/TIT.2010.2050935
Lilly, 2012. Generalized Morse wavelets as a superfamily of analytic wavelets. IEEE Trans. Signal Process. 60, 6036. https://doi.org/10.1109/TSP.2012.2210890
Lin, 2000. Feature extraction based on Morlet wavelet and its application for mechanical fault diagnosis. J. Sound Vib. 234, 135. https://doi.org/10.1006/jsvi.2000.2864
Łobos, 2002. Wavelet transforms for real-time estimation of transmission line impedance under transient conditions. Electr. Eng. 84, 63. https://doi.org/10.1007/s002020100104
Mallat, 1999
Mallat, 1993. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41, 3397. https://doi.org/10.1109/78.258082
Mallat, 1992. Characterization of signals from multiscale edges. IEEE Trans. Pattern Anal. Mach. Intell. 14, 710. https://doi.org/10.1109/34.142909
McAulay, 1986. Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust. Speech Signal Process. 34, 744. https://doi.org/10.1109/TASSP.1986.1164910
McClanahan, R.D., Stewart, B., De Leon, P.L., 2014. Performance of I-vector speaker verification and the detection of synthetic speech. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP, Florence, Italy, pp. 3779–3783.
Mochizuki, S., Shiota, S., Kiya, H., 2018. Voice liveness detection using phoneme-based pop-noise detector for speaker verification. In: Odyssey 2018: The Speaker and Language Recognition Workshop. ISCA, Les Sables d’Olonne, France, pp. 233–239.
Nishida, Y., Hori, T., Suehiro, T., Hirai, S., 2000. Monitoring of breath sound under daily environment by ceiling dome microphone. In: International Conference on Systems, Man and Cybernetics. Vol. 3. Nashville, USA, pp. 1822–1829.
Pike, 1994. Analysis of high resolution marine seismic data using the wavelet transform, p. 183. https://doi.org/10.1016/B978-0-08-052087-2.50014-1
Gupta, P., Patil, H.A., 2022a. Effect of Speaker-Microphone Proximity on Pop Noise: Continuous Wavelet Transform-Based Approach. In: 13th International Symposium on Chinese Spoken Language Processing. ISCSLP, Singapore.
Gupta, P., Patil, H.A., 2022b. Significance of Distance on Pop Noise for Voice Liveness Detection. In: 24th International Conference on Speech and Computer. SPECOM, Gurugram, India, pp. 226–237.
Gupta, P., Chodingala, P.K., Patil, H.A., 2022a. Morlet Wavelet-Based Voice Liveness Detection Using Convolutional Neural Network. In: European Signal Processing Conference. EUSIPCO, Belgrade, Serbia, 29 Aug - 02 Sep.
Gupta, P., Chodingala, P.K., Patil, H.A., 2022b. Morse Wavelet Features For Pop Noise Detection. In: IEEE International Conference on Signal Processing and Communication. SPCOM, IISc Bengaluru, India, pp. 1–5.
Gupta, 2021. Voice liveness detection using bump wavelet with CNN.
Qin, Y., Yu, C., Li, Z., Zhong, M., Yan, Y., Shi, Y., 2021. ProxiMic: Convenient Voice Activation via Close-to-Mic Speech Detected by a Single Microphone. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Yokohama, Japan, pp. 1–12.
Quatieri, 2006
Rosenberg, 1976. Automatic speaker verification: A review. Proc. IEEE 64, 475. https://doi.org/10.1109/PROC.1976.10156
Shiota, S., Villavicencio, F., Yamagishi, J., Ono, N., Echizen, I., Matsui, T., 2015. Voice Liveness Detection algorithms based on pop noise caused by human breath for Automatic Speaker Verification. In: INTERSPEECH. Dresden, Germany, pp. 2047–2051.
Shiota, 2016. Voice liveness detection for speaker verification based on a tandem single/double-channel pop noise detector, p. 259.
Gupta, S., Khoria, K., Patil, A.T., Patil, H.A., 2021b. Deep convolutional neural network for voice liveness detection. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. APSIPA-ASC, Tokyo, Japan.
Singh, 2021. Modified group delay function using different spectral smoothing techniques for voice liveness detection, p. 649.
Sizov, 2015. Joint speaker verification and antispoofing in the i-vector space. IEEE Trans. Inf. Forensics Secur. 10, 821. https://doi.org/10.1109/TIFS.2015.2407362
Stylianou, Y., 2009. Voice transformation: A survey. In: International Conference on Acoustics, Speech, and Signal Processing. ICASSP, Taipei, Taiwan, pp. 3585–3588.
Tak, 2021. End-to-end anti-spoofing with RawNet2, p. 6369.
Tak, H., Todisco, M., Wang, X., Jung, J.-w., Yamagishi, J., Evans, N., 2022. Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation. In: The Speaker and Language Recognition Workshop. Speaker Odyssey, Beijing, China, June 28 - July 01.
Teolis, 1998
Todisco, M., Wang, X., Vestman, V., Sahidullah, M., Delgado, H., Nautsch, A., Evans, N., Kinnunen, T., Lee, K.A., 2019. ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection. In: INTERSPEECH. Graz, Austria, pp. 1008–1012.
Tu, 2005. Analysis of singularities from modulus maxima of complex wavelets. IEEE Trans. Inform. Theory 51, 1049. https://doi.org/10.1109/TIT.2004.842706
Vakman, 1977. Amplitude, phase, frequency – fundamental concepts of oscillation theory. Sov. Phys. Uspekhi 20, 1002. https://doi.org/10.1070/PU1977v020n12ABEH005479
Vara Prasad Naraharisetti, K., 2010. Enhancement of breathing signal using delayless subband adaptive filter with HPF. In: The 10th IEEE International Symposium on Signal Processing and Information Technology. Luxor, Egypt, pp. 177–181.
Vásquez-Correa, J.C., Orozco-Arroyave, J.R., Nöth, E., 2017. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease. In: INTERSPEECH. Stockholm, Sweden, pp. 314–318.
Vincent, 2022
Wang, Q., Lin, X., Zhou, M., Chen, Y., Wang, C., Li, Q., Luo, X., 2019. VoicePop: A pop noise based anti-spoofing system for voice authentication on smartphones. In: IEEE Conference on Computer Communications. Paris, France, pp. 2062–2070.
Wu, 2018. A light CNN for deep face representation with noisy labels. IEEE Trans. Inf. Forensics Secur. 13, 2884. https://doi.org/10.1109/TIFS.2018.2833032
Yang, 1992. Auditory representations of acoustic signals. IEEE Trans. Inform. Theory 38, 824. https://doi.org/10.1109/18.119739
Zen, 2009. Statistical parametric speech synthesis. Speech Commun. 51, 1039. https://doi.org/10.1016/j.specom.2009.04.004
Zuidwijk, 1996