Detection of replay signals using excitation source and shifted CQCC features

International Journal of Speech Technology, Volume 24, Issue 2, Pages 497-507, 2021
Krishna Dutta¹, Madhusudan Singh¹, Debadatta Pati¹
¹Department of Electronics and Communication Engineering, National Institute of Technology Nagaland, Dimapur, India

Abstract

Keywords


References

Alku, P. (1991). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11, 109–118.

Beigi, H. (2011). Speaker recognition. In Fundamentals of Speaker Recognition (pp. 543–559). New York: Springer.

Campbell, J. P. (1997). Speaker recognition: A tutorial. Proceedings of the IEEE, 85(9), 1437–1462.

Delgado, H., Todisco, M., Sahidullah, M., Evans, N., Kinnunen, T., Lee, K., & Yamagishi, J. (2018). ASVspoof 2017 version 2.0: Meta-data analysis and baseline enhancements. In Odyssey 2018 The Speaker and Language Recognition Workshop.

Drugman, T., Thomas, M., Gudnason, J., Naylor, P., & Dutoit, T. (2012). Detection of glottal closure instants from speech signals: A quantitative review. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 994–1006.

Font, R., Espín, J. M., & Cano, M. J. (2017). Experimental analysis of features for replay attack detection–results on the ASVspoof 2017 challenge. In Proc. Interspeech (pp. 7–11).

Furui, S. (1981). Cepstral analysis technique for automatic speaker verification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(2), 254–272.

Hermansky, H., & Morgan, N. (1994). RASTA processing of speech. IEEE Transactions on Speech and Audio Processing, 2(4), 578–589.

Jelil, S., Das, R. K., Prasanna, S. M., & Sinha, R. (2017). Spoof detection using source, instantaneous frequency and cepstral features. In Proc. Interspeech (pp. 22–26).

Kamble, M., & Patil, H. (2018). Novel variable length energy separation algorithm using instantaneous amplitude features for replay detection. In Proc. Interspeech (pp. 646–650).

Kinnunen, T., Evans, N., Yamagishi, J., Lee, K. A., Sahidullah, M., Todisco, M., et al. (2017). ASVspoof 2017: Automatic speaker verification spoofing and countermeasures challenge evaluation plan. Training, 10(1508), 1508.

Lee, K. A., Larcher, A., Wang, G., Kenny, P., Brümmer, N., van Leeuwen, D., et al. (2015). The RedDots data collection for speaker recognition. In Sixteenth Annual Conference of the International Speech Communication Association.

Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M. (1997). The DET curve in assessment of detection task performance. In Proc. Eur. Conf. on Speech Communication Technology, Rhodes, Greece (Vol. 4, pp. 1895–1898).

Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1613.

Naylor, P. A., Kounoudes, A., Gudnason, J., & Brookes, M. (2007). Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 34–43.

Nordin, F., & Eriksson, T. (2001). A speech spectrum distortion measure with interframe memory. Proc. ICASSP, 2, 717–720.

Patil, H. A., Kamble, M. R., Patel, T. B., & Soni, M. (2017). Novel variable length Teager energy separation based instantaneous frequency features for replay detection. In Proc. Interspeech (pp. 12–16).

Plumpe, M. D., Quatieri, T. F., & Reynolds, D. A. (1999). Modeling of the glottal flow derivative waveform with application to speaker identification. IEEE Transactions on Speech and Audio Processing, 7(5), 569–586.

Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.

Prathosh, A., Ananthapadmanabha, T., & Ramakrishnan, A. (2013). Epoch extraction based on integrated linear prediction residual using plosion index. IEEE Transactions on Audio, Speech, and Language Processing, 21(12), 2471–2480.

Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17, 91–108.

Sailor, H., Kamble, M., & Patil, H. (2018). Auditory filterbank learning for temporal modulation features in replay spoof speech detection. In Proc. Interspeech (pp. 666–670).

Singh, M., & Pati, D. (2019). Usefulness of linear prediction residual for replay attack detection. AEU-International Journal of Electronics and Communications. https://doi.org/10.1016/j.aeue.2019.152837.

Suthokumar, G., Sethu, V., Wijenayake, C., & Ambikairajah, E. (2018). Modulation dynamic features for the detection of replay attacks. In Proc. Interspeech (pp. 691–695).

Tak, H., & Patil, H. (2018). Novel linear frequency residual cepstral features for replay attack detection. In Proc. Interspeech (pp. 726–730).

The BOSARIS toolkit [software package]. Retrieved 2013 from https://sites.google.com/site/bosaristoolkit.

Thomas, M. R., Gudnason, J., & Naylor, P. A. (2012). Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 82–91.

Torres-Carrasquillo, P. A., Singer, E., Kohler, M. A., Greene, R. J., Reynolds, D. A., & Deller, J. R., Jr. (2002). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In Seventh International Conference on Spoken Language Processing.

Villalba, J., & Lleida, E. (2011). Preventing replay attacks on speaker verification systems. In Proc. Int. Carnahan Conf. on Security Technology (ICCST) (pp. 1–8).

Wang, Z., Wei, G., & He, Q. H. (2011). Channel pattern noise based playback attack detection algorithm for speaker recognition. In Proc. IEEE Int. Conference of the Biometrics Special Interest Group (BIOSIG) on Machine Learning and Cybernetics (pp. 1708–1713).

Wu, Z., Evans, N., Kinnunen, T., Yamagishi, J., Alegre, F., & Li, H. (2015). Spoofing and countermeasures for speaker verification: A survey. Speech Communication, 66, 130–153.

Zhang, W. Q., He, L., Deng, Y., Liu, J., & Johnson, M. T. (2010). Time-frequency cepstral features and heteroscedastic linear discriminant analysis for language recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(2), 266–276.