Detection of replay signals using excitation source and shifted CQCC features

International Journal of Speech Technology - Volume 24, Issue 2 - Pages 497-507 - 2021

Krishna Dutta¹, Madhusudan Singh¹, Debadatta Pati¹

¹Department of Electronics and Communication Engineering, National Institute of Technology Nagaland, Dimapur, India

References

Alku, P. (1991). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11, 109–118.

Beigi, H. (2011). Speaker recognition. In: Fundamentals of Speaker Recognition (pp. 543–559). New York: Springer.

Campbell, J. P. (1997). Speaker recognition: A tutorial. Proceedings of the IEEE, 85(9), 1437–1462.

Delgado, H., Todisco, M., Sahidullah, M., Evans, N., Kinnunen, T., Lee, K., & Yamagishi, J. (2018). ASVspoof 2017 version 2.0: Meta-data analysis and baseline enhancements. In Odyssey 2018: The Speaker and Language Recognition Workshop.

Drugman, T., Thomas, M., Gudnason, J., Naylor, P., & Dutoit, T. (2012). Detection of glottal closure instants from speech signals: A quantitative review. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 994–1006.

Font, R., Espín, J. M., & Cano, M. J. (2017). Experimental analysis of features for replay attack detection – results on the ASVspoof 2017 challenge. In Proc. Interspeech (pp. 7–11).

Furui, S. (1981). Cepstral analysis technique for automatic speaker verification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(2), 254–272.

Hermansky, H., & Morgan, N. (1994). RASTA processing of speech. IEEE Transactions on Speech and Audio Processing, 2(4), 578–589.

Jelil, S., Das, R. K., Prasanna, S. M., & Sinha, R. (2017). Spoof detection using source, instantaneous frequency and cepstral features. In Proc Interspeech (pp. 22–26).

Kamble, M., & Patil, H. (2018). Novel variable length energy separation algorithm using instantaneous amplitude features for replay detection. Proc. Interspeech, 2018, pp. 646–650.

Kinnunen, T., Evans, N., Yamagishi, J., Lee, K. A., Sahidullah, M., Todisco, M., et al. (2017). ASVspoof 2017: Automatic speaker verification spoofing and countermeasures challenge evaluation plan.

Lee, K. A., Larcher, A., Wang, G., Kenny, P., Brümmer, N., van Leeuwen, D., et al. (2015). The RedDots data collection for speaker recognition. In Sixteenth Annual Conference of the International Speech Communication Association.

Martin, A., Doddington, G., Kamm, T., Ordowski, M., & Przybocki, M. (1997). The DET curve in assessment of detection task performance. In: Proc. Eur. conf. on speech communication technology, Rhodes, Greece, Vol. 4, pp. 1895–1898.

Murthy, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1613.

Naylor, P. A., Kounoudes, A., Gudnason, J., & Brookes, M. (2007). Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 34–43.

Nordin, F., & Eriksson, T. (2001). A speech spectrum distortion measure with interframe memory. Proc. ICASSP, 2, 717–720.

Patil, H. A., Kamble, M. R., Patel, T. B., & Soni, M. (2017). Novel variable length teager energy separation based instantaneous frequency features for replay detection. In Proc Interspeech (pp. 12–16).

Plumpe, M. D., Quatieri, T. F., & Reynolds, D. A. (1999). Modelling of glottal flow derivative waveform with application to speaker identification. IEEE Transactions on Speech and Audio Processing, 7(5), 569–586.

Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006a). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.

Prathosh, A., Ananthapadmanabha, T., & Ramakrishnan, A. (2013). Epoch extraction based on integrated linear prediction residual using plosion index. IEEE Transactions on Audio, Speech, and Language Processing, 21(12), 2471–2480.

Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17, 91–108.

Sailor, H., Kamble, M., & Patil, H. (2018). Auditory filterbank learning for temporal modulation features in replay spoof speech detection. In Proc. Interspeech (pp. 666–670).

Singh, M., & Pati, D. (2019). Usefulness of linear prediction residual for replay attack detection. AEU-International Journal of Electronics and Communications. https://doi.org/10.1016/j.aeue.2019.152837.

Suthokumar, G., Sethu, V., Wijenayake, C., & Ambikairajah, E. (2018). Modulation dynamic features for the detection of replay attacks. In Proc. Interspeech (pp. 691–695).

Tak, H., & Patil, H. (2018). Novel linear frequency residual cepstral features for replay attack detection. Proc. Interspeech, 2018, 726–730.

The Bosaris toolkit [software package]. Retrieved 2013 from https://sites.google.com/site/bosaristoolkit.

Thomas, M. R., Gudnason, J., & Naylor, P. A. (2012). Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 82–91.

Torres-Carrasquillo, P. A., Singer, E., Kohler, M. A., Greene, R. J., Reynolds, D. A., & Deller, J. R., Jr. (2002). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In Seventh International Conference on Spoken Language Processing.

Villalba, J., & Lleida, E. (2011). Preventing replay attacks on speaker verification systems. In Proc. Int. Carnahan Conf. on Security Technology (ICCST) (pp. 1–8).

Wang, Z., Wei, G., & He, Q. H. (2011). Channel pattern noise based playback attack detection algorithm for speaker recognition. In Proc. IEEE Int. Conference of the Biometrics Special Interest Group (BIOSIG) on Machine Learning and Cybernetics (pp. 1708–1713).

Wu, Z., Evans, N., Kinnunen, T., Yamagishi, J., Alegre, F., & Li, H. (2015). Spoofing and countermeasures for speaker verification: A survey. Speech Communication, 66, 130–153.

Zhang, W. Q., He, L., Deng, Y., Liu, J., & Johnson, M. T. (2010). Time-frequency cepstral features and heteroscedastic linear discriminant analysis for language recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(2), 266–276.
