Speaker recognition with global information modelling of raw waveforms

Journal of Membrane Computing - 2024

Yujiao Wu¹, Jianping Dong², Zulin Fang¹, Gexiang Zhang¹, Haina Rong

¹College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, 610059, China

²School of Nuclear Technology and Automation Engineering, Chengdu University of Technology, Chengdu, 610059, China

References

Kabir, M. M., Mridha, M. F., Shin, J., Jahan, I., & Ohi, A. Q. (2021). A survey of speaker recognition: Fundamental theories, recognition methods and opportunities. IEEE Access, 9, 79236–79263.

Fechner, G.T. (1948). Elements of psychophysics (original work published 1860).

Snyder, D., Garcia-Romero, D., Sell, G., Povey, D. & Khudanpur, S. (2018). X-vectors: Robust DNN embeddings for speaker recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5329–5333.

Desplanques, B., Thienpondt, J. & Demuynck, K. (2020). ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification. arXiv:2005.07143.

Chung, J.S., Nagrani, A. & Zisserman, A. (2018). VoxCeleb2: Deep speaker recognition. arXiv:1806.05622.

Cai, W., Chen, J., Zhang, J., & Li, M. (2020). On-the-fly data loader and utterance-level aggregation for speaker and language recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 1038–1051.

Dehak, N., Kenny, P. J., Dehak, R., Dumouchel, P., & Ouellet, P. (2010). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech and Language Processing, 19(4), 788–798.

Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33, 12449–12460.

Sainath, T., Weiss, R.J., Wilson, K., Senior, A.W. & Vinyals, O. (2015). Learning the speech front-end with raw waveform CLDNNs. In Interspeech 2015.

Ravanelli, M. & Bengio, Y. (2018). Speaker recognition from raw waveform with SincNet. In 2018 IEEE Spoken Language Technology Workshop (SLT), 1021–1028.

Oglic, D., Cvetkovic, Z., Bell, P. & Renals, S. (2020). A deep 2D convolutional network for waveform-based speech recognition. Interspeech, 1654–1658.

Pariente, M., Cornell, S., Deleforge, A. & Vincent, E. (2020). Filterbank design for end-to-end speech separation. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6364–6368.

Jung, J.W., Heo, H.S., Yang, I.H., Shim, H.J. & Yu, H.J. (2018). A complete end-to-end speaker verification system using deep neural networks: From raw signals to verification result. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5349–5353.

Muckenhirn, H., Doss, M.M. & Marcel, S. (2018). Towards directly modeling raw speech signal for speaker verification using CNNs. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4884–4888.

Zhu, G., Jiang, F. & Duan, Z. (2020). Y-vector: Multiscale waveform encoder for speaker embedding. arXiv:2010.12951.

Jung, J.W., Kim, Y.J., Heo, H.S., Lee, B.J., Kwon, Y. & Chung, J.S. (2022). Pushing the limits of raw waveform speaker recognition. arXiv:2203.08488.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N. & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2021). Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, 35(12), 11106–11115.

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z. & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, 10012–10022.

Wang, R., Ao, J., Zhou, L., Liu, S., Wei, Z., Ko, T. & Zhang, Y. (2022). Multi-view self-attention based transformer for speaker recognition. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6732–6736.

Noé, P.G., Parcollet, T. & Morchid, M. (2020). CGCNN: Complex Gabor convolutional neural network on raw speech. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7724–7728.

Andén, J., & Mallat, S. (2014). Deep scattering spectrum. IEEE Transactions on Signal Processing, 62(16), 4114–4128.

Balestriero, R., Cosentino, R., Glotin, H. & Baraniuk, R. (2018). Spline filters for end-to-end deep learning. In International Conference on Machine Learning, PMLR, 364–373.

Jung, J.W., Heo, H.S., Kim, J.H., Shim, H.J. & Yu, H.J. (2019). RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification. arXiv:1904.08104.

Ba, J.L., Kiros, J.R. & Hinton, G.E. (2016). Layer normalization. arXiv:1607.06450.

Kingma, D.P. & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.

Hoffer, E., Ben-Nun, T., Hubara, I., Giladi, N., Hoefler, T. & Soudry, D. (2020). Augment your batch: Improving generalization through instance repetition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8129–8138.
