Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning

International Journal of Speech Technology - Volume 25 - Pages 679-687 - 2022
Samia Abd El-Moneim1,2, Eman Abd El-Mordy1, M. A. Nassar1, Moawad I. Dessouky1, Nabil A. Ismail3, Adel S. El-Fishawy1, Sami El-Dolil1, Ibrahim M. El-Dokany1, Fathi E. Abd El-Samie1,4
1Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
2High Institute of Engineering and Technology, Tanta, Egypt
3Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
4Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia

Abstract

Automatic Speaker Recognition (ASR) in mismatched conditions is a challenging task, since it requires robust feature extraction and classification techniques. The Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is an efficient network that can learn to recognize speakers text-independently when the recording conditions are similar; unfortunately, when the recording conditions differ, its performance degrades. In this paper, features are extracted from Radon projections of the spectrograms of speech signals, since the Radon Transform (RT) is less sensitive to noise and reverberation. The Radon projection is applied to the spectrogram of each speech signal, and the 2-D Discrete Cosine Transform (DCT) of the result is then computed. This technique improves text-independent recognition accuracy with reduced sensitivity to noise and reverberation effects. The performance of the ASR system with the proposed features is compared to that of systems based on Mel-Frequency Cepstral Coefficients (MFCCs) and spectrum features. For noisy utterances at 25 dB, the recognition rate with the proposed features reaches 80%, while it is 27% with MFCCs and 28% with spectrum features. For reverberant speech, the recognition rate reaches 80.67% with the proposed features, while it reaches 54% with MFCCs and 62.67% with spectrum features.
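The feature pipeline described above (spectrogram, then Radon projection, then 2-D DCT) can be sketched as follows. This is a minimal illustration using standard SciPy and scikit-image tools; the window length, projection angles, and number of retained DCT coefficients are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.fft import dctn
from skimage.transform import radon

def radon_dct_features(signal, fs=16000, n_coeffs=32):
    """Radon-DCT feature sketch: spectrogram -> Radon projections -> 2-D DCT.

    All parameter values here are illustrative assumptions.
    """
    # Magnitude spectrogram with log compression (assumed front-end settings)
    _, _, Sxx = spectrogram(signal, fs=fs, nperseg=256, noverlap=128)
    S = np.log1p(Sxx)

    # Radon projections of the spectrogram over a set of angles (assumed: 60
    # angles uniformly spanning [0, 180) degrees)
    theta = np.linspace(0.0, 180.0, 60, endpoint=False)
    sinogram = radon(S, theta=theta, circle=False)

    # 2-D DCT of the sinogram; keep the low-order (energy-compacting)
    # coefficients as the feature vector
    coeffs = dctn(sinogram, norm="ortho")
    return coeffs[:n_coeffs, :n_coeffs].ravel()

# Usage on a synthetic 1-second signal (stand-in for a real utterance)
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
features = radon_dct_features(x)
print(features.shape)  # (1024,)
```

The resulting fixed-length vector would then serve as the input features for the LSTM-RNN classifier; integrating over Radon projections is what gives the features their reduced sensitivity to additive noise and reverberation smearing in the spectrogram.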
