Simultaneous recognition of distant talking speech of multiple sound sources based on 3-D N-best search algorithm

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01), pp. 111-114

P. Heracleous¹, S. Nakamura¹, K. Shikano²

¹ATR, Spoken Language Translation Research Labs, Japan

²Nara Institute of Science and Technology, Japan

Abstract

This paper deals with the simultaneous recognition of distant-talking speech from multiple talkers using the 3-D N-best search algorithm. We describe the basic idea of the 3-D N-best search and address two additional techniques implemented in the baseline system, path distance-based clustering and likelihood normalization, both of which proved necessary for building an efficient system for this purpose. Our previous work reported experiments on simulated data; this paper reports experiments on reverberant data, both simulated with the image method and recorded in a real room. The image-method data were used to determine how recognition accuracy varies with reverberation time, and the real-room recordings were used to evaluate the algorithm's performance under realistic conditions. On image-method data with a reverberation time of 162 ms, the Top-3 simultaneous word accuracy was 73.02%.
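The abstract does not say how the image-method data were configured. As a rough illustration only, the following Python sketch shows the general image method for a rectangular room: it enumerates mirror-image sources on the standard image-source lattice, weights each by spherical spreading and a per-reflection coefficient, and places an impulse at each arrival delay. Everything specific here is an assumption, not taken from the paper: a shoebox room, one frequency-independent reflection coefficient shared by all walls, Sabine's formula to derive that coefficient from a target reverberation time such as 162 ms, and delays rounded to the nearest sample. The function names `beta_from_t60`, `image_sources`, and `room_impulse_response` are hypothetical.

```python
import itertools
import math

# Assumption: Sabine constant in s/m (about 24 ln 10 / c for c of roughly 343 m/s).
SABINE_CONST = 0.161


def beta_from_t60(room, t60):
    """Uniform wall reflection coefficient for a target T60 (Sabine, assumed)."""
    lx, ly, lz = room
    volume = lx * ly * lz
    surface = 2 * (lx * ly + lx * lz + ly * lz)
    alpha = SABINE_CONST * volume / (surface * t60)  # mean absorption coefficient
    if not 0 < alpha <= 1:
        raise ValueError("target T60 is not reachable for this room under Sabine's formula")
    return math.sqrt(1 - alpha)  # energy absorption alpha = 1 - beta**2


def image_sources(room, src, mic, beta, max_order):
    """Return (distance_m, amplitude) for every image with <= max_order reflections."""
    images = []
    lattice = range(-max_order, max_order + 1)
    for n in itertools.product(lattice, repeat=3):
        for q in itertools.product((0, 1), repeat=3):
            # Reflections off the two walls normal to each axis: |n - q| + |n|.
            order = sum(abs(n[i] - q[i]) + abs(n[i]) for i in range(3))
            if order > max_order:
                continue
            pos = [(1 - 2 * q[i]) * src[i] + 2 * n[i] * room[i] for i in range(3)]
            d = math.dist(pos, mic)
            # Spherical spreading 1/(4*pi*d), scaled by beta once per reflection.
            images.append((d, beta ** order / (4 * math.pi * d)))
    return images


def room_impulse_response(room, src, mic, beta, max_order,
                          fs=16000, c=343.0, length=4096):
    """Crude RIR: each image adds an impulse at its delay, rounded to a sample."""
    h = [0.0] * length
    for d, amp in image_sources(room, src, mic, beta, max_order):
        k = round(d / c * fs)  # arrival delay in samples
        if k < length:
            h[k] += amp
    return h
```

For example, `beta_from_t60((5.0, 4.0, 3.0), 0.162)` gives a reflection coefficient of about 0.60 for a hypothetical 5 m × 4 m × 3 m room; convolving clean speech with the resulting `h` yields a reverberant test signal. Real simulators typically use fractional delays and frequency-dependent absorption, which this sketch omits for brevity.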

Keywords

Speech recognition; Hidden Markov models; Viterbi algorithm; Search methods; Natural languages; Clustering algorithms; Reverberation; Feature extraction; Adaptive systems; Sorting

References

Heracleous, 1999. Simultaneous Recognition of Multiple Sound Sources based on 3-D N-best Search. Proc. of the Acoustical Society of Japan, 91.
DOI: 10.1109/ICASSP.1998.674413
DOI: 10.1121/1.382599
Heracleous, 2001. Multiple Sound Sources Recognition by a Microphone Array-based 3-D N-best Search with Likelihood Normalization. Proc. International Workshop on Hands-Free Speech Communication, 103.
DOI: 10.1016/0167-6393(95)00011-C
Heracleous, 2000. A technique for likelihood normalization in the 3-D N-best search for simultaneous recognition of multiple sound sources. Proc. of the Acoustical Society of Japan, 117.
DOI: 10.1109/ICSLP.1996.607855
DOI: 10.1109/ICASSP.1996.543272