Bayesian on-line spectral change point detection: a soft computing approach for on-line speech recognition

International Journal of Speech Technology - Volume 15, Issue 1 - Pages 5-23 - 2012

Chowdhury, M. F. R.¹, Selouani, S.-A.², O’Shaughnessy, D.¹

¹INRS-EMT, Université du Québec, Montréal, Canada

²Université de Moncton, Moncton, Canada

Abstract

Current automatic speech recognition (ASR) works in off-line mode and needs prior knowledge of stationary or quasi-stationary test conditions to reach the expected word recognition accuracy. These requirements limit the use of ASR in real-world applications, where test conditions are highly non-stationary and not known in advance. This paper presents an innovative frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises, and its application to on-line ASR. The proposed algorithm is based on a soft computing model that uses Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD was tested with the MCRA noise tracking technique for rapid on-line learning of environmental changes in different non-stationary noise scenarios. The test results show that the proposed BOSCPD technique significantly reduces the delay in spectral change point detection compared with the baseline MCRA and its derivatives. The BOSCPD soft computing model was also tested for on-line ASR based on joint additive and channel distortion compensation (JAC) in unknown test conditions, using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show a significant improvement in recognition accuracy over the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.
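The abstract does not give the BOSCPD update equations, so the following is only a minimal sketch of generic Bayesian on-line change point detection (a run-length posterior with a constant hazard rate) applied to a one-dimensional sequence. It is not the authors' algorithm. The Gaussian model with known variance, the hazard and prior values, the function names, and the synthetic "noise level" sequence are all assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_map_run_lengths(data, hazard=0.01, mu0=0.0, var0=100.0, obs_var=1.0):
    """Return the most probable run length after each observation.

    Assumed model (illustrative only): each segment has an unknown level
    drawn from N(mu0, var0); observations are that level plus N(0, obs_var) noise.
    """
    run_probs = [1.0]      # posterior over run lengths 0, 1, 2, ...
    means = [mu0]          # posterior mean of the level, per run length
    variances = [var0]     # posterior variance of the level, per run length
    map_run_lengths = []
    for x in data:
        # Predictive probability of x under every run-length hypothesis.
        pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        joint = [p * q for p, q in zip(run_probs, pred)]
        # Each run either grows by one frame or is reset by a change point.
        growth = [j * (1.0 - hazard) for j in joint]
        change = sum(joint) * hazard
        new_probs = [change] + growth
        total = sum(new_probs)
        run_probs = [p / total for p in new_probs]
        # Conjugate update of the level for every surviving run,
        # plus a fresh prior for the new run of length zero.
        new_vars = [1.0 / (1.0 / v + 1.0 / obs_var) for v in variances]
        new_means = [nv * (m / v + x / obs_var)
                     for nv, m, v in zip(new_vars, means, variances)]
        means = [mu0] + new_means
        variances = [var0] + new_vars
        best = max(range(len(run_probs)), key=run_probs.__getitem__)
        map_run_lengths.append(best)
    return map_run_lengths


def change_points(map_run_lengths):
    """Frames at which the most probable run length drops."""
    return [t for t in range(1, len(map_run_lengths))
            if map_run_lengths[t] < map_run_lengths[t - 1]]


# Synthetic stand-in for a noise level that jumps at frame 100 (hypothetical data).
levels = [0.5 * math.sin(i) for i in range(100)]
levels += [5.0 + 0.5 * math.sin(i) for i in range(100, 200)]
runs = bocpd_map_run_lengths(levels)
print(change_points(runs))
```

In this toy setting the jump shows up as a collapse of the most probable run length within a frame of the change, which illustrates the low detection delay that on-line Bayesian inference aims for. A spectral detector would presumably apply such a model per frequency band or to a spectral feature; that mapping is not specified in the abstract.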
