Spectro-temporal directional derivative based automatic speech recognition for a serious game scenario

Multimedia Tools and Applications - Volume 74 - Pages 5313-5327 - 2014

Ghulam Muhammad¹, Mehedi Masud², Abdulhameed Alelaiwi³, Md. Abdur Rahman⁴, Ali Karime⁵, Atif Alamri⁶, M. Shamim Hossain³

¹Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

²Department of Computer Science, Taif University, Taif, Saudi Arabia

³Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

⁴Department of Computer Science, Umm Al-Qura University, Makkah, Saudi Arabia

⁵Multimedia Communications Research Laboratory, University of Ottawa, Ottawa, Canada

⁶Department of Information System, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

Abstract

Speech is one of the important modalities in a serious game platform. Serious games can be very useful for the rehabilitation of individuals with voice disorders, so such a platform needs an efficient, high-performance automatic speech recognition (ASR) system. In this paper, we propose a spectro-temporal directional derivative (STDD) feature that requires fewer computations in the modeling and yet gives high recognition accuracy in the ASR system. The proposed STDD feature is obtained by applying different directional derivative filters in the spectro-temporal domain. The feature dimension is then compressed by the discrete cosine transform. The experiments are performed with voice samples of Arabic numerals spoken by persons with and without voice pathology. The experimental results show that the STDD feature outperforms the conventional mel-frequency cepstral coefficients in both clean and noisy environments.
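The two steps named in the abstract (directional derivative filtering in the spectro-temporal domain, followed by discrete cosine transform compression) can be illustrated with a minimal sketch. The abstract does not specify the filters, their orientations, or the number of retained coefficients, so the 3x3 kernels in `KERNELS`, the edge handling, the function names, and the `n_keep` parameter below are all hypothetical choices made for illustration. The input `spec` is assumed to be a log spectrogram stored as a list of frequency bands, each holding one value per time frame.

```python
import math

# Hypothetical 3x3 difference kernels (rows = frequency offset, cols = time offset).
# These are illustrative assumptions, not the filters used in the paper.
KERNELS = {
    "time":      [[0, 0, 0], [-1, 0, 1], [0, 0, 0]],   # difference along time
    "frequency": [[0, -1, 0], [0, 0, 0], [0, 1, 0]],   # difference along frequency
    "diag_a":    [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],   # one diagonal direction
    "diag_b":    [[-1, 0, 0], [0, 0, 0], [0, 0, 1]],   # the other diagonal direction
}


def filter2d(spec, kernel):
    """Correlate a bands x frames grid with a 3x3 kernel, replicating edge values."""
    n_bands, n_frames = len(spec), len(spec[0])
    out = [[0.0] * n_frames for _ in range(n_bands)]
    for b in range(n_bands):
        for t in range(n_frames):
            acc = 0.0
            for db in (-1, 0, 1):
                for dt in (-1, 0, 1):
                    bb = min(max(b + db, 0), n_bands - 1)   # clamp band index
                    tt = min(max(t + dt, 0), n_frames - 1)  # clamp frame index
                    acc += kernel[db + 1][dt + 1] * spec[bb][tt]
            out[b][t] = acc
    return out


def dct2_column(column, n_keep):
    """First n_keep unnormalized DCT-II coefficients of one frame's band values."""
    n = len(column)
    return [
        sum(x * math.cos(math.pi * k * (i + 0.5) / n) for i, x in enumerate(column))
        for k in range(n_keep)
    ]


def stdd_like_features(spec, n_keep=4):
    """Per frame, concatenate the DCT-compressed response of each directional filter."""
    n_frames = len(spec[0])
    features = [[] for _ in range(n_frames)]
    for kernel in KERNELS.values():
        response = filter2d(spec, kernel)
        for t in range(n_frames):
            column = [row[t] for row in response]  # all bands at frame t
            features[t].extend(dct2_column(column, n_keep))
    return features


if __name__ == "__main__":
    # Toy input: 8 bands x 6 frames, energy rising linearly over time.
    toy = [[float(t) for t in range(6)] for _ in range(8)]
    feats = stdd_like_features(toy, n_keep=4)
    print(len(feats), "frames,", len(feats[0]), "features per frame")
```

With four filters and `n_keep=4`, each frame yields a 16-dimensional vector. In a full recognizer such vectors would take the place of mel-frequency cepstral coefficients as the front-end features, but the dimensions here are arbitrary and are not taken from the paper.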

