Vocal emotion recognition in five native languages of Assam using new wavelet features

International Journal of Speech Technology - 2009

Aditya Bihar Kandali¹, Aurobinda Routray¹, T. K. Basu²

¹Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India

²Aliah University, Kolkata, India


References

Bahoura, M., & Rouat, J. (2006). Wavelet speech enhancement based on time-scale adaptation. Speech Communication, 48, 1620–1637.

Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.

Borden, G. J., Harris, K. S., & Raphael, L. J. (1994). Speech science primer: Physiology, acoustics and perception of speech (3rd ed.). Baltimore: Williams and Wilkins.

Boruah, B. K. (2003). Asamar Bhasa. Dibrugarh: Banalata.

Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., & Taylor, J. G. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80.

Darwin, C. (1872/1965). The expression of the emotions in man and animals. Chicago: University of Chicago Press.

Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–366.

Ekman, P. (1992). An argument for basic emotions. Cognition & Emotion, 6, 169–200.

Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. London: Wiley. Chap. 3.

Farooq, O., & Datta, S. (2001). Mel filter-like admissible wavelet packet structure for speech recognition. IEEE Signal Processing Letters, 8(7), 196–198.

Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). New York: Morgan Kaufmann, Academic Press.

Furui, S. (1989). Digital speech processing, synthesis and recognition. New York: Dekker.

Goswami, G. C., & Tamuli, J. (2003). Asamiya. In G. Cardona & D. Jain (Eds.), Routledge language family series: Vol. 2. The Indo-Aryan languages (pp. 391–404). London: Routledge.

Hammond, K. R., & Stewart, T. R. (2001). The essential Brunswik: Beginnings, explications and applications. Oxford: Oxford University Press.

Holmes, J., & Holmes, W. (2001). Speech synthesis and recognition (2nd ed.). New York: Taylor & Francis.

Hui, G., Shanguang, C., & Guangchuan, S. (2007). Emotion classification of Mandarin speech based on TEO nonlinear features. In Proc. IEEE 8th ACIS int. conf. SNPD (Vol. 3, pp. 394–398).

Jacquesson, F. (2008). A Dimasa grammar. Internet.

Jurafsky, D., & Martin, J. H. (2000). Speech and language processing. Englewood Cliffs: Prentice-Hall.

Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance. Psychological Bulletin, 129(5), 770–814.

Kaiser, J. F. (1990a). On a simple algorithm to calculate the ‘energy’ of a signal. In Proc. IEEE int. conf. acoustics, speech, and signal processing (Vol. 1, pp. 381–384), Albuquerque, NM.

Kaiser, J. F. (1990b). On Teager’s energy algorithm and its generalization to continuous signals. In Proc. 4th IEEE digital signal processing workshop, Mohonk (New Paltz), NY.

Kakati, B. (1995). Assamese, its formation and development. Guwahati: LBS Publications.

Kandali, A. B., Routray, A., & Basu, T. K. (2008a). Emotion recognition from speeches of some native languages of Assam independent of text and speaker. National Seminar on Devices, Circuits and Communication, Department of E.C.E., B.I.T. Mesra, Ranchi, Jharkhand, India, 6–7 Nov.

Kandali, A. B., Routray, A., & Basu, T. K. (2008b). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In Proc. IEEE region 10 conference TENCON 2008, 19–21 Nov., Hyderabad, India (pp. 1–5).

LaPolla, R. J., & Thurgood, G. (Eds.) (2002). Routledge language family series. The Sino-Tibetan languages. London: Routledge.

Laukka, P. (2004). Vocal expression of emotion: Discrete-emotion and dimensional accounts. Comprehensive Summaries of Uppsala Dissertations from the Faculty of Social Sciences 141, ACTA Universitatis Upsaliensis, Uppsala. Experiments.

Lazarus, R. S. (1991). Emotion & adaptation. New York: Oxford University Press.

Linde, Y., Buzo, A., & Gray, R. M. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1), 84–95.

Mallat, S. (2006). A wavelet tour of signal processing (2nd ed.). New Delhi: Academic Press, Elsevier.

Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. The Journal of the Acoustical Society of America, 93(2), 1097–1108.

Murray, I. R., & Arnott, J. L. (1995). Implementation and testing of a system for producing emotion-by-rule in synthetic speech. Speech Communication, 16, 369–390.

Nwe, T. L., Foo, S. W., & De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.

Oatley, K., & Johnson-Laird, P. N. (1987). Towards a cognitive theory of emotions. Cognition and Emotion, 1, 29–50.

Pathak, R. (2008). Asomiya Bhasar Itihas. Guwahati: Ashok Book Stall.

Patil, H. A., Dutta, P. K., & Basu, T. K. (2006). The wavelet packet based cepstral features for open set speaker classification in Marathi. In M. Spiliopoulou et al. (Eds.), Studies in classification, data analysis, and knowledge organization (pp. 134–141). Berlin: Springer.

Picard, R. W. (1997). Affective computing. Cambridge: MIT Press.

Plutchik, R. (1994). The psychology and biology of emotion. New York: Harper Collins.

Power, M., & Dalgleish, T. (2008). Cognition and emotion: From order to disorder. Hove: Psychology Press.

Quatieri, T. F. (2002). Discrete time speech signal processing. Upper Saddle River: Prentice-Hall.

Rabiner, L. R., & Juang, B. H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.

Ramamohan, S., & Dandapat, S. (2006). Sinusoidal model-based analysis and classification of stressed speech. IEEE Transactions on Audio, Speech and Language Processing, 14(3), 737–746.

Razak, A. A., Isa, A. H. M., & Komiya, R. (2004). A neural network approach for emotion recognition in speech. In Proc. 2nd int. conf. art. intell. in engineering & technology, Kota Kinabalu, Sabah, Malaysia.

Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72–83.

Rose, P. (2002). Forensic speaker identification. New York: Taylor & Francis, 302.

Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178.

Sarikaya, R., Pellom, B. L., & Hansen, J. H. L. (1998). Wavelet packet transform features with application to speaker identification. In Proc. IEEE nordic signal processing symposium (pp. 81–84).

Scherer, K. R. (1986). Vocal affect expression: a review and a model for future research. Psychological Bulletin, 99(2), 143–165.

Scherer, K. R. (2003). Vocal communication of emotion: a review of research paradigms. Speech Communication, 40, 227–256.

Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32(1), 76–92.

Scherer, K. R., Johnstone, T., & Klasmeyer, G. (2003). Vocal expression of emotion. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective science (1st ed.). Oxford: Oxford University Press. Part IV, Chap. 23.

Singha, D. (2003). The phonology & morphology of Dimasa. M.A. Dissertation, Assam University, Silchar, Assam, India.

Singha, D. (2008). An introduction to Dimasa phonology. New Delhi: Saujanya Books.

Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: resources, features, and methods. Speech Communication, 48, 1162–1181.

Ververidis, D., Kotropoulos, C., & Pitas, I. (2004). Automatic emotional speech classification. In ICASSP, 2004 (pp. I-593–I-596).

Vogt, T., & André, E. (2005). Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. In Proc. IEEE.

Wang, Y., & Guan, L. (2004). An investigation of speech-based human emotion recognition. In IEEE 6th workshop on multimedia signal processing (pp. 15–18).

Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: some acoustical correlates. The Journal of the Acoustical Society of America, 52, 1238–1250.
