A corpus-based speech synthesis system with emotion

Speech Communication, Volume 40, Pages 161–187, 2003
Akemi Iida (1, 2), Nick Campbell (3, 2), Fumito Higuchi (4), Michiaki Yasumura (4)
(1) Keio Research Institute at SFC, Keio University, 5322 Endo, Fujisawa-city, Kanagawa 252-8520, Japan
(2) JST (Japan Science and Technology), CREST, Kyoto, Japan
(3) ATR Human Information Sciences Research Laboratories, Kyoto, Japan
(4) Graduate School of Media & Governance, Keio University, Kanagawa, Japan

References

Abe, M., Sagisaka, Y., Umeda, T., Kuwabara, H., 1990. ATR Technical Report TR-I-0166: Speech Database User's Manual. ATR Interpreting Telephony Research Laboratories.
Banse, 1996. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology 70, 614. doi:10.1037/0022-3514.70.3.614
Black, A., Hunt, A., 1996. Generating F0 contours from ToBI labels using linear regression. In: Proceedings of ICSLP 96, Philadelphia, USA, Vol. 3, pp. 1385–1388.
Bunnell, H.T., Hoskins, S.R., 1998. Prosodic vs segmental contributions to naturalness in a diphone synthesizer. In: Proceedings of ICSLP 98, Sydney, Australia, Vol. 5, pp. 1723–1726.
Cahn, J.E., 1989. Generation of affect in synthesized speech. In: Proceedings of the 1989 Conference of the American Voice I/O Society, pp. 251–256.
Campbell, W.N., 1996. Autolabelling Japanese ToBI. In: Proceedings of ICSLP 96, Philadelphia, USA, pp. 2399–2402.
Campbell, W.N., 1997a. Processing a speech corpus for CHATR synthesis. In: Proceedings of the International Conference on Speech Processing (ICSP '97), Seoul, Korea, pp. 183–186.
Campbell, 1997. Synthesizing spontaneous speech, 165.
Campbell, 1997. Prosody and the selection of source units for concatenative synthesis, 279.
Carlson, R., Granstrom, G., Nord, L., 1992. Experiments with emotive speech, acted utterances and synthesized replicas. In: Proceedings of ICSLP 92, Banff, Canada, Vol. 2, pp. 671–674.
Cowie, 2001. Emotion recognition in human–computer interaction. IEEE Signal Processing Magazine 18, 32. doi:10.1109/79.911197
Entropic Research Laboratory, Inc., 1996. ESPS Programs A–L.
Davitz, 1964. A review of research concerned with facial and vocal expressions of emotion, 13.
Davitz, 1964. Auditory correlates of vocal expressions of emotional meanings, 101.
Fairbanks, 1939. An experimental study of the pitch characteristics of the voice during the expression of emotion. Speech Monographs 6, 87. doi:10.1080/03637753909374863
Guerrero, 1998. Communication and emotion: basic concepts and approaches, 3.
Ichikawa, A., Nakayama, T., Nakata, K., 1967. Experimental consideration on the naturalness of the synthesized speech. In: Proceedings of the Acoustical Society of Japan Fall Meeting, pp. 95–96 (in Japanese).
Iida, A., Iga, S., Higuchi, F., Campbell, N., Yasumura, M., 1998. Acoustic nature and perceptual testing of a corpus of emotional speech. In: Proceedings of ICSLP 98, Sydney, Australia, Vol. 4, pp. 1559–1592.
Iida, A., Iga, S., Higuchi, F., Campbell, N., Yasumura, M., 2000. A speech synthesis system with emotion for assisting communication. In: Proceedings of the ISCA Workshop on Speech and Emotion, Belfast, UK, pp. 167–172.
Ito, 1986. A basic study on voice sound involving emotion (III). Ergonomics 22, 211. doi:10.5100/jje.22.211
JEIDA, 2000. The guideline of speech synthesis system performance evaluation methods. Japan Electronic Industry Development Association.
Kamimura, K., 1990. Ashitawo tsukuru – Sekizui sonshoushano seikatsuno kiroku. Miwashoten, Tokyo (in Japanese).
Katae, N., Kimura, S., 2000. An effect of voice quality and prosody control in emotional speech synthesis. In: Proceedings of the Acoustical Society of Japan Fall Meeting, pp. 187–188 (in Japanese).
Keating, 1984. Vowel variation in Japanese. Phonetica 41, 191. doi:10.1159/000261726
Kitahara, 1992. Prosodic control to express emotions for man-machine speech interaction. Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Fundamentals of Electronics E75-A, 155.
Kitahara, 1987. Role of prosody in cognitive process of spoken language. Journal of Interaction, Institute of Electronics, Information and Communication Engineers (IEICE) D, J70-D, 2095.
Makino, 1989. A method of vowel recognition in connected speech using the mutual relation of vowels. Institute of Electronics, Information and Communication Engineers (IEICE) D-II, J72-DII, 837.
Mokhtari, P., Iida, A., Campbell, N., 2001. Some articulatory correlates of emotion variability in speech: a preliminary study on spoken Japanese vowels. In: Proceedings of the International Conference on Speech Processing (ICSP '01), Taejon, Korea, pp. 431–436.
Mozziconacci, S.J.L., 1998. Speech variability and emotion: production and perception. Ph.D. thesis, Technical University Eindhoven.
Murray, 1993. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. Journal of the Acoustical Society of America 93, 1097. doi:10.1121/1.405558
Murray, 1995. Implementation and testing of a system for producing emotion-by-rule in synthesized speech. Speech Communication 16, 369. doi:10.1016/0167-6393(95)00005-9
Murray, I.R., Arnott, J.L., Alm, N., Newell, A.F., 1991. A communication system for the disabled with emotional synthesized speech produced by rule. In: Proceedings of Eurospeech '91, Genova, Italy, pp. 311–314.
Murray, I.R., Edgington, M.D., Campion, D., Lynn, J., 2000. Rule-based emotion synthesis using concatenated speech. In: Proceedings of the ISCA Workshop on Speech and Emotion, Belfast, UK, pp. 173–177.
Nagae, Y., 1998. Onsei ni fukumareru washa no kanjo no bunseki to ninshiki ni kansuru kenkyuu. Bachelor's thesis, Utsunomiya University (in Japanese).
Ohira, Y., 1995. Watashirashiku, ningenrashiku. Autobiography (in Japanese).
Russell, J.A., 1989. Measures of emotion. In: Plutchik, R., Kellerman, (Eds.), Emotion: Theory, Research, and Experience, Vol. 4. Academic Press, New York, pp. 83–111.
Scherer, 1986. Vocal affect expression: a review and a model for future research. Psychological Bulletin 99, 143. doi:10.1037/0033-2909.99.2.143
Scherer, 1991. Vocal cues in emotion encoding and decoding. Motivation and Emotion 15, 123. doi:10.1007/BF00995674
Shaver, 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061. doi:10.1037/0022-3514.52.6.1061
Takeda, S., Ishizuka, F., Hiramatsu, M., 2000. Power features of "anger" expressions in pseudo-conversational speech. In: Proceedings of the Acoustical Society of Japan Fall Meeting, pp. 191–192 (in Japanese).
Todoroki, T., 1993. Kousai – Kagayaki tuzukeru tameni – KB mausude nyuuryokushita Kinzisutorofii seinenno kiroku. Self-published (in Japanese).