Voice-awareness control for a humanoid robot consistent with its body posture and movements
Abstract
This paper presents voice-awareness control that is consistent with a robot's head movements. For natural spoken communication between robots and humans, robots must behave and speak the way humans expect them to. The consistency between a robot's voice quality and its body motion is one of the most striking factors in the naturalness of robot speech. Our control is based on a new model of spectral envelope modification for vertical head motion and on left-right balance modulation for horizontal head motion. We assume that a pitch-axis rotation (vertical head motion) and a yaw-axis rotation (horizontal head motion) affect the voice quality independently. The spectral envelope modification model is constructed from an analysis of human vocalizations. The left-right balance model is established by measuring impulse responses with a pair of microphones. Experimental results show that the voice-awareness is perceivable in a robot-to-robot dialogue when the robots stand up to 150 cm apart. The experiment also confirms the dynamic change in voice quality.
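The independence assumption above can be illustrated with a minimal sketch. It does not reproduce the paper's models: the spectral envelope model derived from human vocalizations is replaced here by a simple linear spectral tilt, and the measured impulse responses are replaced by constant-power panning. All function names, the sign convention (positive yaw means the head turns right), and the constants are hypothetical and chosen only for illustration.

```python
import math


def yaw_balance_gains(yaw_deg, max_yaw_deg=90.0):
    """Left/right gains for a head yaw angle (hypothetical constant-power pan)."""
    # Clamp the yaw to the assumed range, then map it to a pan angle in [0, pi/2].
    yaw = max(-max_yaw_deg, min(max_yaw_deg, yaw_deg))
    pan = (yaw / max_yaw_deg + 1.0) * math.pi / 4.0
    return math.cos(pan), math.sin(pan)  # (left gain, right gain)


def pitch_tilt_gain(freq_hz, pitch_deg,
                    tilt_db_per_octave_per_deg=0.05, ref_hz=1000.0):
    """Linear gain of a pitch-dependent spectral tilt (hypothetical stand-in)."""
    # Distance of this frequency from the reference, in octaves.
    octaves = math.log2(freq_hz / ref_hz)
    # Assumed: the tilt slope grows linearly with the head pitch angle.
    gain_db = tilt_db_per_octave_per_deg * pitch_deg * octaves
    return 10.0 ** (gain_db / 20.0)


def voice_gains(freq_hz, pitch_deg, yaw_deg):
    """Combine both effects as independent factors, per the stated assumption."""
    left, right = yaw_balance_gains(yaw_deg)
    tilt = pitch_tilt_gain(freq_hz, pitch_deg)
    return left * tilt, right * tilt


if __name__ == "__main__":
    # Per-channel gains at 4 kHz for a head raised 20 degrees and turned 30 degrees.
    print(voice_gains(4000.0, pitch_deg=20.0, yaw_deg=30.0))
```

Because the two factors multiply, changing the yaw angle rescales both channels without altering the spectral tilt, and changing the pitch angle alters the tilt without shifting the left-right balance.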