Transforming an embodied conversational agent into an efficient talking head: from keyframe-based animation to multimodal concatenation synthesis
Abstract
Keywords
References
Badin, P., & Serrurier, A. (2006). Three-dimensional linear modeling of tongue: Articulatory data and models. Paper presented at the 7th International Seminar on Speech Production, Belo Horizonte, Brazil
Badin, P., Bailly, G., Revéret, L., Baciu, M., Segebarth, C., & Savariaux, C. (2002). Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images. Journal of Phonetics, 30(3), 533–553.
Badin, P., Elisei, F., Bailly, G., & Tarabalka, Y. (2008). An audiovisual talking head for augmented speech generation: Models and animations based on a real speaker's articulatory data. In Articulated Motion and Deformable Objects (Lecture Notes in Computer Science, Vol. 5098, pp. 132–143). Springer.
Bailly, G., Gibert, G., & Odisio, M. (2002). Evaluation of movement generation systems using the point-light technique. In Proceedings of the 2002 IEEE Workshop on Speech Synthesis (pp. 27–30).
Bailly, G., Berar, M., Elisei, F., & Odisio, M. (2003). Audiovisual Speech Synthesis. International Journal of Speech Technology, 6, 331–346.
Bailly, G., Govokhina, O., Elisei, F., & Breton, G. (2009). Lip-synching using speaker-specific articulation, shape and appearance models. EURASIP Journal on Audio, Speech, and Music Processing. Special issue on “Animating Virtual Speakers or Singers from Audio: Lip-Synching Facial Animation”, doi:10.1155/2009/769494.
Berry, J. J. (2011). Accuracy of the NDI Wave Speech Research System. Journal of Speech, Language, and Hearing Research, 54(5), 1295–1301. doi:10.1044/1092-4388(2011/10-0226).
Black, A. W., & Lenzo, K. (2007). Festvox: Building synthetic voices. (2.1 ed.)
Boersma, P., & Weenink, D. (2010). Praat: doing phonetics by computer. (5.1.31 ed.)
Burnham, D., Dale, R., Stevens, K., Powers, D., Davis, C., Buchholz, J., et al. (2006–2011). From Talking Heads to Thinking Heads: A Research Platform for Human Communication Science. ARC/NH&MRC Special Initiatives, TS0669874
Cohen, M. M., & Massaro, D. (1993). Modeling Coarticulation in Synthetic Visual Speech. In N. M. Thalmann & D. Thalmann (Eds.), Models and Techniques in Computer Animation. Tokyo, Japan: Springer.
Cosatto, E., & Graf, H.-P. (2000). Photo-realistic talking heads from image samples. IEEE Transactions on Multimedia, 2, 152–163.
Engwall, O. (2000). A 3D tongue model based on MRI data. In International Conference on Spoken Language Processing, Beijing, China (Vol. 3, pp. 901–904)
Engwall, O. (2003). Combining MRI, EMA and EPG measurements in a three-dimensional tongue model. Speech Communication, 41(2–3), 303–329. doi:10.1016/s0167-6393(03)00132-2.
Engwall, O. (2005). Articulatory synthesis using corpus-based estimation of line spectrum pairs. Paper presented at the INTERSPEECH, Lisbon, Portugal.
Engwall, O. (2008). Can audio-visual instructions help learners improve their articulation? An ultrasound study of short term changes. In Interspeech 2008, Brisbane, Australia (pp. 2631–2634).
Ezzat, T., & Poggio, T. (2000). Visual speech synthesis by morphing visemes. International Journal of Computer Vision, 38(1), 45–57.
Ezzat, T., Geiger, G., & Poggio, T. (2002). Trainable videorealistic speech animation. Paper presented at the ACM SIGGRAPH, San Antonio, TX
Fabre, D., Hueber, T., & Badin, P. (2014). Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression. Paper presented at the INTERSPEECH, Singapore
Fisher, C. G. (1968). Confusions Among Visually Perceived Consonants. Journal of Speech, Language, and Hearing Research, 11(4), 796–804.
Geiger, G., Ezzat, T., & Poggio, T. (2003). Perceptual Evaluation of Video-realistic Speech (AI Memo #2003-003, CBCL Paper #224). Cambridge, MA: Massachusetts Institute of Technology.
Gibert, G., & Stevens, C. J. (2012). Realistic eye model for Embodied Conversational Agents. Paper presented at the ACM 3rd International Symposium on Facial Analysis and Animation, Vienna, Austria, 21st September 2012
Gibert, G., Bailly, G., Beautemps, D., Elisei, F., & Brun, R. (2005). Analysis and synthesis of the three-dimensional movements of the head, face, and hand of a speaker using cued speech. Journal of Acoustical Society of America, 118(2), 1144–1153. doi:10.1121/1.1944587.
Gibert, G., Attina, V., Tiede, M., Bundgaard-Nielsen, R., Kroos, C., Kasisopa, B., et al. (2012). Multimodal Speech Animation from Electromagnetic Articulography Data. Paper presented at the 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania
Gibert, G., Leung, Y., & Stevens, C. J. (2013). Control of speech-related facial movements of an avatar from video. Speech Communication, 55(1), 135–146. http://dx.doi.org/10.1016/j.specom.2012.07.001.
Granstrom, B., & House, D. (2005). Audiovisual representation of prosody in expressive speech communication. Speech Communication, 46(3–4), 473–484.
Gris, I., Novick, D., Camacho, A., Rivera, D., Gutierrez, M., & Rayon, A. (2014). Recorded Speech, Virtual Environments, and the Effectiveness of Embodied Conversational Agents. In T. Bickmore, S. Marsella, & C. Sidner (Eds.), Intelligent Virtual Agents (Lecture Notes in Computer Science, Vol. 8637, pp. 182–185). New York: Springer International Publishing.
Jiang, J., Alwan, A., Bernstein, L. E., Keating, P., & Auer, E. (2002). On the correlation between facial movements, tongue movements and speech acoustics. Paper presented at the International Conference on Spoken Language Processing (ICSLP), Beijing, China.
Kim, J., Lammert, A. C., Kumar Ghosh, P., & Narayanan, S. S. (2014). Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging. Journal of Acoustical Society of America, 135(2), EL115–EL121. http://dx.doi.org/10.1121/1.4862880.
Kim, J., Toutios, A., Lee, S., & Narayanan, S. S. (2015). A kinematic study of critical and non-critical articulators in emotional speech production. Journal of Acoustical Society of America, 137(3), 1411–1429. http://dx.doi.org/10.1121/1.4908284.
Kuratate, T. (2008). Text-to-AV synthesis system for Thinking Head Project. Paper presented at the Auditory-Visual Speech Processing, Brisbane, Australia
Musti, U., Toutios, A., Colotte, V., & Ouni, S. (2011). Introducing Visual Target Cost within an Acoustic-Visual Unit-Selection Speech Synthesizer. Paper presented at the AVSP, Volterra, Italy
Narayanan, S., Toutios, A., Ramanarayanan, V., Lammert, A., Kim, J., Lee, S., et al. (2014). Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). Journal of Acoustical Society of America, 136(3), 1307–1311. http://dx.doi.org/10.1121/1.4890284.
Pammi, S. C., Charfuelan, M., & Schröder, M. (2010). Multilingual Voice Creation Toolkit for the MARY TTS Platform. Paper presented at the LREC, Valletta, Malta.
Pelachaud, C. (2009). Studies on gesture expressivity for a virtual agent. Speech Communication, 51(7), 630–639. doi:10.1016/j.specom.2008.04.009.
Ramanarayanan, V., Goldstein, L., & Narayanan, S. S. (2013). Spatio-temporal articulatory movement primitives during speech production: Extraction, interpretation, and validation. Journal of Acoustical Society of America, 134(2), 1378–1394. doi:10.1121/1.4812765.
Revéret, L., Bailly, G., & Badin, P. (2000). MOTHER: a new generation of talking heads providing a flexible articulatory control for video-realistic speech animation. In International Conference on Speech and Language Processing, Beijing, China, (pp. 755–758)
Rosenblum, L. D., Johnson, J. A., & Saldana, H. M. (1996). Point-light facial displays enhance comprehension of speech in noise. Journal of Speech and Hearing Research, 39(6), 1159–1170.
Schröder, M., Charfuelan, M., Pammi, S., & Steiner, I. (2011). Open source voice creation toolkit for the MARY TTS Platform. In 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 3253–3256). ISCA. https://hal.inria.fr/hal-00661061/document, https://hal.inria.fr/hal-00661061/file/Interspeech2011.pdf
Sheng, L., Lan, W., & En, Q. (2011). The Phoneme-Level Articulator Dynamics for Pronunciation Animation. In International Conference on Asian Language Processing (IALP), 15–17 Nov. 2011 (pp. 283–286). doi:10.1109/ialp.2011.13.
Steiner, I., Richmond, K., & Ouni, S. (2013). Speech animation using electromagnetic articulography as motion capture data. Paper presented at the Auditory-Visual Speech Processing (AVSP), Annecy, France, August 29 - September 1, 2013
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of Acoustical Society of America, 26, 212–215.
Theobald, B. J. (2003). Visual speech synthesis using shape and appearance models. Norwich, UK: University of East Anglia.
Theobald, B. J., Fagel, S., Bailly, G., & Elisei, F. (2008). LIPS 2008: Visual Speech Synthesis Challenge. Paper presented at the INTERSPEECH 2008, Brisbane, Australia
Toutios, A., & Narayanan, S. S. (2013). Articulatory Synthesis of French Connected Speech from EMA Data. Paper presented at the INTERSPEECH, Lyon, France.