Auditory visual prominence

Journal on Multimodal User Interfaces - Volume 3, Issue 4 - Pages 299-309 - 2009
Samer Al Moubayed1, Jonas Beskow1, Björn Granström1
1Center for Speech Technology, KTH, Stockholm, Sweden

Abstract

Keywords


References

McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748

Summerfield Q (1992) Lipreading and audio-visual speech perception. Philos Trans Biol Sci 335(1273):71–78

Cave C, Guaïtella I, Bertrand R, Santi S, Harlay F, Espesser R (1996) About the relationship between eyebrow movements and F0 variations. In: Proc of the fourth international conference on spoken language, vol 4

Munhall K, Jones J, Callan D, Kuratate T, Vatikiotis-Bateson E (2004) Head movement improves auditory speech perception. Psychol Sci 15(2):133–137

Davis C, Kim J (2006) Audio-visual speech perception off the top of the head. Cognition 100(3):B21–B31

Cvejic E, Kim J, Davis C (2010) Prosody off the top of the head: Prosodic contrasts can be discriminated by head motion. Speech Commun 52

Terken J (1991) Fundamental frequency and perceived prominence of accented syllables. J Acoust Soc Am 89:1768

Gundel J (1999) On different kinds of focus. In: Focus: linguistic, cognitive, and computational perspectives, pp 293–305

Grice M, Savino M (1997) Can pitch accent type convey information status in yes-no questions. In: Proc of the workshop sponsored by the association for computational linguistics, pp 29–38

Granström B, House D (2005) Audiovisual representation of prosody in expressive speech communication. Speech Commun 46(3–4):473–484

Beskow J, Granström B, House D (2006) Visual correlates to prominence in several expressive modes. In: Proc of the ninth international conference on spoken language processing

House D, Beskow J, Granström B (2001) Timing and interaction of visual cues for prominence in audiovisual speech perception. In: Proc of the seventh European conference on speech communication and technology

Swerts M, Krahmer E (2006) The importance of different facial areas for signalling visual prominence. In: Proc of the ninth international conference on spoken language processing

Krahmer E, Swerts M (2007) The effects of visual beats on prosodic prominence: Acoustic analyses, auditory perception and visual perception. J Mem Lang 57(3):396–414

Dohen M, Lœvenbruck H (2009) Interaction of audition and vision for the perception of prosodic contrastive focus. Lang Speech 52(2–3):177

Dohen M, Lœvenbruck H, Hill H (2009) Recognizing prosody from the lips: is it possible to extract prosodic focus. In: Visual speech recognition: lip segmentation and mapping, p 416

Streefkerk B, Pols L, Bosch L (1999) Acoustical features as predictors for prominence in read aloud Dutch sentences used in ANN's. In: Sixth European conference on speech communication and technology

Fant G, Kruckenberg A, Nord L (1991) Durational correlates of stress in Swedish, French, and English. J Phon 19(3–4):351–365

Bruce G (1977) Swedish word accents in sentence perspective. LiberLäromedel/Gleerup, Malmö

Gussenhoven C, Bruce G (1999) Word prosody and intonation. In: Empirical approaches to language typology, pp 233–272

Heldner M, Strangert E (2001) Temporal effects of focus in Swedish. J Phon 29(3):329–361

Fant G, Kruckenberg A, Liljencrants J, Hertegård S (2000) Acoustic phonetic studies of prominence in Swedish. KTH TMH-QPSR 2(3):2000

Fant G, Kruckenberg A (1994) Notes on stress and word accent in Swedish. In: Proceedings of the international symposium on prosody, 18 September 1994, Yokohama, pp 2–3

Krahmer E, Swerts M (2004) More about brows: a cross-linguistic study via analysis-by-synthesis. In: From brows to trust: evaluating embodied conversational agents, pp 191–216

Massaro D (1998) Perceiving talking faces: from speech perception to a behavioral principle. MIT Press, Cambridge

Agelfors E, Beskow J, Dahlquist M, Granström B, Lundeberg M, Spens K-E, Öhman T (1998) Synthetic faces as a lipreading support. In: Proceedings of ICSLP’98

Salvi G, Beskow J, Al Moubayed S, Granström B (2009) Synface—speech-driven facial animation for virtual speech-reading support. J Audio Speech Music Process 2009

Beskow J (1995) Rule-based visual speech synthesis. In: Proc of the fourth European conference on speech communication and technology

Sjölander K (2003) An HMM-based system for automatic segmentation and alignment of speech. In: Proceedings of fonetik, pp 93–96

Beskow J (2004) Trainable articulatory control models for visual speech synthesis. Int J Speech Technol 7(4):335–349

Shannon R, Zeng F, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270(5234):303

Al Moubayed S, Ananthakrishnan G, Enflo L (2010) Automatic prominence classification in Swedish. In: Proceedings of prosodic prominence: perceptual and automatic identification workshop, Chicago, USA

Swerts M, Krahmer E (2004) Congruent and incongruent audiovisual cues to prominence. In: Proc of speech prosody

de Cheveigné A, Kawahara H (2002) YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Am 111:1917

Al Moubayed S, Beskow J, Öster A-M, Salvi G, Granström B, van Son N, Ormel E (2009) Virtual speech reading support for hard of hearing in a domestic multi-media setting. In: Proceedings of interspeech 2009

Poggi I, Pelachaud C, De Rosis F (2000) Eye communication in a conversational 3D synthetic agent. AI Commun 13(3):169–181

Ekman P (1979) About brows: Emotional and conversational signals. In: Human ethology: claims and limits of a new discipline: contributions to the colloquium, pp 169–248

Cassell J, Pelachaud C, Badler N, Steedman M, Achorn B, Becket T, Douville B, Prevost S, Stone M (1994) Animated conversation: rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents. In: Proceedings of the 21st annual conference on computer graphics and interactive techniques, pp 413–420

Raidt S, Bailly G, Elisei F (2007) Analyzing and modeling gaze during face-to-face interaction. In: Proceedings of the international conference on auditory-visual speech processing (AVSP 2007)

Vatikiotis-Bateson E, Eigsti I, Yano S, Munhall K (1998) Eye movement of perceivers during audiovisual speech perception. Percept Psychophys 60(6):926–940

Paré M, Richler R, ten Hove M, Munhall K (2003) Gaze behavior in audiovisual speech perception: the influence of ocular fixations on the McGurk effect. Percept Psychophys 65(4):553

Cutler A, Otake T (1999) Pitch accent in spoken-word recognition in Japanese. J Acoust Soc Am 105:1877

van Wassenhove V, Grant K, Poeppel D (2005) Visual speech speeds up the neural processing of auditory speech. Proc Nat Acad Sci 102(4):1181