Negative emotions in the target speaker’s voice enhance speech recognition under “cocktail-party” environments
Abstract
In a “cocktail-party” environment with multiple simultaneous talkers, recognition of target speech is improved by a number of perceptual unmasking cues. It remains unclear whether emotions embedded in the target speaker’s voice can improve speech perception on their own or interact with other cues that facilitate speech perception against a masker background. This study used two target-speaker voices with different emotional valences to examine whether recognition of target speech is modulated by emotional valence when the target speech and the maskers are perceived as co-located or as spatially separated. The results showed that both speech recognition against the masker background and the separation-induced unmasking effect were higher for the target speaker with a negatively valenced voice than for the target speaker with a positively valenced voice. Moreover, when the negative voice was fear conditioned, target-speech recognition improved further against informational masking by speech. These results suggest that the vocal-emotion unmasking cue interacts significantly with the perceived spatial-separation unmasking cue, strengthening the unmasking effect against a masking background. Thus, emotional features embedded in the target speaker’s vocal timbre are also useful for unmasking target speech in “cocktail-party” environments.