Unimodal and cross-modal identity judgements using an audio-visual sorting task: Evidence for independent processing of faces and voices

Memory and Cognition, Volume 50, pp. 216–231, 2021
Nadine Lavan1,2, Harriet M. J. Smith3, Carolyn McGettigan1
1Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
2Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
3Department of Psychology, Nottingham Trent University, Nottingham, UK

Abstract

Unimodal and cross-modal information provided by faces and voices contributes to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. Across a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: While face sorting was consistently most accurate, followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we compared performance in our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal sorting were overall moderately more accurate than identity matching. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: Participants were shown a clip of the two identities in conversation prior to completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently, with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.
