Depression-level assessment from multi-lingual conversational speech data using acoustic and text features
Abstract
Depression is a widespread mental health problem that places a significant burden on economies worldwide. Early diagnosis and treatment are critical to reducing costs and even saving lives. One key step toward that goal is to use technology to monitor depression remotely and relatively inexpensively with automated agents. There have been numerous efforts to assess depression levels automatically using audiovisual features as well as text analysis of conversational speech transcriptions. However, the difficulty of data collection and the limited amounts of data available for research present challenges that hamper the success of these algorithms. The first of the two novel contributions in this paper is to exploit databases from multiple languages for acoustic feature selection. Since a large number of features can be extracted from speech, and only small amounts of training data are available, effective feature selection is critical for success. Our proposed multi-lingual method selected better features than the baseline algorithms, which significantly improved depression assessment accuracy. The second contribution is to extract text-based features for depression assessment and to use a novel algorithm that fuses the text- and speech-based classifiers, which further boosted performance.
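The two contributions above can be illustrated with a minimal sketch. Note that everything here is an illustrative assumption rather than the paper's actual method: the correlation-based relevance criterion, the averaging of feature ranks across language corpora, and the weighted score fusion are generic stand-ins for whichever selection and fusion algorithms the paper proposes.

```python
import numpy as np

def rank_features(X, y):
    """Score each feature by |Pearson correlation| with the depression
    score (a simple relevance proxy; the paper's criterion may differ)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / np.maximum(denom, 1e-12)

def multilingual_select(corpora, k):
    """Keep the k features with the best average relevance rank across
    all language corpora, so a feature must generalize to survive."""
    n_feat = corpora[0][0].shape[1]
    avg_rank = np.zeros(n_feat)
    for X, y in corpora:
        order = np.argsort(-rank_features(X, y))  # best feature first
        ranks = np.empty(n_feat)
        ranks[order] = np.arange(n_feat)
        avg_rank += ranks
    return np.argsort(avg_rank)[:k]

def fuse_scores(speech_score, text_score, w=0.5):
    """Late fusion of per-subject classifier scores by weighted
    averaging (a generic stand-in for the paper's fusion algorithm)."""
    return w * speech_score + (1 - w) * text_score
```

The rank-averaging step captures the intuition behind the multi-lingual idea: a feature that is only predictive in one language corpus is penalized, while one that ranks well everywhere is kept, mitigating overfitting to a single small dataset.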