Human and computer recognition of regional accents and ethnic groups from British English speech

Computer Speech & Language - Volume 27 - Pages 59-74 - 2013
A. Hanani, M.J. Russell, M.J. Carey
School of Electronic, Electrical and Computer Engineering, University of Birmingham, Birmingham B15 2TT, UK

References

Angkititrakul, 2006, Advances in phone-based modeling for automatic accent classification, IEEE Transactions on Audio, Speech and Language Processing, 14, 634, 10.1109/TSA.2005.851980
Arslan, 1996, Language accent classification in American English, Speech Communication, 18, 353, 10.1016/0167-6393(96)00024-6
Barry, 1989, An approach to the problem of regional accent in automatic speech recognition, Computer Speech and Language, 3, 355, 10.1016/0885-2308(89)90003-X
Biadsy, 2009, Spoken Arabic dialect identification using phonotactic modeling
Biadsy, 2010, Dialect Recognition using Phone-GMM-Supervector-based SVM Kernel, 753
Burget, 2006, Discriminative training techniques for acoustic language identification, I-209
Campbell, 2004, Phonetic speaker recognition with support vector machines, Advances in Neural Information Processing Systems, 16
Campbell, 2006, Support vector machines using GMM supervectors for speaker verification, IEEE Signal Processing Letters, 13, 308, 10.1109/LSP.2006.870086
Campbell, 2009, A Framework for discriminative SVM/GMM systems for language recognition, 2195
Canu, 2005
D’Arcy, 2005, The accents of the British Isles (ABI) corpus, 115
D’Arcy, S.M., 2007. The effect of age and accent on automatic speech recognition performance. Ph.D. Thesis, University of Birmingham, Birmingham, UK.
Elmes, 2005
Furui, 1981, Cepstral analysis technique for automatic speaker verification, IEEE Transactions on Acoustics Speech and Signal Processing, 29, 254, 10.1109/TASSP.1981.1163530
Gauvain, 1994, Maximum a-posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, 2, 291, 10.1109/89.279278
Hanani, 2010, Improved language recognition using mixture component statistics, 741
Hanani, A., Russell, M., Carey, M., 2011. Speech-based identification of social groups in a single accent of British English by humans and computers. In: Proc. IEEE ICASSP 2011, pp. 4876–4879.
Hermansky, 1994, RASTA processing of speech, IEEE Transactions on Speech and Audio Processing, 2, 578, 10.1109/89.326616
Huang, 2007, Dialect/accent classification using unrestricted audio, IEEE Transactions on Audio, Speech and Language Processing, 15, 453, 10.1109/TASL.2006.881695
Huckvale, 2007, ACCDIST: an accent similarity metric for accent recognition and diagnosis, 258
Hughes, 2005
Humphries, 1997, Using accent-specific pronunciation modelling for improved large vocabulary continuous speech recognition, 2367
Ikeno, 2006, Perceptual recognition cues in native English accent variation: ‘Listener accent, perceived accent, and comprehension’, 401
Lincoln, 1998, A comparison of two unsupervised approaches to accent identification
Matějka, 2005, Phonotactic language identification using high quality phoneme recognition, 2237
Miller, 1996, Statistical dialect classification based on mean phonetic features
Minematsu, 2005, Mathematical evidence of the acoustic universal structure in speech, 889
NVIDIA, 2007. NVIDIA CUDA Compute Unified Device Architecture: Programming Guide.
Pelecanos, 2001, Feature warping for robust speaker verification, 213
Purnell, 1999, Perceptual and phonetic experiments on American English dialect identification, Journal of Language and Social Psychology, 18, 10, 10.1177/0261927X99018001002
Reynolds, 1995, Large population speaker identification using clean and telephone speech, IEEE Signal Processing Letters, 2, 46, 10.1109/97.372913
Richardson, 2009, Discriminative N-gram selection for dialect recognition
The SCRIBE Manual, 1998. http://www.phon.ucl.ac.uk/resource/scribe/scribe-manual.htm.
Singer, 2003, Acoustic, phonetic, and discriminative approaches to automatic language identification, 1345
Teixeira, 1997, Recognition of non-native accents, 2375
Tjalve, 2005, Pronunciation variation modelling using accent features
Torres-Carrasquillo, 2002, Approaches to language identification using Gaussian mixture models and shifted delta cepstral features, 89
Torres-Carrasquillo, 2008, The MITLL NIST LRE 2007 language recognition system
Vair, 2006, Channel factors compensation in model and feature domain for speaker recognition
Walton, 1994, Speaker race identification from acoustic cues in the vocal signal, Journal of Speech and Hearing Research, 37, 738, 10.1044/jshr.3704.738
Wells, 1982
Wells, 1982
Woehrling, 2006, Identification of regional accents in French: perception and categorization, 1511
Yamagishi, 2010, Thousands of voices for HMM-based speech synthesis – analysis and application of TTS systems built on various ASR corpora, IEEE Transactions on Audio, Speech and Language Processing, 18, 984, 10.1109/TASL.2010.2045237
Young, 2006
Zhai, 2006, Discriminatively trained language models using support vector machines for language identification, Proc. IEEE Speaker and Language Recognition Workshop, Odyssey 06, 1, 10.1109/ODYSSEY.2006.248098
Zissman, 1996, Comparison of four approaches to automatic language identification of telephone speech, IEEE Transactions on Speech and Audio Processing, 4, 31, 10.1109/TSA.1996.481450
Zissman, 1996, Automatic dialect identification of extemporaneous, conversational, Latin American Spanish speech, 777