Universal attribute characterization of spoken languages for automatic spoken language recognition

Computer Speech & Language - Tập 27 - Trang 209-227 - 2013
Sabato Marco Siniscalchi1, Jeremy Reed2, Torbjørn Svendsen3, Chin-Hui Lee4
1Faculty of Engineering and Architecture, Kore University of Enna, Cittadella Universitaria, Enna, Sicily, Italy
2Georgia Tech Research Institute, Georgia Institute of Technology, Atlanta, GA 30332, USA
3Department of Electronics and Telecommunications, Norwegian University of Science and Technology, 7491 Trondheim, Norway
4School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

Tài liệu tham khảo

Adami, 2003, Segmentation of speech for speaker and language recognition, 841 Adda-Decker, 2003, Phonetic knowledge, phonotactics and perceptual validation for automatic language identification, 747 Allen, 1994, How do humans process and recognize speech, IEEE Transactions on Speech and Audio Processing, 2, 567, 10.1109/89.326615 Bellegarda, 2000, Exploiting latent semantic information in statistical language modeling, Proceedings of the IEEE, 88, 1279, 10.1109/5.880084 Berkling, 1994, Analysis of phoneme-based features for language identification Campbell, 2005, Support vector machines for speaker and language recognition, Computer Speech and Language, 20, 210 Campbell, 2006, Support vector machines using GMM supervectors for speaker recognition, IEEE Signal Processing Letters, 13, 308, 10.1109/LSP.2006.870086 Corredor-Ardoy, 1997, A multilingual phoneme and model set: towards a universal base for automatic speech recognition, 355 Davis, 1980, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics Speech and Signal Processing, 28, 357, 10.1109/TASSP.1980.1163420 Dehak, 2010, Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech and Language Processing, 19, 788, 10.1109/TASL.2010.2064307 Deng, 2011, Deep convex network: a scalable architecture for deep learning, 2285 Gao, 2006, A maximal figure-of-merit (MFoM)-learning approach to robust classifier design for text categorization, ACM Transactions on Information Systems, 24, 190, 10.1145/1148020.1148022 Gauvain, 2000, Large-vocabulary continuous speech recognition: advances and applications, Proceedings of the IEEE, 88, 1181, 10.1109/5.880079 Hazen, T.J., 1993. Automatic language identification using a segment-based approach. Ph.D. Thesis, M.S. Thesis. Mass. Inst. Technol., Cambridge, MA, USA. Katagiri, 1998, Pattern recognition using a family of design algorithms based upon generalized probability descent method, Proceedings of the IEEE, 86, 2345, 10.1109/5.726793 Kirchhoff, K., 1999. Robust speech recognition using articulatory information. Ph.D. Thesis. University of Bielefeld, Germany. Kirchhoff, 2002, Mixed-memory Markov models for automatic language identification Lee, 2004, From knowledge-ignorant to knowledge-rich modeling: a new speech research paradigm for next generation automatic speech recognition, 109 Lee, 2000, On adaptive decision rules and decision parameter adaptation for automatic speech recognition, Proceedings of the IEEE, 88, 1241, 10.1109/5.880082 Lee, 1988, A segment model based approach to speech recognition, 501 1996 Li, 2007, A vector space modeling approach to spoken language identification, IEEE Transactions on Audio, Speech and Language Processing, 15, 271, 10.1109/TASL.2006.876860 Martin, 2006, The current state of language recognition: NIST 2005 evaluation results, 1 Martin, 2003, NIST 2003 language recognition evaluation, 1341 Martinez, 2011, Language recognition in iVectors space, 861 Matrouf, 1998, Language identification incorporating lexical information Matějka, 2005, Phonotactic language identification using high quality phoneme recognition, 2237 Mohamed, 2009, Deep belief networks for phone recognition Muthusamy, 1992, The OGI multi-language telephone speech corpus, 895 Muthusamy, 1994, Perceptual benchmarks for automatic language identification Rabiner, 1999, A tutorial on hidden Markov models and selected application in speech recognition, Proceedings of the IEEE, 77, 257, 10.1109/5.18626 Rabiner, 1993 Salton, 1971 Schwarz, 2006, Hierarchical structures of neural networks for phoneme recognition, 325 Seide, 2011, Conversational speech transcription using context-dependent deep neural networks, 437 Singer, 2003, Acoustic, phonetic, and discriminative approaches to automatic language recognition, 1345 Siniscalchi, 2012, Experiments on cross-language attribute detection and phone recognition with minimal target specific training data, IEEE Transactions on Audio, Speech and Language Processing, 20, 875, 10.1109/TASL.2011.2167610 Siniscalchi, 2010, Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition, 2718 Siniscalchi, 2009, Exploring universal attribute characterization of spoken languages for spoken language recognition, 168 Siniscalchi, 2008, Toward a detector-based universal phone recognizer, 4261 Soufifar, 2011, iVector approach to phonotactic language recognition, 2913 Stüker, 2003, Multilingual articulatory features Sugiyama, 1991, Automatic language recognition using acoustic features, 813 Torres-Carrasquillo, 2002, Approaches to language identification using Gaussian mixture models and shifted delta cepstral features, 89 Yu, 2012, Boosting attribute and phone estimation accuracy with deep neural networks for detection-based speech recognition, 4169, 10.1109/ICASSP.2012.6288837 Zissman, 1996, Comparison of four approaches to automatic language identification of telephone speech, IEEE Transactions on Speech and Audio Processing, 4, 31, 10.1109/TSA.1996.481450