Identification of related languages from spoken data: Moving from off-line to on-line scenario

Computer Speech & Language, Volume 68, Page 101180, 2021
Petr Cerva, Lukas Mateju, Jindrich Zdansky, Radek Safarik, Jan Nouza
Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Studentska 2, Liberec 461 17, Czech Republic

References

NIST Language Recognition Evaluations, 2020, http://nist.gov/itl/iad/mig/lre.cfm, online (accessed 2020-05-20)
LRE, 2015, The 2015 NIST language recognition evaluation plan (LRE15)
LRE, 2017, NIST 2017 language recognition evaluation plan
Abdullah, B. M., Avgustinova, T., Möbius, B., Klakow, D., 2020, Cross-domain adaptation of spoken language identification for related languages: the curious case of Slavic languages, arXiv:2008.00545
Cai, 2019, Utterance-level end-to-end language identification using attention-based CNN-BLSTM, 5991
Cai, 2018, Insights in-to-end learning scheme for language identification, 5209
Cai, 2018, A novel learnable dictionary encoding layer for end-to-end language identification, 5189
Cai, 2018, Exploring the encoding layer and loss function in end-to-end speaker and language recognition system, 74
Caseiro, 1998, Spoken language identification using the SpeechDat corpus, 1
Dahl, 2012, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio Speech Lang. Process., 20, 30, 10.1109/TASL.2011.2134090
Dehak, 2011, Language recognition via i-vectors and dimensionality reduction, 857
D’Haro, 2014, Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition, 5342
Fer, 2015, Multilingual bottleneck features for language recognition, 389
Fer, 2017, Multilingually trained bottleneck features in spoken language recognition, Comput. Speech Lang., 46, 252, 10.1016/j.csl.2017.06.008
Fernando, 2017, Bidirectional modelling for short duration language identification, 2809
Ferrer, 2016, Study of senone-based deep neural network approaches for spoken language recognition, IEEE/ACM Trans. Audio Speech Lang. Process., 24, 105, 10.1109/TASLP.2015.2496226
Garcia-Romero, 2016, Stacked long-term TDNN for spoken language recognition, 3226
Gauvain, 2004, Language recognition using phone lattices, 1283
Gelly, 2017, Spoken language identification using LSTM-based angular proximity, 2566
Gelly, 2016, A divide-and-conquer approach for language identification based on recurrent neural networks, 3231
Geng, 2016, End-to-end language identification using attention-based recurrent neural networks, 2944
Geng, 2016, Gating recurrent enhanced memory neural networks on language identification, 3280
Gonzalez, 2011, Language recognition in iVectors space, 861
Gonzalez-Dominguez, 2014, Automatic language identification using long short-term memory recurrent neural networks, 2155
Griol, 2020, A data-driven approach to spoken dialog segmentation, Neurocomputing, 391, 292, 10.1016/j.neucom.2019.02.072
Jin, 2018, LID-senones and their statistics for language identification, IEEE/ACM Trans. Audio Speech Lang. Process., 26, 171, 10.1109/TASLP.2017.2766023
Li, 2007, A vector space modeling approach to spoken language identification, IEEE Trans. Audio Speech Lang. Process., 15, 271, 10.1109/TASL.2006.876860
Li, 2013, Spoken language recognition: from fundamentals to practice, Proc. IEEE, 101, 1136, 10.1109/JPROC.2012.2237151
Lim, 2010, Real-time spoken language identification and recognition for speech-to-speech translation, 307
Liu, 2017, A survey of deep neural network architectures and their applications, Neurocomputing, 234, 11, 10.1016/j.neucom.2016.12.038
Lopez, 2018, End-to-end versus embedding neural networks for language recognition in mismatched conditions, 112
Lopez-Moreno, 2014, Automatic language identification using deep neural networks, 5337
Lozano-Diez, 2018, DNN based embeddings for language recognition, 5184
Lozano-Diez, 2015, An end-to-end approach to language identification in short utterances using convolutional neural networks, 403
Malek, 2019, On practical aspects of multi-condition training based on augmentation for reverberation-/noise-robust speech recognition, 251
Malek, 2018, Robust recognition of conversational telephone speech via multi-condition training and data augmentation, 324
Masumura, 2017, Parallel phonetically aware DNNs and LSTM-RNNs for frame-by-frame discriminative modeling of spoken language identification, 5260
Mateju, 2019, An approach to online speaker change point detection using DNNs and WFSTs, 649
Mateju, 2017, Speech activity detection in online broadcast transcription using deep neural networks and weighted finite state transducers, 5460
Mateju, 2018, Using deep neural networks for identification of Slavic languages from acoustic signal, 1803
McLaren, 2016, Exploring the role of phonetic bottleneck features for speaker and language recognition, 5575
Miao, 2019, A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN, with application to language identification, 4080
Mingote, 2019, Language recognition using triplet neural networks, 4025
Nouza, 2016, ASR for South Slavic languages developed in almost automated way, 3868
Okamoto, 2017, Reducing latency for language identification based on large-vocabulary continuous speech recognition, Acoust. Sci. Technol., 38, 38, 10.1250/ast.38.38
Padi, 2019, Attention based hybrid i-vector BLSTM model for language recognition, 1263
Padi, 2019, End-to-end language recognition using attention based hierarchical gated recurrent unit models, 5966
Pesan, 2016, Sequence summarizing neural networks for spoken language recognition, 3285
Povey, 2011, The Kaldi speech recognition toolkit, 1
Rasanen, 2009, An improved speech segmentation quality measure: the R-value, 1851
Richardson, 2015, Deep neural network approaches to speaker and language recognition, IEEE Signal Process. Lett., 22, 1671, 10.1109/LSP.2015.2420092
Richardson, 2015, A unified deep neural network for speaker and language recognition, 1146
Singer, 2012, The MITLL NIST LRE 2011 language recognition system, 209
Siniscalchi, 2014, An artificial neural network approach to automatic speech processing, Neurocomputing, 140, 326, 10.1016/j.neucom.2014.03.005
Snyder, 2018, Spoken language recognition using x-vectors, 105
Snyder, 2018, X-vectors: robust DNN embeddings for speaker recognition, 5329
Song, 2015, Deep bottleneck network based i-vector representation for language identification, 398
V., 2016, An investigation of deep neural network architectures for language recognition in Indian languages, 2930
Wan, 2019, Tuplemax loss for language identification, 5976
Zazo, 2016, Evaluation of an LSTM-RNN system in different NIST language recognition frameworks, 231
Zhang, 2015, Feedforward sequential memory networks: a new structure to learn long-term dependency, CoRR
Zissman, 1996, Comparison of four approaches to automatic language identification of telephone speech, IEEE Trans. Speech Audio Process., 4, 31, 10.1109/TSA.1996.481450