NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition

Computer Speech & Language - Volume 61 - Page 101033 - 2020
Kong Aik Lee¹, Hitoshi Yamamoto¹, Koji Okabe¹, Qiongqiong Wang¹, Ling Guo¹, Takafumi Koshinaka¹, Jiacen Zhang², Koichi Shinoda²
¹ Biometrics Research Laboratories, NEC Corp., Kanagawa 211-8666, Japan
² Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan
