End-to-end acoustic modelling for phone recognition of young readers
References
Abad, 2020, Cross lingual transfer learning for zero-resource domain adaptation, 6909
Airaksinen, 2019, Data augmentation strategies for neural network F0 estimation, 6485
Andrew, 2015, Acoustic modelling with CD-CTC-sMBR LSTM RNNs, 604
Bahdanau, 2015
Bayerl, 2019, A comparison of hybrid and end-to-end models for syllable recognition, 352
Bengio, 2015, Scheduled sampling for sequence prediction with recurrent neural networks, 1171
Bolaños, 2011, FLORA: Fluent oral reading assessment of children’s speech, ACM Trans. Speech Lang. Process., 7, 16, 10.1145/1998384.1998390
Chan, 2016, Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 4960
Chen, 2020
Chiu, 2018, State-of-the-art speech recognition with sequence-to-sequence models, 4774
Cho, 2018, Multilingual sequence-to-sequence speech recognition: Architecture, transfer learning, and language modeling, 521
Chorowski, J., Bahdanau, D., Cho, K., Bengio, Y., 2014. End-to-end continuous speech recognition using attention-based recurrent NN: First results. In: Proc. of the International Conference on Neural Information Processing Systems (NIPS): Workshop on Deep Learning. pp. 1–10.
Chorowski, 2015, Attention-based models for speech recognition, 577
Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2015. Gated feedback recurrent neural networks. In: Proc. of the International Conference on Machine Learning (ICML), Vol. 37. pp. 2067–2075.
Dong, 2018, Speech-transformer: A no-recurrence sequence-to-sequence model for speech recognition, 5884
Duan, 2020, Cross-lingual transfer learning of non-native acoustic modeling for pronunciation error detection and diagnosis, IEEE/ACM Trans. Audio Speech Lang. Process., 28, 391, 10.1109/TASLP.2019.2955858
Fringi, E., Lehman, J.F., Russell, M.J., 2015. Evidence of phonological processes in automatic recognition of children’s speech. In: Proc. of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Dresden. pp. 1621–1624.
Gales, 2008, The application of hidden Markov models in speech recognition, Found. Trends Signal Process., 1, 195, 10.1561/2000000004
Gerosa, 2006, Acoustic analysis and automatic recognition of spontaneous children’s speech, 1886
Gibson, 2018, Multi-condition deep neural network training, 77
Godde, 2017, Evaluation of reading performance of primary school children: Objective measurements vs. subjective ratings, 23
Graves, 2006, Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 369
Graves, 2013, Speech recognition with deep recurrent neural networks, 6645
He, 2016, Deep residual learning for image recognition, 770
Karita, 2019, Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration, 1408
Karita, 2019, A comparative study on transformer vs RNN in speech applications, 449
Lee, 1999, Acoustics of children’s speech: developmental changes of temporal and spectral parameters, J. Acoust. Soc. Am., 105, 1455, 10.1121/1.426686
Lu, L., Zhang, X., Cho, K., Renals, S., 2015. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition. In: Proc. of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Dresden. pp. 3249–3253.
Metallinou, A., Cheng, J., 2014. Using deep neural networks to improve proficiency assessment for children English language learners. In: Proc. of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Singapore. pp. 1468–1472.
Mihaylova, 2019, Scheduled sampling for transformers, 351
Mostow, 2001, Evaluating tutors that listen: An overview of project LISTEN, 169
Mugitani, 2012, Development of vocal tract and acoustic features in children, J. Acoust. Soc. Japan, 68, 234
Ng, 2020
Potamianos, 1998, Spoken dialog systems for children, 197
Potamianos, 2003, Robust recognition of children’s speech, IEEE Trans. Speech Audio Process., 11, 603, 10.1109/TSA.2003.818026
Potamianos, 2007, A review of the acoustic and linguistic properties of children’s speech, 22
Povey, 2018, Semi-orthogonal low-rank matrix factorization for deep neural networks, 3743
Povey, 2011, The Kaldi speech recognition toolkit, 1
Povey, 2016, Purely sequence-trained neural networks for ASR based on lattice-free MMI, 2751
Proença, 2018
Qian, 2016, Improving DNN-based automatic recognition of non-native children speech with adult speech, 40
Serizel, R., Giuliani, D., 2014. Deep neural network adaptation for children’s and adults’ speech recognition. In: Proc. of the Italian Computational Linguistics Conference (CLiC-It). pp. 137–140.
Shivakumar, 2020, Transfer learning from adult to children for speech recognition: Evaluation, analysis and recommendations, Comput. Speech Lang., 63
Shivakumar, 2021
Sutskever, I., Vinyals, O., Le, Q.V., 2014. Sequence to sequence learning with neural networks. In: Proc. of the International Conference on Neural Information Processing Systems (NIPS). Cambridge, MA, USA. pp. 3104–3112.
Tong, 2017, Multilingual training and cross-lingual adaptation on CTC-based acoustic model, Speech Commun., 104
Tong, 2017, Transfer learning for children’s speech recognition, 36
Vaswani, 2017, Attention is all you need, 6000
Veselý, K., Ghoshal, A., Burget, L., Povey, D., 2013. Sequence-discriminative training of deep neural networks. In: Proc. of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Lyon. pp. 2345–2349.
Vinyals, O., Le, Q., 2015. A neural conversational model. In: Proc. of the International Conference on Machine Learning (ICML): Deep Learning Workshop.
Vinyals, 2015, Show and tell: A neural image caption generator, 3156
Waibel, 1989, Phoneme recognition using time-delay neural networks, IEEE Trans. Acoust. Speech Signal Process., 37, 328, 10.1109/29.21701
Watanabe, 2017, Hybrid CTC/Attention architecture for end-to-end speech recognition, IEEE J. Sel. Top. Sign. Proces., 11, 1240, 10.1109/JSTSP.2017.2763455
Wu, 2019, Advances in automatic speech recognition for child speech using factored time delay neural network, 1
Xu, 2015, Show, attend and tell: Neural image caption generation with visual attention, 2048
Yeung, 2018, On the difficulties of automatic speech recognition for kindergarten-aged children, 1661
Yong, 2011, Speaker-independent vowel recognition for Malay children using time-delay neural network, 565
Yu, 2020
Zhou, 2019