A segmental framework for fully-unsupervised large-vocabulary speech recognition

Computer Speech & Language - Volume 46 - Pages 154-174 - 2017
Herman Kamper¹, Aren Jansen², Sharon Goldwater¹
¹ School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK
² Google, Inc., Mountain View, CA 94043, USA

References

Abdel-Hamid, 2013, Deep segmental neural networks for speech recognition
Badino, 2014, An auto-encoder based approach to unsupervised learning of subword units
Badino, 2015, Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders
Bisani, 2004, Bootstrap estimates for confidence intervals in ASR performance evaluation, 409
Bortfeld, 2005, Mommy and me: familiar names help launch babies into speech-stream segmentation, Psychol. Sci., 16, 298, 10.1111/j.0956-7976.2005.01531.x
Chen, 2015, Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study
Chung, 2013, Unsupervised discovery of linguistic structure including two-level acoustic patterns using three cascaded stages of iterative optimization
De Vries, 2014, A smartphone-based ASR data collection tool for under-resourced languages, Speech Commun., 56, 119, 10.1016/j.specom.2013.07.001
Dredze, 2010, NLP on spoken documents without ASR
Eimas, 1999, Segmental and syllabic representations in the perception of speech by young infants, J. Acoust. Soc. Am., 105, 1901, 10.1121/1.426726
Feldman, 2009, Learning phonetic categories by learning a lexicon
Gillick, 2011, Don’t multiply lightly: quantifying problems with the acoustic model assumptions in speech recognition
Gish, 2009, Unsupervised training of an HMM-based speech recognizer for topic classification
Goldwater, 2007, A fully Bayesian approach to unsupervised part-of-speech tagging
Goldwater, 2009, A Bayesian framework for word segmentation: exploring the effects of context, Cognition, 112, 21, 10.1016/j.cognition.2009.03.008
Heymann, 2013, Unsupervised word segmentation from noisy input
Jansen, 2011, Towards unsupervised training of speaker independent acoustic models
Jansen, 2013, A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition
Jansen, 2013, Weak top-down constraints for unsupervised acoustic model training
Jansen, 2011, Efficient spoken term discovery using randomized algorithms
Kamper, 2015, Unsupervised neural network based feature extraction using weak top-down constraints
Kamper, 2015, Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model
Kamper, 2016, Unsupervised word segmentation and lexicon discovery using acoustic word embeddings, IEEE/ACM Trans. Audio Speech Lang. Process., 24, 669, 10.1109/TASLP.2016.2517567
Kamper, 2014, Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings
Kamper, 2016, Deep convolutional acoustic word embeddings using word-pair side information
Lee, 2012, A nonparametric Bayesian approach to acoustic model discovery
Lee, 2015, Unsupervised lexicon discovery from acoustic input, Trans. ACL, 3, 389
Lee, 2013, Enhanced spoken term detection using support vector machines and weighted pseudo examples, IEEE Trans. Audio Speech Lang. Process., 21, 1272, 10.1109/TASL.2013.2248721
Levin, 2013, Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings
Levin, 2015, Segmental acoustic indexing for zero resource keyword search
Ludusan, 2014, Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems
Lyzinski, 2015, An evaluation of graph clustering methods for unsupervised term discovery
Martin, 2015, Utterance classification in speech-to-speech translation for zero-resource languages in the hospital administration domain
McQueen, 1998, Segmentation of continuous speech using phonotactics, J. Mem. Lang., 39, 21, 10.1006/jmla.1998.2568
Mochihashi, 2009, Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling
Murphy, K. P., 2007. Conjugate Bayesian analysis of the Gaussian distribution. URL: http://www.cs.ubc.ca/~murphyk/mypapers.html.
Murphy, 2012
Neubig, 2010, Learning a language model from continuous speech
Park, 2008, Unsupervised pattern discovery in speech, IEEE Trans. Audio Speech Lang. Process., 16, 186, 10.1109/TASL.2007.909282
Pitt, 2005, The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability, Speech Commun., 45, 89, 10.1016/j.specom.2004.09.001
Räsänen, 2012, Computational modeling of phonetic and lexical learning in early language acquisition: existing models and future directions, Speech Commun., 54, 975, 10.1016/j.specom.2012.05.001
Räsänen, 2015, Unsupervised word discovery from speech using automatic segmentation into syllable-like units
Räsänen, 2017, Pre-linguistic rhythmic segmentation of speech into syllabic units, Completed for submission
Renshaw, 2015, A comparison of neural network methods for unsupervised representation learning on the Zero Resource Speech Challenge
Resnik, 2010, Gibbs sampling for the uninitiated
Scott, 2002, Bayesian methods for hidden Markov models, J. Am. Stat. Assoc., 97, 337, 10.1198/016214502753479464
Shum, 2016, On the use of acoustic unit discovery for language recognition, IEEE Trans. Acoust. Speech Signal Process., 24, 1665
Siu, 2014, Unsupervised training of an HMM-based self-organizing unit recognizer with applications to topic classification and keyword discovery, Comput. Speech Lang., 28, 210, 10.1016/j.csl.2013.05.002
Sun, 2013, Joint training of non-negative Tucker decomposition and discrete density hidden Markov models, Comput. Speech Lang., 27, 969, 10.1016/j.csl.2012.09.006
Synnaeve, 2014, Phonetics embedding learning with side information
Taniguchi, 2016, Symbol emergence in robotics: a survey, Adv. Robotics, 30, 706, 10.1080/01691864.2016.1164622
Thiollière, 2015, A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling
Varadarajan, 2008, Unsupervised learning of acoustic sub-word units
Versteegh, 2016, The zero resource speech challenge 2015: proposed approaches and results
Versteegh, 2015, The Zero Resource Speech Challenge 2015
Walter, 2013, A hierarchical system for word discovery exploiting DTW-based initialization
Wilkinson, 2016, Deriving phonetic transcriptions and discovering word segmentations for speech-to-speech translation in low-resource settings, 10.21437/Interspeech.2016-1319
Zeghidour, 2016, Joint learning of speaker and phonetic similarities with Siamese networks, 10.21437/Interspeech.2016-811
Zeghidour, 2016, A deep scattering spectrum-deep Siamese network pipeline for unsupervised acoustic modeling
Zeiler, 2013, On rectified linear units for speech processing
Zhang, 2010, Towards multi-speaker unsupervised speech pattern discovery
Zhang, 2012, Resource configurable spoken query detection using deep Boltzmann machines
Zweig, 2010, SCARF: a segmental conditional random field toolkit for speech recognition