Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

IEEE Signal Processing Magazine - Volume 29, Issue 6 - Pages 82-97 - 2012
Geoffrey E. Hinton1, Li Deng2, Dong Yu3, George E. Dahl1, Abdelrahman Mohamed4, Navdeep Jaitly1, Andrew Senior5, Vincent Vanhoucke6, Patrick Nguyen6, Tara N. Sainath7, Brian Kingsbury8
1Computer Science, Univ. Toronto, Toronto, Canada
2Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada
3Microsoft Research, Redmond, Washington, USA
4Department of Computer Science, University of Toronto, Toronto, M5S 3G4 Canada
5Google Inc., USA
6Research, Google, Mountain View, California, USA
7IBM Thomas J. Watson Research Center, USA
8Electrical Engineering, Michigan State University, East Lansing, USA

References

lee, 2009, Unsupervised feature learning for audio classification using convolutional deep belief networks, Advances in Neural Information Processing Systems 22, 1096

10.1109/ICASSP.2010.5495222

dahl, 2010, Phone recognition with the mean-covariance restricted Boltzmann machine, Advances in Neural Information Processing Systems 23, 469

10.1109/TASL.2011.2155060

mohamed, 0, Investigation of full-sequence training of deep belief networks for speech recognition, Proc INTERSPEECH, 2846

halberstadt, 0, Heterogeneous measurements and multiple classifiers for speech recognition, Proc ICSLP

10.1109/ICASSP.2009.4960445

10.1109/IJCNN.1991.155435

he, 2008, Discriminative learning in sequential pattern recognition—A unifying review for optimization-oriented speech recognition, IEEE Signal Processing Mag, 25, 14, 10.1109/MSP.2008.926652

10.1109/ICASSP.2012.6288864

10.1109/TASL.2011.2116010

10.1109/ICASSP.2012.6288833

10.1109/TASL.2011.2129510

10.1109/MSP.2005.1511826

10.1109/ICASSP.1998.674454

10.1109/ICASSP.2011.5947378

10.1109/72.279192

10.1109/ICASSP.2012.6288837

10.1121/1.409839

deng, 0, Use of differential cepstra as acoustic features in hidden trajectory modelling for phonetic recognition, Proc ICASSP, 445

10.1121/1.1420380

10.1006/csla.2001.0182

furui, 2000, Digital Speech Processing, Synthesis, and Recognition

10.1109/ICASSP.2007.367023

10.1109/MSP.2009.932166

10.1162/089976602760128018

10.1162/neco.2006.18.7.1527

hinton, 2010, A practical guide to training restricted Boltzmann machines, Tech Rep UTML TR 2010-003

10.1109/ICASSP.2011.5947494

10.1109/ASRU.2009.5373263

10.1109/TASL.2008.2010286

10.1109/ICASSP.2012.6288863

sainath, 2011, Improvements in using deep belief networks for large vocabulary continuous speech recognition, Speech and Language Algorithm Group IBM Yorktown Heights NY Tech Rep UTML TR 2010-003

deng, 0, Deep convex network: A scalable architecture for speech pattern classification, Proc INTERSPEECH, 2285

martens, 0, Deep learning via Hessian-free optimization, Proc 27th Int Conf Machine Learning, 735

le, 0, On optimization methods for deep learning, Proc 28th Int Conf Machine Learning, 265

10.1109/ICASSP.2012.6288994

plahl, 0, Improved pretraining of deep belief networks using sparse encoding symmetric machines, Proc ICASSP, 4165

rifai, 0, Contractive autoencoders: Explicit invariance during feature extraction, Proc 28th Int Conf Machine Learning, 833

vincent, 2010, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J Mach Learn Res, 11, 3371

yu, 2011, Discriminative pretraining of deep neural networks, U S Patent Filing

10.1109/ICASSP.2012.6288333

10.1007/978-3-642-60087-6_20

10.1109/TASL.2006.878265

deng, 2003, Switching dynamic system models for speech articulation and acoustics, Mathematical Foundations of Speech and Language Processing, 115

mohamed, 0, Deep belief networks for phone recognition, Proc NIPS Workshop Deep Learning for Speech Recognition and Related Applications

10.1109/TASL.2011.2109382

10.1038/323533a0

glorot, 0, Understanding the difficulty of training deep feedforward neural networks, Proc AISTATS, 249

10.1162/NECO_a_00052

10.1126/science.1127647

10.1145/1273496.1273556

pearl, 1988, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

10.1121/1.399423

10.1109/TIT.1986.1057145

10.1109/79.536824

10.1109/TASSP.1981.1163530

10.1109/ICASSP.2000.862024

vanhoucke, 2011, Improving the speed of neural networks on CPUs, Proc Deep Learning and Unsupervised Feature Learning NIPS Workshop

10.1109/ICASSP.1986.1169179

bourlard, 1993, Connectionist Speech Recognition: A Hybrid Approach

povey, 0, Boosted MMI for model and feature-space discriminative training, Proc ICASSP, 4057

10.1109/ASRU.2011.6163899

zweig, 0, Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 summer workshop, Proc ICASSP, 5044

jaitly, 0, An application of pretrained deep neural networks to large vocabulary speech recognition

10.1109/TASL.2011.2134090

10.1109/TASL.2011.2165280

yu, 0, Roles of pretraining and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition, Proc NIPS Workshop Deep Learning and Unsupervised Feature Learning

seide, 0, Conversational speech transcription using context-dependent deep neural networks, Proc INTERSPEECH, 437