Ensemble of deep neural networks using acoustic environment classification for statistical model-based voice activity detection

Computer Speech & Language - Volume 38 - Pages 1-12 - 2016
Inyoung Hwang1, Hyung-Min Park2, Joon-Hyuk Chang1
1School of Electronic and Computer Engineering, Hanyang University, Seoul 133-791, Republic of Korea
2School of Electronic Engineering, Sogang University, Seoul 121-742, Republic of Korea

References

Bishop, 1995
Chang, 2001, Speech enhancement: new approaches to soft decision, IEICE Trans. Syst. Inf., E84-D, 1231
Choi, 2012, On using environment classification for statistical model-based speech enhancement, Speech Commun., 54, 477, 10.1016/j.specom.2011.10.009
Ephraim, 1984, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., 32, 1109, 10.1109/TASSP.1984.1164453
Garofolo, 1993, TIMIT acoustic phonetic continuous speech corpus
Hinton, 2006, Reducing the dimensionality of data with neural networks, Science, 313, 504, 10.1126/science.1127647
Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Comput., 18, 1527, 10.1162/neco.2006.18.7.1527
Hinton, 2012, Deep neural networks for acoustic modeling in speech recognition, IEEE Signal Process. Mag., 29, 82, 10.1109/MSP.2012.2205597
Hirsch, 2000, The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions
Jo, 2009, Statistical model-based voice activity detection using support vector machine, IET Signal Process., 3, 205, 10.1049/iet-spr.2008.0128
Kang, 2008, Discriminative weight training for a statistical model-based voice activity detection, IEEE Signal Process. Lett., 15, 170, 10.1109/LSP.2007.913595
Lee, 2007, Sparse deep belief net model for visual area V2
Mohamed, 2009, Deep belief networks for phone recognition
Mohamed, 2012, Acoustic modeling using deep belief networks, IEEE Trans. Audio Speech Lang. Process., 20, 14, 10.1109/TASL.2011.2109382
Platt, 2000, Probabilistic outputs for support vector machines and comparison to regularized likelihood methods
Ryant, 2013, Speech activity detection on YouTube using deep neural networks
Sangwan, 2007, Environmentally aware voice activity detector
Sohn, 1999, A statistical model-based voice activity detection, IEEE Signal Process. Lett., 1, 1, 10.1109/97.736233
Varga, 1993, Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems, Speech Commun., 12, 247, 10.1016/0167-6393(93)90095-3
Xia, 2014, Wiener filtering based speech enhancement with weighted denoising auto-encoder and noise classification, Speech Commun., 60, 13, 10.1016/j.specom.2014.02.001
Yu, 2010, Discriminative training for multiple observation likelihood ratio based voice activity detection, IEEE Signal Process. Lett., 17, 897, 10.1109/LSP.2010.2066561
Zhang, 2013, Denoising deep neural networks based voice activity detection
Zhang, 2013, Deep belief networks based voice activity detection, IEEE Trans. Audio Speech Lang. Process., 21, 3371