A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments
References
Allen, 1977, A unified approach to short-time Fourier analysis and synthesis, Proc. IEEE, 65, 1558, 10.1109/PROC.1977.10770
Benesty, 2005
Boll, 1979, Suppression of acoustic noise in speech using spectral subtraction, Acoust. Speech Signal Process. IEEE Trans., 27, 113, 10.1109/TASSP.1979.1163209
Cohen, 2001, Speech enhancement for non-stationary noise environments, Signal Process., 81, 2403, 10.1016/S0165-1684(01)00128-1
Dahl, 2012, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, Audio Speech Lang. Process. IEEE Trans., 20, 30, 10.1109/TASL.2011.2134090
Du, 2008, A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions, 569
Du, 2016, A regression approach to single-channel speech separation via high-resolution deep neural networks, Audio Speech Lang. Process. IEEE/ACM Trans., 24, 1424, 10.1109/TASLP.2016.2558822
Du, 2014, Speech separation of a target speaker based on deep neural networks, 473
Ephraim, 1984, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, Acoustics Speech Signal Process. IEEE Trans., 32, 1109, 10.1109/TASSP.1984.1164453
Ephraim, 1985, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, Acoustics Speech Signal Process. IEEE Trans., 33, 443, 10.1109/TASSP.1985.1164550
Fan, 2014, Speech enhancement using segmental nonnegative matrix factorization, 4483
Fu, 2016, SNR-aware convolutional neural network modeling for speech enhancement, 3768, 10.21437/Interspeech.2016-211
Gao, 2015, A unified speaker-dependent speech separation and enhancement system based on deep neural networks, 687
Gao, 2015, Improving deep neural network based speech enhancement in low SNR environments, 75
Hinton, 2012, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag., 29, 82, 10.1109/MSP.2012.2205597
Hinton, 2002, Training products of experts by minimizing contrastive divergence, Neural Comput., 14, 1771, 10.1162/089976602760128018
Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Comput., 18, 1527, 10.1162/neco.2006.18.7.1527
Hu, 2010, A tandem algorithm for pitch estimation and voiced speech segregation, Audio Speech Lang. Process. IEEE Trans., 18, 2067, 10.1109/TASL.2010.2041110
Hu, 2013, An unsupervised approach to cochannel speech separation, Audio Speech Lang. Process. IEEE Trans., 21, 122, 10.1109/TASL.2012.2215591
Hu, 2008, Evaluation of objective quality measures for speech enhancement, IEEE Trans. Audio Speech Lang. Process., 16, 229, 10.1109/TASL.2007.911054
Huang, 2014, Deep learning for monaural speech separation, 1562
Huang, 2015, Joint optimization of masks and deep recurrent neural networks for monaural source separation, Audio Speech Lang. Process. IEEE/ACM Trans., 23, 2136, 10.1109/TASLP.2015.2468583
Hwang, 2016, Ensemble of deep neural networks using acoustic environment classification for statistical model-based voice activity detection, Comput. Speech Lang., 38, 1, 10.1016/j.csl.2015.11.003
Kamath, 2002, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, 4, IV
Kim, 2015, Adaptive denoising autoencoders: a fine-tuning scheme to learn from test mixtures, 100
Kristjansson, 2004, Single microphone source separation using high resolution signal reconstruction, 2, ii
Lim, 1978, All-pole modeling of degraded speech, Acoustics Speech Signal Process. IEEE Trans., 26, 197, 10.1109/TASSP.1978.1163086
Loizou, 2013
McAulay, 1980, Speech enhancement using a soft-decision noise suppression filter, Acoustics Speech Signal Process. IEEE Trans., 28, 137, 10.1109/TASSP.1980.1163394
Mohammadiha, 2013, Supervised and unsupervised speech enhancement using nonnegative matrix factorization, Audio Speech Lang. Process. IEEE Trans., 21, 2140, 10.1109/TASL.2013.2270369
Povey, 2011, The Kaldi speech recognition toolkit
Roweis, 2000, One microphone source separation, 13, 793
Roweis, 2003, Factorial models and refiltering for speech separation and denoising, 1009
Schmidt, 2006, Single-channel speech separation using sparse non-negative matrix factorization
Shao, 2006, Model-based sequential organization in cochannel speech, Audio Speech Lang. Process. IEEE Trans., 14, 289, 10.1109/TSA.2005.854106
Tu, 2014, Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers, 250
Varga, 1993, Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems, Speech Commun., 12, 247, 10.1016/0167-6393(93)90095-3
Vincent, 2006, Performance measurement in blind audio source separation, IEEE Trans. Audio Speech Lang. Process., 14, 1462, 10.1109/TSA.2005.858005
Wang, 1999, Separation of speech from interfering sounds based on oscillatory correlation, Neural Netw. IEEE Trans., 10, 684, 10.1109/72.761727
Wang, 2006
Wang, 2015, A universal VAD based on jointly trained deep neural networks, 2282
Wang, 2014, On training targets for supervised speech separation, Audio Speech Lang. Process. IEEE/ACM Trans., 22, 1849, 10.1109/TASLP.2014.2352935
Wang, 2013, Towards scaling up classification-based speech separation, Audio Speech Lang. Process. IEEE Trans., 21, 1381, 10.1109/TASL.2013.2250961
Weninger, 2015, Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, 91
Wu, 2003, A multipitch tracking algorithm for noisy speech, Speech Audio Process. IEEE Trans., 11, 229, 10.1109/TSA.2003.811539
Xu, 2014, Dynamic noise aware training for speech enhancement based on deep neural networks, 2670
Xu, 2014, An experimental study on speech enhancement based on deep neural networks, Signal Process. Lett. IEEE, 21, 65, 10.1109/LSP.2013.2291240
Xu, 2014, Global variance equalization for improving deep neural network based speech enhancement, 71
Xu, 2015, A regression approach to speech enhancement based on deep neural networks, Audio Speech Lang. Process. IEEE/ACM Trans., 23, 7, 10.1109/TASLP.2014.2364452
Xu, 2015, Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement, 1508
Zazo, 2016, Feature learning with raw-waveform CLDNNs for voice activity detection, 3668, 10.21437/Interspeech.2016-268
Zhang, 2016, Boosting contextual information for deep neural network based voice activity detection, IEEE/ACM Trans. Audio Speech Lang. Process., 24, 252, 10.1109/TASLP.2015.2505415
Zhang, 2016, A deep ensemble learning method for monaural speech separation, IEEE/ACM Trans. Audio Speech Lang. Process., 24, 967, 10.1109/TASLP.2016.2536478
Zhang, 2013, Deep belief networks based voice activity detection, IEEE Trans. Audio Speech Lang. Process., 21, 697, 10.1109/TASL.2012.2229986
Zöhrer, 2014, Single channel source separation with general stochastic networks, 978
Zöhrer, 2015, Representation models in single channel source separation, 713