A Survey on Probabilistic Models in Human Perception and Machines
Abstract
Keywords