Speech quality assessment using 2D neurogram orthogonal moments

Speech Communication - Tập 80 - Trang 34-48 - 2016
Wissam A. Jassim1, Muhammad S.A. Zilany2
1Department of Electrical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, 50603 Malaysia
2Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur 50603, Malaysia

Tài liệu tham khảo

Beerends, 2013, Perceptual objective listening quality assessment (POLQA), the third generation ITU-T standard for end-to-end speech quality measurement Part I: temporal alignment, J. Audio Eng. Soc., 61, 366 Bruce, 2003, An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses, J. Acoust. Soc. Am., 113, 369, 10.1121/1.1519544 Côté, 2011, Integral and diagnostic intrusive prediction of speech quality, 10.1007/978-3-642-18463-5 Dubno, 2005, Word recognition in noise at higher-than-normal levels: decreases in scores and increases in masking, J. Acoust. Soc. Am., 118, 914, 10.1121/1.1953107 Flusser, 2009 Hines, 2010, Speech intelligibility from image processing, Speech Commun., 52, 736, 10.1016/j.specom.2010.04.006 Hines, 2015, ViSQOL: an objective speech quality model, EURASIP J. Audio Speech Music Process., 2015, 1, 10.1186/s13636-015-0054-9 Hu, 2006, Subjective comparison of speech enhancement algorithms, 1, I Hu, 2008, Evaluation of objective quality measures for speech enhancement, IEEE Trans. Audio Speech Lang. Process., 16, 229, 10.1109/TASL.2007.911054 Hu, 2007, Subjective comparison and evaluation of speech enhancement algorithms, Speech Commun., 49, 588, 10.1016/j.specom.2006.12.006 Huber, 2006, PEMO-Q a new method for objective audio quality assessment using a model of auditory perception, IEEE Trans. Audio Speech Lang. Process., 14, 1902, 10.1109/TASL.2006.883259 ITU-T, 2014. Perceptual Objective Listening Quality Assessment, Recommendation ITU-T P.863. ITU-T-Recommendations, 2012. G.729 : Coding of Speech at 8 kbit/s Using Conjugate-structure Algebraic-code-excited Linear Prediction (CS-ACELP). ITU-T Study Group 12: Speech Quality Experts Group, 1995. Subjective Test Plan for Characterization of an 8 kbit/s Speech Codec. ITU-T recommendation P.862 Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, 2001. Jassim, 2012, Face recognition using discrete Tchebichef–Krawtchouk transform, 120 Jassim, 2012, New orthogonal polynomials for speech signal and image processing, IET Signal Process., 6, 713, 10.1049/iet-spr.2011.0004 Kates, 2010, The hearing-aid speech quality index (HASQI), Audio Eng. Soc., 58, 363 sheng Kiang, 1990, Curious oddments of auditory-nerve studies, Hear. Res., 49, 1, 10.1016/0378-5955(90)90091-3 Klatt, 1982, Prediction of perceived phonetic distance from critical-band spectra: a first step, 7, 1278 Koekoek, 2010, Hypergeometric orthogonal polynomials and their q-analogues Kressner, 2013, Evaluating the generalization of the hearing aid speech quality index (HASQI), IEEE Trans. Audio Speech Lang. Process., 21, 407, 10.1109/TASL.2012.2217132 Loizou, 2011, Speech quality assessment, 346, 623 Loizou, 2013 Mamun, 2015, Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (NOPM), IEEE Trans. Audio Speech Lang. Process., 23, 760, 10.1109/TASLP.2015.2401513 Panzer, 1993, A comparison of subjective methods for evaluating speech quality, 224, 59 Pearce, 2000, The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, 29 Preminger, 1995, Quantifying the relation between speech quality and speech intelligibility, J. Speech Lang. Hear. Res., 38, 714, 10.1044/jshr.3803.714 Quackenbush, 1988, Objective measures of speech quality Rix, 2001, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, 2, 749 Rothauser, 1969, IEEE recommended practice for speech quality measurements, IEEE Trans. Audio Electroacoust., 17, 225, 10.1109/TAU.1969.1162058 Smith, 2007 Steeneken, 1992 Studebaker, 1999, Monosyllabic word recognition at higher-than-normal speech and noise levels, J. Acoust. Soc. Am., 105, 2431, 10.1121/1.426848 Supplement 23 to ITU-T P-series recommendations ITU-T Coded-speech Database. 1998. Teng, 2006 Tribolet, 1978, A study of complexity and quality of speech waveform coders, 3, 586 Wee, 2010, Image quality assessment by discrete orthogonal moments, Pattern Recognit., 43, 4055, 10.1016/j.patcog.2010.05.026 Wong, 1998, Effects of high sound levels on responses to the vowel /ε/ in cat auditory nerve, Hear. Res., 123, 61, 10.1016/S0378-5955(98)00098-7 Zilany, 2006, Modeling auditory-nerve responses for high sound pressure levels in the normal and impaired auditory periphery, J. Acoust. Soc. Am., 120, 1446, 10.1121/1.2225512 Zilany, 2007, Representation of the vowel /ε/ in normal and impaired auditory nerve fibers: model predictions of responses in cats, J. Acoust. Soc. Am., 122, 402, 10.1121/1.2735117 Zilany, 2014, Updated parameters and expanded simulation options for a model of the auditory periphery, J. Acoust. Soc. Am., 135, 283, 10.1121/1.4837815 Zilany, 2009, A phenomenological model of the synapse between the inner hair cell and auditory nerve: Long-term adaptation with power-law dynamics, J. Acoust. Soc. Am., 126, 2390, 10.1121/1.3238250