Detection of glottal closure instant and glottal open region from speech signals using spectral flatness measure

Speech Communication - Volume 116 - Pages 30-43 - 2020
Sudarsana Reddy Kadiri (1), RaviShankar Prasad (2), B. Yegnanarayana (3)
(1) Department of Signal Processing and Acoustics, Aalto University, Finland
(2) Speech and Audio Processing Group, IDIAP, Martigny, Switzerland
(3) Speech Processing Laboratory, IIIT-Hyderabad, India

Abstract

Keywords


References

Abberton, 1989, Laryngographic assessment of normal voice: a tutorial, Clin. Linguist. Phonet., 3, 263, 10.3109/02699208908985291

Airaksinen, 2014, Quasi closed phase glottal inverse filtering analysis with weighted linear prediction, IEEE/ACM Trans. Audio, Speech Lang. Process., 22, 596, 10.1109/TASLP.2013.2294585

Alku, 1992, Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering, Speech Commun., 11, 109, 10.1016/0167-6393(92)90005-R

Alku, 2011, Glottal inverse filtering analysis of human voice production - a review of estimation and parameterization methods of the glottal excitation and their applications, Sadhana, 36, 623, 10.1007/s12046-011-0041-5

Alku, 2009, Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering, J. Acoust. Soc. Am., 125, 3289, 10.1121/1.3095801

Ananthapadmanabha, 1979, Epoch extraction from linear prediction residual for identification of closed glottis interval, IEEE Trans. Acoust. Speech Signal Process., 27, 309, 10.1109/TASSP.1979.1163267

Aneeja, 2015, Single frequency filtering approach for discriminating speech and nonspeech, IEEE/ACM Trans. Audio, Speech Lang. Process., 23, 705, 10.1109/TASLP.2015.2404035

Barney, 2007, The effect of glottal opening on the acoustic response of the vocal tract, Acta Acustica united with Acustica, 93, 1046

Bouzid, 2009, Voice source parameter measurement based on multi-scale analysis of electroglottographic signal, Speech Commun., 51, 782, 10.1016/j.specom.2008.08.004

Brookes, M. Voicebox: speech processing toolbox for MATLAB. Source: https://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html.

Chi, 2007, Subglottal coupling and its influence on vowel formants, J. Acoust. Soc. Am., 122, 1735, 10.1121/1.2756793

Childers, 1984, A critical review of electroglottography, Crit. Rev. Biomed. Eng., 12, 131

Childers, 1994, Measuring and modeling vocal source-tract interaction, IEEE Trans. Biomed. Eng., 41, 663, 10.1109/10.301733

D'Alessandro, 2011, Glottal closure instant and voice source analysis using time-scale lines of maximum amplitude, Sadhana, 36, 601, 10.1007/s12046-011-0040-6

Degottex, 2009, Glottal closure instant detection from a glottal shape estimate, 226

Degottex, 2010, Joint estimate of shape and time-synchronization of a glottal source model by phase flatness, 5058

Degottex, 2011, Function of phase-distortion for glottal model estimation, 4608

Degottex, 2011, Phase minimization for glottal model estimation, IEEE Trans. Audio Speech Lang. Process., 19, 1080, 10.1109/TASL.2010.2076806

Drugman, 2014, Glottal source processing: from analysis to applications, Comput. Speech Lang., 28, 1117, 10.1016/j.csl.2014.03.003

Drugman, 2011, Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation, Speech Commun., 53, 855, 10.1016/j.specom.2011.02.004

Drugman, 2012, A comparative study of glottal source estimation techniques, Comput. Speech Lang., 26, 20, 10.1016/j.csl.2011.03.003

Drugman, 2012, Detection of glottal closure instants from speech signals: a quantitative review, IEEE Trans. Audio Speech Lang. Process., 20, 994, 10.1109/TASL.2011.2170835

Fant, 1995, The LF-model revisited. Transformations and frequency domain analysis, Speech Transm. Lab. Q. Progr. Status Report, 36, 119

Fu, 2006, Robust glottal source estimation based on joint source-filter model optimization, IEEE Trans. Audio Speech Lang. Process., 14, 492, 10.1109/TSA.2005.857807

Guerin, 1976, A voice source taking account of coupling with the supraglottal cavities, 1, 47

Henrich, 2011, Analysing and understanding the singing voice: recent progress and open questions, Curr. Bioinform., 6, 362, 10.2174/157489311796904709

Henrich, 2004, On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation, J. Acoust. Soc. Am., 115, 1321, 10.1121/1.1646401

Herbst, 2014, Glottal opening and closing events investigated by electroglottography and super-high-speed video recordings, J. Exper. Biol., 217, 955, 10.1242/jeb.093203

ITU-T, Recommendation, 2005. G.191, software tools for speech and audio coding standardization. Source: http://www.itu.int/rec/T-REC-G.191-200509-I/en.

Jain, 2012, Time-order representation based method for epoch detection from speech signals, J. Intell. Syst., 21, 79

Jain, 2013, GCI identification from voiced speech using the eigenvalue decomposition of Hankel matrix, 371

Jain, 2014, Event-based method for instantaneous fundamental frequency estimation from voiced speech based on eigenvalue decomposition of the Hankel matrix, IEEE/ACM Trans. Audio Speech Lang. Process., 22, 1467, 10.1109/TASLP.2014.2335056

Kadiri, 2018

Kadiri, 2015, Analysis of excitation source features of speech for emotion recognition, 1324

Kadiri, 2017, Epoch extraction from emotional speech using single frequency filtering approach, Speech Commun., 86, 52, 10.1016/j.specom.2016.11.005

Kadiri, 2018, Analysis and detection of phonation modes in singing voice using excitation source features and single frequency filtering cepstral coefficients (SFFCC), 441

Kadiri, 2018, Breathy to tense voice discrimination using zero-time windowing cepstral coefficients (ZTWCCs), 232

Kafentzis, 2011, Glottal inverse filtering using stabilised weighted linear prediction, 5408

Khanagha, 2014, Detection of glottal closure instants based on the microcanonical multiscale formalism, IEEE/ACM Trans. Audio, Speech Lang. Process., 22, 1941, 10.1109/TASLP.2014.2352451

Kominek, 2004, The CMU Arctic speech databases, 223

Krishnamurthy, 1986, Two-channel speech analysis, IEEE Trans. Acoust. Speech Signal Process., 34, 730, 10.1109/TASSP.1986.1164909

Larsson, 2000, Vocal fold vibrations: high-speed imaging, kymography and acoustic analysis: a preliminary report, Laryngoscope, 110, 2117, 10.1097/00005537-200012000-00028

Laver, 1980

Legát, 2011, On the detection of pitch marks using a robust multi-phase algorithm, Speech Commun., 53, 552, 10.1016/j.specom.2011.01.008

Lieberman, 1963, Some acoustic measures of the fundamental periodicity of normal and pathologic larynges, J. Acoust. Soc. Am., 35, 344, 10.1121/1.1918465

Lohscheller, 2008, Phonovibrography: mapping high-speed movies of vocal fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics, IEEE Trans. Med. Imag., 27, 300, 10.1109/TMI.2007.903690

Lulich, 2009, Source-filter interaction in the opposite direction: subglottal coupling and the influence of vocal fold mechanics on vowel spectra during the closed phase, 6, 10.1121/1.3269926

Ma, 1994, A Frobenius norm approach to glottal closure detection from the speech signal, IEEE Trans. Speech Audio Process., 2, 258, 10.1109/89.279274

Mehta, 2011, Automated measurement of vocal fold vibratory asymmetry from high-speed videoendoscopy recordings, J. Speech Lang. Hear. Res., 54, 47, 10.1044/1092-4388(2010/10-0026)

Mittal, 2013, Effect of glottal dynamics in the production of shouted speech, J. Acoust. Soc. Am., 133, 3050, 10.1121/1.4796110

Moore, 2008, Critical analysis of the impact of glottal features in the classification of clinical depression in speech, IEEE Trans. Biomed. Eng., 55, 96, 10.1109/TBME.2007.900562

Moulines, 1990, Detection of the glottal closure by jumps in the statistical properties of the speech signal, Speech Commun., 9, 401, 10.1016/0167-6393(90)90017-4

Murty, 2008, Epoch extraction from speech signals, IEEE Trans. Audio Speech Lang. Process., 16, 1602, 10.1109/TASL.2008.2004526

Naylor, 2007, Estimation of glottal closure instants in voiced speech using the DYPSA algorithm, IEEE Trans. Audio Speech Lang. Process., 15, 34, 10.1109/TASL.2006.876878

Prasad, 2016, Determination of glottal open regions by exploiting changes in the vocal tract system characteristics, J. Acoust. Soc. Am., 140, 666, 10.1121/1.4958681

Prathosh, 2013, Epoch extraction based on integrated linear prediction residual using plosion index, IEEE Trans. Audio Speech Lang. Process., 21, 2471, 10.1109/TASL.2013.2273717

Ramesh, 2013, Detection of glottal opening instants using Hilbert envelope, 44

Rao, 2007, Determination of instants of significant excitation in speech using Hilbert envelope and group-delay function, IEEE Signal Process. Letters, 14, 762, 10.1109/LSP.2007.896454

Rothenberg, 1981, Acoustic interaction between the glottal source and the vocal tract, Vocal Fold Physiol., 305

Rothenberg, 1988, Monitoring vocal fold abduction through vocal fold contact area, J. Speech Hear. Res., 31, 338, 10.1044/jshr.3103.338

Schleusing, 2013, Joint source-filter optimization for accurate vocal tract estimation using differential evolution, IEEE Trans. Audio Speech Lang. Process., 21, 1560, 10.1109/TASL.2013.2255275

Silva, 2009, Jitter estimation algorithms for detection of pathological voices, EURASIP J. Adv. Signal Process., 10.1155/2009/567875

Source: https://covarep.github.io/covarep/.

Source: https://geostat.bordeaux.inria.fr/index.php/downloads.html.

Stevens, 1977, Physics of laryngeal behavior and larynx models, Phonetica, 34, 264, 10.1159/000259885

Thati, 2013, Synthesis of laughter by modifying excitation characteristics, J. Acoust. Soc. Am., 133, 3072, 10.1121/1.4798664

Thomas, 2012, Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm, IEEE Trans. Audio Speech Lang. Process., 20, 82, 10.1109/TASL.2011.2157684

Thomas, 2009, The SIGMA algorithm: a glottal activity detector for electroglottographic signals, IEEE Trans. Audio Speech Lang. Process., 17, 1557, 10.1109/TASL.2009.2022430

Titze, 2004, Theory of glottal airflow and source-filter interaction in speaking and singing, Acta Acustica united with Acustica, 90, 641

Titze, 2008, Nonlinear source filter coupling in phonation: theory, J. Acoust. Soc. Am., 123, 2733, 10.1121/1.2832337

Veldhuis, 1998, A computationally efficient alternative for the Liljencrants-Fant model and its perceptual evaluation, J. Acoust. Soc. Am., 103, 566, 10.1121/1.421103

Walker, 2007, A review of glottal waveform analysis, Springer Lecture Notes Comput. Sci. (LNCS), 4391, 1, 10.1007/978-3-540-71505-4_1

Wong, 1979, Least squares glottal inverse filtering from the acoustic speech waveform, IEEE Trans. Acoust. Speech Signal Process., 27, 350, 10.1109/TASSP.1979.1163260

Yan, 2006, Automatic tracing of vocal-fold motion from high-speed digital images, IEEE Trans. Biomed. Eng., 53, 1394, 10.1109/TBME.2006.873751

Yegnanarayana, 2011, Epoch-based analysis of speech signals, Sadhana, 36, 651, 10.1007/s12046-011-0046-0

Yegnanarayana, 2013, Spectro-temporal analysis of speech signals using zero-time windowing and group delay function, Speech Commun., 55, 782, 10.1016/j.specom.2013.02.007

Yegnanarayana, 1998, Extraction of vocal-tract system characteristics from speech signals, IEEE Trans. Speech Audio Process., 6, 313