Glottal source processing: From analysis to applications

Computer Speech & Language - Volume 28, Issue 5 - Pages 1117-1138 - 2014
Thomas Drugman¹, Paavo Alku², Abeer Alwan³, B. Yegnanarayana⁴
¹TCTS Lab, University of Mons, Belgium
²Department of Signal Processing and Acoustics, Aalto University, Finland
³Speech Processing and Auditory Perception Laboratory, University of California Los Angeles, USA
⁴Speech and Vision Laboratory, International Institute of Information Technology Hyderabad, India

References

Agiomyrgiannakis, 2009, ARX-LF-based source-filter methods for voice modification and transformation, 3589

Akande, 2005, Estimation of the vocal tract transfer function with application to glottal wave analysis, Speech Commun., 46, 15, 10.1016/j.specom.2005.01.007

Alku, 1992, Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering, Speech Commun., 11, 109, 10.1016/0167-6393(92)90005-R

Alku, 2011, Glottal inverse filtering analysis of human voice production – a review of estimation and parameterization methods of the glottal excitation and their applications, Sadhana, 36, 623, 10.1007/s12046-011-0041-5

Alku, 2002, Normalized amplitude quotient for parameterization of the glottal flow, J. Acoust. Soc. Am., 112, 701, 10.1121/1.1490365

Alku, 2009, Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering, J. Acoust. Soc. Am., 125, 3289, 10.1121/1.3095801

Alku, 1997, Parabolic spectral parameter – a new method for quantification of the glottal flow, Speech Commun., 22, 67, 10.1016/S0167-6393(97)00020-4

Alku, 1996, Amplitude domain quotient for characterization of the glottal volume velocity waveform estimated by inverse filtering, Speech Commun., 18, 131, 10.1016/0167-6393(95)00040-2

Ananthapadmanabha, 1979, Epoch extraction from linear prediction residual for identification of closed glottis interval, IEEE Trans. Acoust. Speech Signal Process., 27, 309, 10.1109/TASSP.1979.1163267

Avanzini, 2008, Simulation of vocal fold oscillation with a pseudo-one-mass physical model, Speech Commun., 50, 95, 10.1016/j.specom.2007.07.002

Bapineedu, 2009, Analysis of Lombard speech using excitation source information, IEEE Trans. Audio Speech Lang. Process., 1091

Barra, 2007, On the limitations of voice conversion techniques in emotion identification tasks

Berezina, 2010, Autoregressive modeling of voiced speech, 5042

Boersma, 2001, Praat, a system for doing phonetics by computer, Glot Int., 5, 341

Bozkurt, 2007, Chirp group delay analysis of speech signals, Speech Commun., 49, 159, 10.1016/j.specom.2006.12.004

Bozkurt, 2005, Zeros of z-transform representation with application to source-filter separation in speech, IEEE Sig. Process. Lett., 12, 344, 10.1109/LSP.2005.843770

Cabral, 2005, Pitch-synchronous time-scaling for prosodic and voice quality transformations, 1137

Cabral, 2007, Towards an improved modeling of the glottal source in statistical parametric speech synthesis

Cabral, 2008, Glottal spectral separation for parametric speech synthesis, 1829

Chen, 2012, Estimating the voice source in noise

Chetouani, 2009, Investigation on LP-residual representations for speaker identification, Pattern Recogn., 42, 487, 10.1016/j.patcog.2008.08.008

Childers, 1995, Glottal source modeling for voice conversion, Speech Commun., 16, 127, 10.1016/0167-6393(94)00050-K

Childers, 1991, Vocal quality factors: analysis, synthesis, and perception, J. Acoust. Soc. Am., 90, 2394, 10.1121/1.402044

Chu, 2012, SAFE: A statistical approach to F0 estimation under clean and noisy conditions, IEEE Trans. Audio Speech Lang. Process., 20, 933, 10.1109/TASL.2011.2168518

de Cheveigne, 1991, Speech F0 extraction based on Licklider's pitch perception model, ICPhS, 218

de Cheveigne, 2002, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am., 111, 1917, 10.1121/1.1458024

Degottex, 2011, Phase minimization for glottal model estimation, IEEE Trans. Audio Speech Lang. Process., 19, 1080, 10.1109/TASL.2010.2076806

Degottex, 2011, Pitch transposition and breathiness modification using a glottal source model and its adapted vocal-tract filter, 5128

del Pozo, 2006, Continuous tracheoesophageal speech repair

Ding, 1998, Determining polarity of speech signals based on gradient of spurious glottal waveforms, 857

Drioli, 2005, A flow waveform-matched low-dimensional glottal model based on physical knowledge, J. Acoust. Soc. Am., 117, 3184, 10.1121/1.1861234

Dromey, 1992, Glottal airflow and electroglottographic measures of vocal function at multiple intensities, J. Voice, 6, 44, 10.1016/S0892-1997(05)80008-6

Drugman, 2013, Residual excitation skewness for automatic speech polarity detection, IEEE Sig. Process. Lett., 20, 387, 10.1109/LSP.2013.2249661

Drugman, 2011, Joint robust voicing detection and pitch estimation based on residual harmonics, Interspeech, 1973, 10.21437/Interspeech.2011-519

Drugman, 2009, Glottal closure and opening instant detection from speech signals, 2891

Drugman, 2010, A comparative evaluation of pitch modification techniques, EUSIPCO

Drugman, 2010, Glottal-based analysis of the Lombard effect, Interspeech, 2610, 10.21437/Interspeech.2010-257

Drugman, 2010, On the potential of glottal signatures for speaker recognition, Interspeech, 10.21437/Interspeech.2010-156

Drugman, 2012, Detecting speech polarity with high-order statistics, Cognitive Computation Journal

Drugman, 2012, The deterministic plus stochastic model of the residual signal and its applications, IEEE Trans. Audio Speech Lang. Process., 20, 968, 10.1109/TASL.2011.2169787

Drugman, 2009, Chirp decomposition of speech signals for glottal source estimation

Drugman, 2009, Complex cepstrum-based decomposition of speech for glottal source estimation, Interspeech, 116, 10.21437/Interspeech.2009-27

Drugman, 2011, Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation, Speech Commun., 53, 855, 10.1016/j.specom.2011.02.004

Drugman, 2012, A comparative study of glottal source estimation techniques, Comput. Speech Lang., 26, 20, 10.1016/j.csl.2011.03.003

Drugman, 2009, On the mutual information between source and filter contributions for voice pathology detection, Interspeech, 1463, 10.21437/Interspeech.2009-447

Drugman, 2011, Phase-based information for voice pathology detection, 4612

Drugman, 2012, Modeling the creaky excitation for parametric speech synthesis, Interspeech, 10.21437/Interspeech.2012-364

Drugman, 2012, Detection of glottal closure instants from speech signals: a quantitative review, IEEE Trans. Audio Speech Lang. Process., 20, 994, 10.1109/TASL.2011.2170835

Drugman, 2009, A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis, Interspeech, 10.21437/Interspeech.2009-148

Drugman, 2009, Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis, 3793

El-Jaroudi, 1991, Discrete all-pole modeling, IEEE Trans. Signal Process., 39, 411, 10.1109/78.80824

Fant, 1961, A new anti-resonance circuit for inverse filtering, Speech Transmission Laboratory Quarterly Progress and Status Report, 2, 1

Fant, 1970, 15

Fant, 1995, The LF-model revisited. Transformations and frequency domain analysis, 119

Fant, 1962, Indirect studies of glottal cycles by synchronous inverse filtering and photo-electrical glottography, Speech Transmission Laboratory Quarterly Progress and Status Report, 3, 1

Fant, 1985, A four-parameter model of glottal flow, Speech Transmission Laboratory Quarterly Progress and Status Report, 26, 1

Frohlich, 2001, SIM – simultaneous inverse filtering and matching of a glottal flow model for acoustic speech signals, J. Acoust. Soc. Am., 110, 479, 10.1121/1.1379076

Frokjaer-Jensen, 1973, Registration of voice quality, Bruel & Kjaer Technical Review, 3, 3

Fu, 2006, Robust glottal source estimation based on joint source-filter model optimization, IEEE Trans. Audio Speech Lang. Process., 14, 492, 10.1109/TSA.2005.857807

Fujisaki, 1986, Proposal and evaluation of models for the glottal source waveform, 1605

Ghosh, 2011, Joint source-filter optimization for robust glottal source estimation in the presence of shimmer and jitter, Speech Commun., 53, 98, 10.1016/j.specom.2010.07.004

Gobl, 2003, Amplitude-based source parameters for measuring voice quality, ISCA VOQUAL, 151

Gold, 1969, Parallel processing techniques for estimating pitch periods of speech in the time domain, J. Acoust. Soc. Am., 46, 442, 10.1121/1.1911709

Gomez-Vilda, 2009, Glottal source biometrical signature for voice pathology detection, Speech Commun., 51, 10.1016/j.specom.2008.09.005

Gordon, 2001, Phonation types: a cross-linguistic overview, J. Phonet., 29, 383, 10.1006/jpho.2001.0147

Govind, 2011, Neutral to target emotion conversion using source and suprasegmental information, Interspeech, 2969, 10.21437/Interspeech.2011-743

Granqvist, 2003, Simultaneous analysis of vocal fold vibration and transglottal airflow: exploring a new experimental set-up, J. Voice, 17, 319, 10.1067/S0892-1997(03)00070-5

Gudnason, 2008, Voice source cepstrum coefficients for speaker identification, 4821

Gudnason, 2012, Data-driven voice source waveform analysis and synthesis, Speech Commun., 54, 199, 10.1016/j.specom.2011.08.003

Guerchi, 2000, Low-rate quantization of spectral information in a 4 kb/s pitch-synchronous CELP coder, 111

Hedelin, 1986, High quality glottal LPC-vocoding, 11, 465

Howell, 1992, Acoustic analysis and perception of vowels in children's and teenagers’ stuttered speech, J. Acoust. Soc. Am., 91, 1697, 10.1121/1.402449

Isaksson, 1989, Inverse glottal filtering using a parameterized input model, Signal Processing, 18, 435, 10.1016/0165-1684(89)90085-6

Iseli, 2007, Age, sex, and vowel dependencies of acoustic measures related to the voice source, J. Acoust. Soc. Am., 121, 2283, 10.1121/1.2697522

Isshiki, 1981, Vocal efficiency index, 193

Jankowski, 1995, Measuring fine structure in speech: application to speaker identification, 325

Joseph, 2006, Extracting formants from short segments using group delay functions, 1009

Kane, 2013, Improved automatic detection of creak, Comput. Speech Lang., 27, 1028, 10.1016/j.csl.2012.11.002

Kane, 2013, Automating manual user strategies for precise voice source analysis, Speech Commun., 55, 397, 10.1016/j.specom.2012.12.004

Kane, 2013, Evaluation of glottal closure instant detection in a range of voice qualities, Speech Commun., 55, 295, 10.1016/j.specom.2012.08.011

Kasi, 2002, Yet another algorithm for pitch tracking, 1, 361

Kasuya, 1999, Joint estimation of voice source and vocal tract parameters as applied to the study of voice source dynamics, Int. Congress of Phonetic Sciences, 2505

Kawahara, 1999, Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of f0 and periodicity, Eurospeech, 6, 2781, 10.21437/Eurospeech.1999-613

Kinnunen, 2009, On separating glottal source and vocal tract information in telephony speaker verification, 4545

Klatt, 1987, Review of text-to-speech conversion for English, J. Acoust. Soc. Am., 82, 737, 10.1121/1.395275

Kreiman, 2012, Variability in the relationships among voice quality, harmonic amplitudes, open quotient, and glottal area waveform shape in sustained phonation, J. Acoust. Soc. Am., 132, 2625, 10.1121/1.4747007

Kreiman, 2011

Krishnamurthy, 1986, Two-channel speech analysis, IEEE Trans. Acoust. Speech Signal Process., 34, 730, 10.1109/TASSP.1986.1164909

Kumar, 2009, Analysis of laugh signals for detecting in continuous speech, 1591

Lahat, 1987, A spectral autocorrelation method for measurement of the fundamental frequency of noise-corrupted speech, IEEE Trans. Acoust. Speech Signal Process., 35, 741, 10.1109/TASSP.1987.1165224

Lanchantin, 2010, A HMM-based speech synthesis system using a new glottal source and vocal-tract separation method, 4630

Lauri, 1997, Effects of prolonged oral reading on time-based glottal flow waveform parameters with special reference to gender differences, Folia Phoniat. Logopaed., 49, 234, 10.1159/000266461

Laver, 1980

Li, 2012, Automatic LF-model fitting to the glottal source waveform by extended Kalman filtering, EUSIPCO, 2772

Lieberman, 1963, Some acoustic measures of the fundamental periodicity of normal and pathologic larynges, J. Acoust. Soc. Am., 35, 344, 10.1121/1.1918465

Lindqvist-Gauffin, 1964, Inverse filtering. Instrumentation and techniques, 1

Lorenzo-Trueba, 2012, Towards glottal source controllability in expressive speech synthesis

Ma, 1994, A Frobenius norm approach to glottal closure detection from the speech signal, IEEE Trans. Speech Audio Process., 2, 258

Maia, 2007, An excitation model for HMM-based speech synthesis based on residual modeling

Markel, 1972, The SIFT algorithm for fundamental frequency estimation, IEEE Trans. Audio Electroacoust., 20, 367, 10.1109/TAU.1972.1162410

Mathews, 1961, Inverse filtering. instrumentation and techniques, J. Acoust. Soc. Am., 33, 179, 10.1121/1.1908614

McGowan, 1988, An aeroacoustic approach to phonation, J. Acoust. Soc. Am., 83, 696, 10.1121/1.396165

Milenkovic, 1986, Glottal inverse filtering by joint estimation of an AR system with a linear input model, IEEE Trans. Acoust. Speech Signal Process., 34, 28, 10.1109/TASSP.1986.1164778

Miller, 1959, Nature of the vocal cord wave, J. Acoust. Soc. Am., 31, 667, 10.1121/1.1907771

Monsen, 1977, Study of variations in the male and female glottal wave, J. Acoust. Soc. Am., 62, 981, 10.1121/1.381593

Monzo, 2007, Discriminating expressive speech styles by voice quality parameterization, ICPhS, 2081

Moore, 2008, Critical analysis of the impact of glottal features in the classification of clinical depression in speech, IEEE Trans. Biomed. Eng., 55, 96, 10.1109/TBME.2007.900562

Murphy, 1999, Perturbation-free measurement of the harmonics-to-noise ratio in voice signals using pitch synchronous harmonic analysis, J. Acoust. Soc. Am., 105, 2866, 10.1121/1.426901

Murphy, 2008, Investigation of a glottal related harmonics-to-noise ratio and spectral tilt as indicators of glottal noise in synthesized and human voice signals, J. Acoust. Soc. Am., 123, 1642, 10.1121/1.2832651

Murty, 2006, Combining evidence from residual phase and MFCC features for speaker recognition, IEEE Sig. Process. Lett., 13, 52, 10.1109/LSP.2005.860538

Murty, 2009

Murty, 2008, Epoch extraction from speech signals, IEEE Trans. Audio Speech Lang. Process., 16, 1602, 10.1109/TASL.2008.2004526

Murty, 2009, Characterization of glottal activity from speech signals, IEEE Sig. Process. Lett., 16

Nakatsui, 1970, Method of observation of glottal-source wave using digital inverse filtering in time domain, J. Acoust. Soc. Am., 47, 664, 10.1121/1.1911947

Naylor, 2007, Estimation of glottal closure instants in voiced speech using the DYPSA algorithm, IEEE Trans. Audio Speech Lang. Process., 15, 34, 10.1109/TASL.2006.876878

Noll, 1967, Cepstrum pitch determination, J. Acoust. Soc. Am., 41, 293, 10.1121/1.1910339

Oppenheim, 1968, Homomorphic analysis of speech, IEEE Trans. Audio Electroacoust., 16, 221

Ozdas, 2004, Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk, IEEE Trans. Biomed. Eng., 51, 1530, 10.1109/TBME.2004.827544

Pati, 2008, Non-parametric vector quantization of excitation source information for speaker recognition, TENCON, 1

Plumpe, 1999, Modeling of the glottal flow derivative waveform with application to speaker identification, IEEE Trans. Speech Audio Process., 7, 569, 10.1109/89.784109

Pozo, 2008, The linear transformation of LF glottal waveforms for voice conversion, Interspeech, 1457, 10.21437/Interspeech.2008-420

Prasanna, 2006, Extraction of speaker-specific excitation information from linear prediction residual of speech, Speech Commun., 48, 1243, 10.1016/j.specom.2006.06.002

Prasanna, 2004

Qi, 1995, Enhancement of female esophageal and tracheoesophageal speech, J. Acoust. Soc. Am., 98, 2461, 10.1121/1.413279

Quatieri, 2002

Quatieri, 2012, Vocal-source biomarkers for depression: A link to psychomotor activity

Raitio, 2011, HMM-based speech synthesis utilizing glottal inverse filtering, IEEE Trans. Audio Speech Lang. Process., 19, 153, 10.1109/TASL.2010.2045239

Rao, 2006, Prosody modification using instants of significant excitation, IEEE Trans. Audio Speech Lang. Process., 14, 972

Reynolds, 2002, An overview of automatic speaker recognition technology, 4, 4072

Riegelsberger, 1993, Glottal source estimation: methods of applying the LF-model to inverse filtering, 542

Rosenberg, 1971, Effects of the glottal pulse shape on the quality of natural vowels, J. Acoust. Soc. Am., 49, 583, 10.1121/1.1912389

Rothenberg, 1973, A new inverse-filtering technique for deriving the glottal air flow waveform during voicing, J. Acoust. Soc. Am., 53, 1632, 10.1121/1.1913513

Roux, 2007, Single and multiple f0 contour estimation through parametric spectrogram modeling of speech in noisy environments, IEEE Trans. Audio Speech Lang. Process., 15, 1135, 10.1109/TASL.2007.894510

Sakaguchi, 2000, The effect of polarity inversion of speech on human perception and data hiding as application, 917

Saratxaga, 2009, Use of harmonic phase information for polarity detection in speech signals, 1075

Seshadri, 2009, Perceived loudness of speech based on the characteristics of excitation source, J. Acoust. Soc. Am., 126, 2061, 10.1121/1.3203668

Sha, 2004, Multiband statistical learning for f0 estimation in speech, 661

Sharifzadeh, 2010, Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec, IEEE Trans. Biomed. Eng., 57, 10.1109/TBME.2010.2053369

Shue, 2010, A new voice source model based on high-speed imaging and its application to voice source estimation, 5134

Silva, 2009, Jitter estimation algorithms for detection of pathological voices, EURASIP J. Adv. Sig. Process., 10.1155/2009/567875

Slyh, 2004, Glottal modeling and closed-phase analysis for speaker recognition, ODYS, 315

Smits, 1995, Determination of instants of significant excitation in speech using group delay function, IEEE Trans. Speech Audio Process., 3, 325, 10.1109/89.466662

Strik, 1993, Fitting a LF-model to inverse filtered signals, Eurospeech, 103, 10.21437/Eurospeech.1993-45

Strube, 1974, Determination of the instant of glottal closure from the speech wave, J. Acoust. Soc. Am., 56, 1625, 10.1121/1.1903487

Sturmel, 2007, A comparative evaluation of the zeros of z transform representation for voice source estimation, Interspeech, 558

Sun, 2009, Investigating glottal parameters for differentiating emotional categories with similar prosodics, IEEE ICASSP, 4509

Sundberg, 1999, Effects of subglottal pressure on professional baritone singers’ voice sources, J. Acoust. Soc. Am., 105, 1965, 10.1121/1.426731

Swamy, 2007, Determining number of speakers from multispeaker speech signals using excitation source information, IEEE Signal Process. Lett., 14, 481, 10.1109/LSP.2006.891333

Szekely, 2011, Clustering expressive speech styles in audiobooks using glottal source parameters, Interspeech, 2409, 10.21437/Interspeech.2011-627

Tahon, 2012, Usual voice quality features and glottal features for emotional valence detection

Talkin, 1995, Robust algorithm for pitch tracking, Speech Coding Synth., 497

Thomas, 2012, Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm, IEEE Trans. Audio Speech Lang. Process., 20, 82, 10.1109/TASL.2011.2157684

Timcke, 1958, Laryngeal vibrations: measurements of the glottic wave, Arch. Otolaryngol., 68, 1, 10.1001/archotol.1958.00730020005001

Titze, 1992, Vocal intensity in speakers and singers, J. Acoust. Soc. Am., 91, 2936, 10.1121/1.402929

Tsanas, 2012, Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease, IEEE Trans. Biomed. Eng., 59, 1264, 10.1109/TBME.2012.2183367

Tuan, 1999, Robust glottal closure detection using the wavelet transform, 2805

van den Berg, 1958, Myoelastic-aerodynamic theory of voice production, J. Speech Hear. Res., 1, 227, 10.1044/jshr.0103.227

Vasilakis, 2009, Voice pathology detection based on short-term jitter estimations in running speech, Folia Phoniat. Logopaed., 61, 153, 10.1159/000219951

Veeneman, 1985, Automatic glottal inverse filtering from speech and electroglottographic signals, IEEE Trans. Acoust. Speech Signal Process., 33, 369, 10.1109/TASSP.1985.1164544

Veldhuis, 1998, A computationally efficient alternative for the Liljencrants-Fant model and its perceptual evaluation, J. Acoust. Soc. Am., 103, 566, 10.1121/1.421103

Vilkman, 2004, Occupational safety and health aspects of voice and speech professions, Folia Phoniat. Logopaed., 56, 220, 10.1159/000078344

Vilkman, 1997, Loading changes in time based parameters of glottal flow waveforms in different ergonomic conditions, Folia Phoniat. Logopaed., 49, 247, 10.1159/000266463

Walker, 2007, A review of glottal waveform analysis, Springer Lect. Notes Comput. Sci. (LNCS), 4391, 1, 10.1007/978-3-540-71505-4_1

Wong, 1979, Least squares glottal inverse filtering from the acoustic speech waveform, IEEE Trans. Acoust. Speech Signal Process., 27, 350, 10.1109/TASSP.1979.1163260

Yegnanarayana, 2009, Event-based instantaneous fundamental frequency estimation from speech signals, IEEE Trans. Audio Speech Lang. Process., 17, 614, 10.1109/TASL.2008.2012194

Yegnanarayana, 2005, Processing of reverberant speech for time-delay estimation, IEEE Trans. Speech Audio Process., 13, 1110, 10.1109/TSA.2005.853005

Yegnanarayana, 2001, Source and system features for speaker recognition using AANN models, 409

Yegnanarayana, 1998, Extraction of vocal-tract system characteristics from speech signals, IEEE Trans. Speech Audio Process., 6, 313, 10.1109/89.701359

Yoshimura, 2001, Mixed-excitation for HMM-based speech synthesis, 2259

Zen, 2009, Statistical parametric speech synthesis, Speech Commun., 51, 1039, 10.1016/j.specom.2009.04.004