Glottal source processing: From analysis to applications
Abstract
Keywords
References
Agiomyrgiannakis, 2009, ARX-LF-based source-filter methods for voice modification and transformation, 3589
Akande, 2005, Estimation of the vocal tract transfer function with application to glottal wave analysis, Speech Commun., 46, 15, 10.1016/j.specom.2005.01.007
Alku, 1992, Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering, Speech Commun., 11, 109, 10.1016/0167-6393(92)90005-R
Alku, 2011, Glottal inverse filtering analysis of human voice production – a review of estimation and parameterization methods of the glottal excitation and their applications, Sadhana, 36, 623, 10.1007/s12046-011-0041-5
Alku, 2002, Normalized amplitude quotient for parameterization of the glottal flow, J. Acoust. Soc. Am., 112, 701, 10.1121/1.1490365
Alku, 2009, Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering, J. Acoust. Soc. Am., 125, 3289, 10.1121/1.3095801
Alku, 1997, Parabolic spectral parameter – a new method for quantification of the glottal flow, Speech Commun., 22, 67, 10.1016/S0167-6393(97)00020-4
Alku, 1996, Amplitude domain quotient for characterization of the glottal volume velocity waveform estimated by inverse filtering, Speech Commun., 18, 131, 10.1016/0167-6393(95)00040-2
Ananthapadmanabha, 1979, Epoch extraction from linear prediction residual for identification of closed glottis interval, IEEE Trans. Audio Speech Signal Process., 27, 309, 10.1109/TASSP.1979.1163267
Avanzini, 2008, Simulation of vocal fold oscillation with a pseudo-one-mass physical model, Speech Commun., 50, 95, 10.1016/j.specom.2007.07.002
Bapineedu, 2009, Analysis of Lombard speech using excitation source information, IEEE Trans. Audio Speech Lang. Process., 1091
Barra, 2007, On the limitations of voice conversion techniques in emotion identification tasks
Berezina, 2010, Autoregressive modeling of voiced speech, 5042
Boersma, 2001, Praat, a system for doing phonetics by computer, Glot Int., 5, 341
Bozkurt, 2007, Chirp group delay analysis of speech signals, Speech Commun., 49, 159, 10.1016/j.specom.2006.12.004
Bozkurt, 2005, Zeros of z-transform representation with application to source-filter separation in speech, IEEE Sig. Process. Lett., 12, 344, 10.1109/LSP.2005.843770
Cabral, 2005, Pitch-synchronous time-scaling for prosodic and voice quality transformations, 1137
Cabral, 2007, Towards an improved modeling of the glottal source in statistical parametric speech synthesis
Cabral, 2008, Glottal spectral separation for parametric speech synthesis, 1829
Chen, 2012, Estimating the voice source in noise
Chetouani, 2009, Investigation on LP-residual representations for speaker identification, Pattern Recogn., 42, 487, 10.1016/j.patcog.2008.08.008
Childers, 1995, Glottal source modeling for voice conversion, Speech Commun., 16, 127, 10.1016/0167-6393(94)00050-K
Childers, 1991, Vocal quality factors: analysis, synthesis, and perception, J. Acoust. Soc. Am., 90, 2394, 10.1121/1.402044
Chu, 2012, SAFE: A statistical approach to F0 estimation under clean and noisy conditions, IEEE Trans. Audio Speech Lang. Process., 20, 933, 10.1109/TASL.2011.2168518
de Cheveigne, 1991, Speech F0 extraction based on Licklider's pitch perception model, ICPhS, 218
de Cheveigne, 2002, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am., 111, 1917, 10.1121/1.1458024
Degottex, 2011, Phase minimization for glottal model estimation, IEEE Trans. Audio Speech Lang. Process., 19, 1080, 10.1109/TASL.2010.2076806
Degottex, 2011, Pitch transposition and breathiness modification using a glottal source model and its adapted vocal-tract filter, 5128
del Pozo, 2006, Continuous tracheoesophageal speech repair
Ding, 1998, Determining polarity of speech signals based on gradient of spurious glottal waveforms, 857
Drioli, 2005, A flow waveform-matched low-dimensional glottal model based on physical knowledge, J. Acoust. Soc. Am., 117, 3184, 10.1121/1.1861234
Dromey, 1992, Glottal airflow and electroglottographic measures of vocal function at multiple intensities, J. Voice, 6, 44, 10.1016/S0892-1997(05)80008-6
Drugman, 2013, Residual excitation skewness for automatic speech polarity detection, IEEE Sig. Process. Lett., 20, 387, 10.1109/LSP.2013.2249661
Drugman, 2011, Joint robust voicing detection and pitch estimation based on residual harmonics, Interspeech, 1973, 10.21437/Interspeech.2011-519
Drugman, 2009, Glottal closure and opening instant detection from speech signals, 2891
Drugman, 2010, A comparative evaluation of pitch modification techniques, EUSIPCO
Drugman, 2010, Glottal-based analysis of the Lombard effect, Interspeech, 2610, 10.21437/Interspeech.2010-257
Drugman, 2010, On the potential of glottal signatures for speaker recognition, Interspeech, 10.21437/Interspeech.2010-156
Drugman, 2012, Detecting speech polarity with high-order statistics, Cognitive Computation Journal
Drugman, 2012, The deterministic plus stochastic model of the residual signal and its applications, IEEE Trans. Audio Speech Lang. Process., 20, 968, 10.1109/TASL.2011.2169787
Drugman, 2009, Chirp decomposition of speech signals for glottal source estimation
Drugman, 2009, Complex cepstrum-based decomposition of speech for glottal source estimation, Interspeech, 116, 10.21437/Interspeech.2009-27
Drugman, 2011, Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation, Speech Commun., 53, 855, 10.1016/j.specom.2011.02.004
Drugman, 2012, A comparative study of glottal source estimation techniques, Comput. Speech Lang., 26, 20, 10.1016/j.csl.2011.03.003
Drugman, 2009, On the mutual information between source and filter contributions for voice pathology detection, Interspeech, 1463, 10.21437/Interspeech.2009-447
Drugman, 2011, Phase-based information for voice pathology detection, 4612
Drugman, 2012, Modeling the creaky excitation for parametric speech synthesis, Interspeech, 10.21437/Interspeech.2012-364
Drugman, 2012, Detection of glottal closure instants from speech signals: a quantitative review, IEEE Trans. Audio Speech Lang. Process., 20, 994, 10.1109/TASL.2011.2170835
Drugman, 2009, A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis, Interspeech, 10.21437/Interspeech.2009-148
Drugman, 2009, Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis, 3793
El-Jaroudi, 1991, Discrete all-pole modeling, IEEE Trans. Signal Process., 39, 411, 10.1109/78.80824
Fant, 1961, A new anti-resonance circuit for inverse filtering, Speech Transmission Laboratory Quarterly Progress and Status Report, 2, 1
Fant, 1970, 15
Fant, 1995, The LF-model revisited. Transformations and frequency domain analysis, 119
Fant, 1962, Indirect studies of glottal cycles by synchronous inverse filtering and photo-electrical glottography, Speech Transmission Laboratory Quarterly Progress and Status Report, 3, 1
Fant, 1985, A four-parameter model of glottal flow, Speech Transmission Laboratory Quarterly Progress and Status Report, 26, 1
Frohlich, 2001, SIM – simultaneous inverse filtering and matching of a glottal flow model for acoustic speech signals, J. Acoust. Soc. Am., 110, 479, 10.1121/1.1379076
Frokjaer-Jensen, 1973, Registration of voice quality, Bruel&Kjaer Technical Review, 3, 3
Fu, 2006, Robust glottal source estimation based on joint source-filter model optimization, IEEE Trans. Audio Speech Lang. Process., 14, 492, 10.1109/TSA.2005.857807
Fujisaki, 1986, Proposal and evaluation of models for the glottal source waveform, 1605
Ghosh, 2011, Joint source-filter optimization for robust glottal source estimation in the presence of shimmer and jitter, Speech Commun., 53, 98, 10.1016/j.specom.2010.07.004
Gobl, 2003, Amplitude-based source parameters for measuring voice quality, ISCA VOQUAL, 151
Gold, 1969, Parallel processing techniques for estimating pitch periods of speech in the time domain, J. Acoust. Soc. Am., 46, 442, 10.1121/1.1911709
Gomez-Vilda, 2009, Glottal source biometrical signature for voice pathology detection, Speech Commun., 51, 10.1016/j.specom.2008.09.005
Gordon, 2001, Phonation types: a cross-linguistic overview, J. Phonet., 29, 383, 10.1006/jpho.2001.0147
Govind, 2011, Neutral to target emotion conversion using source and suprasegmental information, Interspeech, 2969, 10.21437/Interspeech.2011-743
Granqvist, 2003, Simultaneous analysis of vocal fold vibration and transglottal airflow: exploring a new experimental set-up, J. Voice, 17, 319, 10.1067/S0892-1997(03)00070-5
Gudnason, 2008, Voice source cepstrum coefficients for speaker identification, 4821
Gudnason, 2012, Data-driven voice source waveform analysis and synthesis, Speech Commun., 54, 199, 10.1016/j.specom.2011.08.003
Guerchi, 2000, Low-rate quantization of spectral information in a 4 kb/s pitch-synchronous CELP coder, 111
Hedelin, 1986, High quality glottal LPC-vocoding, 11, 465
Howell, 1992, Acoustic analysis and perception of vowels in children's and teenagers’ stuttered speech, J. Acoust. Soc. Am., 91, 1697, 10.1121/1.402449
Isaksson, 1989, Inverse glottal filtering using a parameterized input model, Signal Processing, 18, 435, 10.1016/0165-1684(89)90085-6
Iseli, 2007, Age, sex, and vowel dependencies of acoustic measures related to the voice source, J. Acoust. Soc. Am., 121, 2283, 10.1121/1.2697522
Isshiki, 1981, Vocal efficiency index, 193
Jankowski, 1995, Measuring fine structure in speech: application to speaker identification, 325
Joseph, 2006, Extracting formants from short segments using group delay functions, 1009
Kane, 2013, Improved automatic detection of creak, Comput. Speech Lang., 27, 1028, 10.1016/j.csl.2012.11.002
Kane, 2013, Automating manual user strategies for precise voice source analysis, Speech Commun., 55, 397, 10.1016/j.specom.2012.12.004
Kane, 2013, Evaluation of glottal closure instant detection in a range of voice qualities, Speech Commun., 55, 295, 10.1016/j.specom.2012.08.011
Kasi, 2002, Yet another algorithm for pitch tracking, 1, 361
Kasuya, 1999, Joint estimation of voice source and vocal tract parameters as applied to the study of voice source dynamics, Int. Congress of Phonetic Sciences, 2505
Kawahara, 1999, Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of f0 and periodicity, Eurospeech, 6, 2781, 10.21437/Eurospeech.1999-613
Kinnunen, 2009, On separating glottal source and vocal tract information in telephony speaker verification, 4545
Klatt, 1987, Review of text-to-speech conversion for English, J. Acoust. Soc. Am., 82, 737, 10.1121/1.395275
Kreiman, 2012, Variability in the relationships among voice quality, harmonic amplitudes, open quotient, and glottal area waveform shape in sustained phonation, J. Acoust. Soc. Am., 132, 2625, 10.1121/1.4747007
Kreiman, 2011
Krishnamurthy, 1986, Two-channel speech analysis, IEEE Trans. Audio Speech Signal Process., 34, 730, 10.1109/TASSP.1986.1164909
Kumar, 2009, Analysis of laugh signals for detecting in continuous speech, 1591
Lahat, 1987, A spectral autocorrelation method for measurement of the fundamental frequency of noise-corrupted speech, IEEE Trans. Audio Speech Signal Process., 35, 741, 10.1109/TASSP.1987.1165224
Lanchantin, 2010, A HMM-based speech synthesis system using a new glottal source and vocal-tract separation method, 4630
Lauri, 1997, Effects of prolonged oral reading on time-based glottal flow waveform parameters with special reference to gender differences, Folia Phoniat. Logopaed., 49, 234, 10.1159/000266461
Laver, 1980
Li, 2012, Automatic LF-model fitting to the glottal source waveform by extended Kalman filtering, EUSIPCO, 2772
Lieberman, 1963, Some acoustic measures of the fundamental periodicity of normal and pathologic larynges, J. Acoust. Soc. Am., 35, 344, 10.1121/1.1918465
Lindqvist-Gauffin, 1964, Inverse filtering. Instrumentation and techniques, 1
Lorenzo-Trueba, 2012, Towards glottal source controllability in expressive speech synthesis
Ma, 1994, A Frobenius norm approach to glottal closure detection from the speech signal, IEEE Trans. Speech Audio Process., 2, 258
Maia, 2007, An excitation model for HMM-based speech synthesis based on residual modeling
Markel, 1972, The SIFT algorithm for fundamental frequency estimation, IEEE Trans. Audio Electroacoust., 20, 367, 10.1109/TAU.1972.1162410
Mathews, 1961, Pitch synchronous analysis of voiced sounds, J. Acoust. Soc. Am., 33, 179, 10.1121/1.1908614
McGowan, 1988, An aeroacoustic approach to phonation, J. Acoust. Soc. Am., 83, 696, 10.1121/1.396165
Milenkovic, 1986, Glottal inverse filtering by joint estimation of an AR system with a linear input model, IEEE Trans. Audio Speech Signal Process., 34, 28, 10.1109/TASSP.1986.1164778
Monsen, 1977, Study of variations in the male and female glottal wave, J. Acoust. Soc. Am., 62, 981, 10.1121/1.381593
Monzo, 2007, Discriminating expressive speech styles by voice quality parameterization, ICPhS, 2081
Moore, 2008, Critical analysis of the impact of glottal features in the classification of clinical depression in speech, IEEE Trans. Biomed. Eng., 55, 96, 10.1109/TBME.2007.900562
Murphy, 1999, Perturbation-free measurement of the harmonics-to-noise ratio in voice signals using pitch synchronous harmonic analysis, J. Acoust. Soc. Am., 105, 2866, 10.1121/1.426901
Murphy, 2008, Investigation of a glottal related harmonics-to-noise ratio and spectral tilt as indicators of glottal noise in synthesized and human voice signals, J. Acoust. Soc. Am., 123, 1642, 10.1121/1.2832651
Murty, 2006, Combining evidence from residual phase and MFCC features for speaker recognition, IEEE Sig. Process. Lett., 13, 52, 10.1109/LSP.2005.860538
Murty, 2009
Murty, 2008, Epoch extraction from speech signals, IEEE Trans. Audio Speech Lang. Process., 16, 1602, 10.1109/TASL.2008.2004526
Murty, 2009, Characterization of glottal activity from speech signals, IEEE Sig. Process. Lett., 16
Nakatsui, 1970, Method of observation of glottal-source wave using digital inverse filtering in time domain, J. Acoust. Soc. Am., 47, 664, 10.1121/1.1911947
Naylor, 2007, Estimation of glottal closure instants in voiced speech using the DYPSA algorithm, IEEE Trans. Audio Speech Lang. Process., 15, 34, 10.1109/TASL.2006.876878
Oppenheim, 1968, Homomorphic analysis of speech, IEEE Trans. Audio Electroacoust., 16, 221
Ozdas, 2004, Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk, IEEE Trans. Biomed. Eng., 51, 1530, 10.1109/TBME.2004.827544
Pati, 2008, Non-parametric vector quantization of excitation source information for speaker recognition, TENCON, 1
Plumpe, 1999, Modeling of the glottal flow derivative waveform with application to speaker identification, IEEE Trans. Speech Audio Process., 7, 569, 10.1109/89.784109
Pozo, 2008, The linear transformation of LF glottal waveforms for voice conversion, Interspeech, 1457, 10.21437/Interspeech.2008-420
Prasanna, 2006, Extraction of speaker-specific excitation information from linear prediction residual of speech, Speech Commun., 48, 1243, 10.1016/j.specom.2006.06.002
Prasanna, 2004
Qi, 1995, Enhancement of female esophageal and tracheoesophageal speech, J. Acoust. Soc. Am., 98, 2461, 10.1121/1.413279
Quatieri, 2002
Quatieri, 2012, Vocal-source biomarkers for depression: A link to psychomotor activity
Raitio, 2011, HMM-based speech synthesis utilizing glottal inverse filtering, IEEE Trans. Audio Speech Lang. Process., 19, 153, 10.1109/TASL.2010.2045239
Rao, 2006, Prosody modification using instants of significant excitation, IEEE Trans. Audio Speech Lang. Process., 14, 972
Reynolds, 2002, An overview of automatic speaker recognition technology, 4, 4072
Riegelsberger, 1993, Glottal source estimation: methods of applying the LF-model to inverse filtering, 542
Rosenberg, 1971, Effects of the glottal pulse shape on the quality of natural vowels, J. Acoust. Soc. Am., 49, 583, 10.1121/1.1912389
Rothenberg, 1973, A new inverse-filtering technique for deriving the glottal air flow waveform during voicing, J. Acoust. Soc. Am., 53, 1632, 10.1121/1.1913513
Roux, 2007, Single and multiple f0 contour estimation through parametric spectrogram modeling of speech in noisy environments, IEEE Trans. Audio Speech Lang. Process., 15, 1135, 10.1109/TASL.2007.894510
Sakaguchi, 2000, The effect of polarity inversion of speech on human perception and data hiding as application, 917
Saratxaga, 2009, Use of harmonic phase information for polarity detection in speech signals, 1075
Seshadri, 2009, Perceived loudness of speech based on the characteristics of excitation source, J. Acoust. Soc. Am., 126, 2061, 10.1121/1.3203668
Sha, 2004, Multiband statistical learning for f0 estimation in speech, 661
Sharifzadeh, 2010, Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec, IEEE Trans. Biomed. Eng., 57, 10.1109/TBME.2010.2053369
Shue, 2010, A new voice source model based on high-speed imaging and its application to voice source estimation, 5134
Silva, 2009, Jitter estimation algorithms for detection of pathological voices, EURASIP J. Adv. Sig. Process., 10.1155/2009/567875
Slyh, 2004, Glottal modeling and closed-phase analysis for speaker recognition, ODYS, 315
Smits, 1995, Determination of instants of significant excitation in speech using group delay function, IEEE Trans. Speech Audio Process., 3, 325, 10.1109/89.466662
Strik, 1993, Fitting a LF-model to inverse filtered signals, Eurospeech, 103, 10.21437/Eurospeech.1993-45
Strube, 1974, Determination of the instant of glottal closure from the speech wave, J. Acoust. Soc. Am., 56, 1625, 10.1121/1.1903487
Sturmel, 2007, A comparative evaluation of the zeros of z transform representation for voice source estimation, Interspeech, 558
Sun, 2009, Investigating glottal parameters for differentiating emotional categories with similar prosodics, IEEE ICASSP, 4509
Sundberg, 1999, Effects of subglottal pressure on professional baritone singers’ voice sources, J. Acoust. Soc. Am., 105, 1965, 10.1121/1.426731
Swamy, 2007, Determining number of speakers from multispeaker speech signals using excitation source information, IEEE Signal Process. Lett., 14, 481, 10.1109/LSP.2006.891333
Szekely, 2011, Clustering expressive speech styles in audiobooks using glottal source parameters, Interspeech, 2409, 10.21437/Interspeech.2011-627
Tahon, 2012, Usual voice quality features and glottal features for emotional valence detection
Talkin, 1995, Robust algorithm for pitch tracking, Speech Coding Synth., 497
Thomas, 2012, Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm, IEEE Trans. Audio Speech Lang. Process., 20, 82, 10.1109/TASL.2011.2157684
Timcke, 1958, Laryngeal vibrations: measurements of the glottic wave, Arch. Otolaryngol., 68, 1, 10.1001/archotol.1958.00730020005001
Titze, 1992, Vocal intensity in speakers and singers, J. Acoust. Soc. Am., 91, 2936, 10.1121/1.402929
Tsanas, 2012, Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease, IEEE Trans. Biomed. Eng., 59, 1264, 10.1109/TBME.2012.2183367
Tuan, 1999, Robust glottal closure detection using the wavelet transform, 2805
van den Berg, 1958, Myoelastic-aerodynamic theory of voice production, J. Speech Hear. Res., 1, 227, 10.1044/jshr.0103.227
Vasilakis, 2009, Voice pathology detection based on short-term jitter estimations in running speech, Folia Phoniatr Logop., 61, 153, 10.1159/000219951
Veeneman, 1985, Automatic glottal inverse filtering from speech and electroglottographic signals, IEEE Trans. Audio Speech Signal Process., 33, 369, 10.1109/TASSP.1985.1164544
Veldhuis, 1998, A computationally efficient alternative for the Liljencrants-Fant model and its perceptual evaluation, J. Acoust. Soc. Am., 103, 566, 10.1121/1.421103
Vilkman, 2004, Occupational safety and health aspects of voice and speech professions, Folia Phoniat. Logopaed., 56, 220, 10.1159/000078344
Vilkman, 1997, Loading changes in time based parameters of glottal flow waveforms in different ergonomic conditions, Folia Phoniat. Logopaed., 49, 247, 10.1159/000266463
Walker, 2007, A review of glottal waveform analysis, Springer Lect. Notes Comput. Sci. (LNCS), 4391, 1, 10.1007/978-3-540-71505-4_1
Wong, 1979, Least squares glottal inverse filtering from the acoustic speech waveform, IEEE Trans. Audio Speech Signal Process., 27, 350, 10.1109/TASSP.1979.1163260
Yegnanarayana, 2009, Event-based instantaneous fundamental frequency estimation from speech signals, IEEE Trans. Audio Speech Lang. Process., 17, 614, 10.1109/TASL.2008.2012194
Yegnanarayana, 2005, Processing of reverberant speech for time-delay estimation, IEEE Trans. Speech Audio Process., 13, 1110, 10.1109/TSA.2005.853005
Yegnanarayana, 2001, Source and system features for speaker recognition using AANN models, 409
Yegnanarayana, 1998, Extraction of vocal-tract system characteristics from speech signals, IEEE Trans. Speech Audio Process., 6, 313, 10.1109/89.701359
Yoshimura, 2001, Mixed-excitation for HMM-based speech synthesis, 2259