Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

Speech Communication - Volume 55 - Pages 278-294 - 2013
Gilles Degottex¹, Pierre Lanchantin¹, Axel Roebel¹, Xavier Rodet¹
¹ Ircam – CNRS-UMR9912-STMS, Analysis-Synthesis Team, 1 Place Igor Stravinsky, 75004 Paris, France

References

Alku, 1999, A method for generating natural-sounding speech stimuli for cognitive brain research, Clin. Neurophysiol., 110, 1329, 10.1016/S1388-2457(99)00088-7

ITU Radiocommunication Assembly, 2003. ITU-R BS.1284-1: General methods for the subjective assessment of sound quality. Technical Report. ITU.

Bechet, 2001, Liaphon: un système complet de phonetisation de textes, Traitement Automatique des Langues, 42, 47

de Cheveigné, 2002, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Amer., 111, 1917, 10.1121/1.1458024

Degottex, 2011, Phase minimization for glottal model estimation, IEEE Trans. Audio Speech Lang. Process., 19, 1080, 10.1109/TASL.2010.2076806

Fant, 1995, The LF-model revisited. Transformations and frequency domain analysis, STL-QPSR, 36, 119

Flanagan, J.L., Golden, R.M., 1966. Phase Vocoder. Technical Report. The Bell System Technical Journal.

Gales, 1999, Semi-tied covariance matrices for hidden Markov models, IEEE Trans. Speech Audio Process., 7, 272, 10.1109/89.759034

Griffin, 1988, Multiband excitation vocoder, IEEE Trans. Acoust. Speech Signal Process., 36, 1223, 10.1109/29.1651

Henrich, N., 2001. Etude de la source glottique en voix parlée et chantée. Ph.D. thesis. UPMC, France (In French).

Hermes, 1991, Synthesis of breathy vowels: some research methods, Speech Comm., 10, 497, 10.1016/0167-6393(91)90053-V

Imai, 1979, Spectral envelope extraction by improved cepstral method, Electron. Comm., 10

Kawahara, 1999, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds, Speech Comm., 27, 187, 10.1016/S0167-6393(98)00085-5

Kim, 2007, Two-band excitation for HMM-based speech synthesis, IEICE – Trans. Inf. Systems, 378, 10.1093/ietisy/e90-1.1.378

Markel, 1976

McAulay, 1986, Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoust. Speech Signal Process., 34, 744, 10.1109/TASSP.1986.1164910

Miller, 1959, Nature of the vocal cord wave, J. Acoust. Soc. Amer., 31, 667, 10.1121/1.1907771

Oppenheim, 1968, Nonlinear filtering of multiplied and convolved signals, Proc. IEEE, 56, 1264, 10.1109/PROC.1968.6570

Pantazis, 2010, Adaptive AM–FM signal decomposition with application to speech analysis, IEEE Trans. Audio Speech Lang. Process., 19, 290, 10.1109/TASL.2010.2047682

Peeters, G., 2001. Modeles et modification du signal sonore adaptees a ses caracteristiques locales. Ph.D. thesis. UPMC, France (In French).

Raitio, 2011, HMM-based speech synthesis utilizing glottal inverse filtering, IEEE Trans. Audio Speech Lang. Process., 19, 153, 10.1109/TASL.2010.2045239

Rodet, 1984, The CHANT project: from synthesis of the singing voice to synthesis in general, Comput. Music J., 8, 15, 10.2307/3679810

Roebel, 2007, On cepstral and all-pole based spectral envelope modeling with unknown model order, Pattern Recognition Lett., 28, 1343, 10.1016/j.patrec.2006.11.021

Stevens, 1971, Airflow and turbulence noise for fricative and stop consonants: static considerations, J. Acoust. Soc. Amer., 50, 1180, 10.1121/1.1912751

Stylianou, 2001, Applying the harmonic plus noise model in concatenative speech synthesis, IEEE Trans. Speech Audio Process., 9, 21, 10.1109/89.890068

Tooher, M., McKenna, J.G., 2003. Variation of the glottal LF parameters across F0, vowels, and phonetic environment. In: Proc. ISCA Voice Quality: Functions, Analysis and Synthesis (VOQUAL), pp. 41–46.

Yeh, C., 2008. Multiple fundamental frequency estimation of polyphonic recordings. Ph.D. thesis. UPMC-Ircam. France.

Zen, H., Nose, T., Yamagishi, J., Sako, S., Masuko, T., Black, A., Tokuda, K., 2007. The HMM-based speech synthesis system (HTS) version 2.0. In: Proc. ISCA Workshop on Speech Synthesis (SSW). <http://hts.sp.nitech.ac.jp>.

Zivanovic, 2008, Adaptive threshold determination for spectral peak classification, Comput. Music J., 32, 57, 10.1162/comj.2008.32.2.57