Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

Speech Communication - Volume 55 - Pages 278-294 - 2013

Gilles Degottex¹, Pierre Lanchantin¹, Axel Roebel¹, Xavier Rodet¹

¹Ircam – CNRS-UMR9912-STMS, Analysis-Synthesis Team, 1 Place Igor Stravinsky, 75004 Paris, France

References

Agiomyrgiannakis, Y., Rosec, O., 2009. ARX-LF-based source-filter methods for voice modification and transformation. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3589–3592.

Agiomyrgiannakis, Y., Rosec, O., 2008. Towards flexible speech coding for speech synthesis: an LF + Modulated Noise Vocoder. In: Proc. Interspeech, pp. 1849–1852.

Alku, 1999, A method for generating natural-sounding speech stimuli for cognitive brain research, Clin. Neurophysiol., 110, 1329, 10.1016/S1388-2457(99)00088-7

Assembly, T.I.R., 2003. ITU-R BS.1284-1: EN-General methods for the subjective assessment of sound quality. Technical Report. ITU.

Banno, H., Lu, J., Nakamura, S., Shikano, K., Kawahara, H., 1998. Efficient representation of short-time phase based on group delay. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 861–864.

Bechet, 2001, Liaphon: un système complet de phonetisation de textes, Traitement Automatique des Langues, 42, 47

Bonada, J., 2008. Voice processing and synthesis by performance sampling and spectral models. Ph.D. thesis. Universitat Pompeu Fabra. Spain.

Cabral, J.P., 2010. HMM-based speech synthesis using an Acoustic Glottal Source Model. Ph.D. thesis. CSTR, University of Edinburgh, UK.

Cabral, J., Renals, S., Richmond, K., Yamagishi, J., 2008. Glottal spectral separation for parametric speech synthesis. In: Proc. Interspeech, Brisbane, Australia, pp. 1829–1832.

Cabral, J., Renals, S., Yamagishi, J., Richmond, K., 2011. HMM-based speech synthesiser using the LF-model of the glottal source. In: IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 4704–4707.

de Cheveigne, 2002, YIN, A fundamental frequency estimator for speech and music, J. Acoust. Soc. Amer., 111, 1917, 10.1121/1.1458024

Degottex, G., 2010. Glottal source and vocal tract separation. Ph.D. thesis. UPMC-Ircam. France.

Degottex, 2011, Phase minimization for glottal model estimation, IEEE Trans. Audio Speech Lang. Process., 19, 1080, 10.1109/TASL.2010.2076806

Degottex, G., Roebel, A., Rodet, X., 2011b. Pitch transposition and breathiness modification using a glottal source model and its adapted vocal tract filter. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 5128–5131.

del Pozo, A., Young, S., 2008. The linear transformation of LF glottal waveforms for voice conversion. In: Proc. Interspeech, pp. 1457–1460.

Drugman, T., Wilfart, G., Dutoit, T., 2009b. A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis. In: Interspeech.

Drugman, T., Bozkurt, B., Dutoit, T., 2009a. Complex cepstrum-based decomposition of speech for glottal source estimation. In: Proc. Interspeech, pp. 116–119.

Fant, 1995, The LF-model revisited. Transformations and frequency domain analysis, STL-QPSR, 36, 119

Fant, 1985, A four-parameter model of glottal flow, STL-QPSR, 26, 1

Flanagan, J.L., Golden, R.M., 1966. Phase Vocoder. Technical Report. The Bell System Technical Journal.

Gales, 1999, Semi-tied covariance matrices for hidden Markov models, IEEE Trans. Speech Audio Process., 7, 272, 10.1109/89.759034

Griffin, 1988, Multiband excitation vocoder, IEEE Trans. Acoust. Speech Signal Process., 36, 1223, 10.1109/29.1651

Hamon, C., Moulines, E., Charpentier, F., 1989. A diphone synthesis system based on time-domain prosodic modifications of speech. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 238–241.

Hedelin, P., 1984. A glottal LPC-vocoder. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 21–24.

Henrich, N., 2001. Etude de la source glottique en voix parlée et chantée. Ph.D. thesis. UPMC, France (In French).

Hermes, 1991, Synthesis of breathy vowels: some research methods, Speech Comm., 10, 497, 10.1016/0167-6393(91)90053-V

Imai, 1979, Spectral envelope extraction by improved cepstral method, Electron. Comm., 10

Kawahara, H., Estill, J., Fujimura, O., 2001. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT. In: MAVEBA.

Kawahara, 1999, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds, Speech Comm., 27, 187, 10.1016/S0167-6393(98)00085-5

Kim, 2007, Two-band excitation for HMM-based speech synthesis, IEICE – Trans. Inf. Systems, 378, 10.1093/ietisy/e90-1.1.378

Lanchantin, P., Morris, A.C., Rodet, X., Veaux, C., 2008. Automatic phoneme segmentation with relaxed textual constraints. In: Proc. Language Resources and Evaluation Conference, pp. 2403–2407.

Lanchantin, P., Degottex, G., Rodet, X., 2010. A HMM-based speech synthesis system using a new glottal source and vocal tract separation method. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, pp. 4630–4633.

Laroche, J., Stylianou, Y., Moulines, E., 1993. HNS: Speech modification based on a harmonic+noise model. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 550–553.

Markel, 1976

McAulay, 1986, Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoust. Speech Signal Process., 34, 744, 10.1109/TASSP.1986.1164910

Mehta, D., Quatieri, T.F., 2005. Synthesis, analysis, and pitch modification of the breathy vowel. In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 199–202.

Miller, 1959, Nature of the vocal cord wave, J. Acoust. Soc. Amer., 31, 667, 10.1121/1.1907771

Oppenheim, 1968, Nonlinear filtering of multiplied and convolved signals, Proc. IEEE, 56, 1264, 10.1109/PROC.1968.6570

Pantazis, 2010, Adaptive AM–FM signal decomposition with application to speech analysis, IEEE Trans. Audio Speech Lang. Process., 19, 290, 10.1109/TASL.2010.2047682

Peeters, G., 2001. Modeles et modification du signal sonore adaptees a ses caracteristiques locales. Ph.D. thesis. UPMC, France (In French).

Raitio, 2011, HMM-based speech synthesis utilizing glottal inverse filtering, IEEE Trans. Audio Speech Lang. Process., 19, 153, 10.1109/TASL.2010.2045239

Rodet, 1984, The CHANT project: from synthesis of the singing voice to synthesis in general, Comput. Music J., 8, 15, 10.2307/3679810

Roebel, 2007, On cepstral and all-pole based spectral envelope modeling with unknown model order, Pattern Recognition Lett., 28, 1343, 10.1016/j.patrec.2006.11.021

Stevens, 1971, Airflow and turbulence noise for fricative and stop consonants: static considerations, J. Acoust. Soc. Amer., 50, 1180, 10.1121/1.1912751

Stylianou, Y., 1996. Harmonic plus noise models for speech combined with statistical methods, for Speech and Speaker Modification. Ph.D. thesis. TelecomParis. France.

Stylianou, 2001, Applying the harmonic plus noise model in concatenative speech synthesis, IEEE Trans. Speech Audio Process., 9, 21, 10.1109/89.890068

Tokuda, K., Masuko, T., Yamada, T., Kobayashi, T., Imai, S., 1995. An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features. In: Proc. Eurospeech, pp. 757–760.

Tokuda, 2002, Multi-space probability distribution HMM, IEICE Trans. Inf. Systems, E85-D, 455

Tokuda, K., Zen, H., Black, A., 2002b. An HMM-based speech synthesis system applied to English. In: Proc. IEEE Workshop on Speech synthesis, pp. 227–230.

Tooher, M., McKenna, J.G., 2003. Variation of the glottal LF parameters across F0, vowels, and phonetic environment. In: Proc. ISCA Voice Quality: Functions, Analysis and Synthesis (VOQUAL), pp. 41–46.

Valbret, H., Moulines, E., Tubach, J., 1992. Voice transformation using PSOLA technique. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 145–148.

Vincent, D., Rosec, O., Chonavel, T., 2007. A new method for speech synthesis and transformation based on an ARX-LF source-filter decomposition and HNM modeling. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 525–528.

Yeh, C., 2008. Multiple fundamental frequency estimation of polyphonic recordings. Ph.D. thesis. UPMC-Ircam. France.

Young, S., 1994. The HTK hidden Markov model toolkit: design and philosophy. Technical Report. University of Cambridge.

Zen, H., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 2004. Hidden semi-Markov model based speech synthesis. In: Proc. of ICSLP.

Zen, H., Nose, T., Yamagishi, J., Sako, S., Masuko, T., Black, A., Tokuda, K., 2007. The HMM-based speech synthesis system (HTS) version 2.0. In: Proc. ISCA Workshop on Speech Synthesis (SSW). <http://hts.sp.nitech.ac.jp>.

Zivanovic, 2008, Adaptive threshold determination for spectral peak classification, Comput. Music J., 32, 57, 10.1162/comj.2008.32.2.57
