Excitation modelling using epoch features for statistical parametric speech synthesis

Computer Speech & Language - Volume 60 - Page 101029 - 2020

M Kiran Reddy¹, K Sreenivasa Rao¹

¹Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India

References

Adiga, 2013, Significance of instants of significant excitation for source modelling, 1677
Airaksinen, 2018, A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis, IEEE/ACM Trans. Audio Speech Lang. Process., 26, 1658, 10.1109/TASLP.2018.2835720
Cabral, 2013, Uniform concatenative excitation model for synthesising speech without voiced/unvoiced classification, 1082
Cabral, 2011, HMM-based speech synthesiser using the LF-model of the glottal source, 4704
Csapó, 2014, Statistical parametric speech synthesis with a novel codebook-based excitation model, Intell. Decis. Technol., 8, 289, 10.3233/IDT-140197
Cui, 2018, A new glottal neural vocoder for speech synthesis, 2017
CMU Arctic Speech Synthesis Databases. (online). Available: http://festvox.org/cmu_arctic/.
Drugman, 2012, The deterministic plus stochastic model of the residual signal and its applications, IEEE Trans. Audio Speech Lang. Process., 20, 968, 10.1109/TASL.2011.2169787
Drugman, 2009, Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis, 3793
Haque, 2017, Modification of energy spectra, epoch parameters and prosody for emotion conversion, Int. J. Speech Technol., 20, 15, 10.1007/s10772-016-9386-9
HMM-based Speech Synthesis System (HTS). (online). Available: http://hts.sp.nitech.ac.jp/.
Hwang, 2018, A unified framework for the generation of glottal signals in deep learning-based parametric speech synthesis systems, 912
Kadiri, 2015, Analysis of excitation source features of speech for emotion recognition, 1324
Kawahara, 1999, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency based f0 extraction: possible role of a repetitive structure in sounds, Speech Commun., 27, 187, 10.1016/S0167-6393(98)00085-5
Kim, 2007, Two-band excitation for HMM-based speech synthesis, IEICE Trans. Inf. Syst., E90-D, 378, 10.1093/ietisy/e90-1.1.378
King, 2011, The Blizzard Challenge 2011
Koishida, 2000, A 16kbit/s wideband CELP-based speech coder using mel-generalized cepstral analysis, IEICE Trans. Inf. Syst., E83-D, 876
Koolagudi, 2012, Recognition of emotions from speech using excitation source features, Procedia Eng., 38, 3409, 10.1016/j.proeng.2012.06.394
Krom, 1993, A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals, J. Speech Hear. Res., 36, 254, 10.1044/jshr.3602.254
Ling, 2015, Deep learning for acoustic modelling in parametric speech generation: a systematic review of existing techniques and future trends, IEEE Signal Process. Mag., 32, 35, 10.1109/MSP.2014.2359987
Maia, 2007, An excitation model for HMM-based speech synthesis based on residual modelling
Murty, 2008, Epoch extraction from speech signals, IEEE Trans. Audio Speech Lang. Process., 16, 1602, 10.1109/TASL.2008.2004526
Narendra, 2017, Parameterization of excitation signal for improving the quality of HMM-based speech synthesis system, Circuits Syst. Signal Process., 36, 3650, 10.1007/s00034-016-0476-3
Perceptual evaluation of speech quality (PESQ), 2000. An objective method for end-to-end speech quality assessment of narrow band telephone networks and speech codecs, ITU-T Draft Recommendation P.862.
Raitio, 2014, Voice source modelling using deep neural networks for statistical parametric speech synthesis, 2290
Raitio, 2011, Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis, 4564
Raitio, 2011, HMM-based speech synthesis utilizing glottal inverse filtering, IEEE Trans. Audio Speech Lang. Process., 19, 153, 10.1109/TASL.2010.2045239
Reddy, 2017, Robust pitch extraction method for the HMM-based speech synthesis system, IEEE Signal Process. Lett., 24, 1133, 10.1109/LSP.2017.2712646
Reddy, 2018, Inverse filter based excitation model for HMM-based speech synthesis system, IET Signal Process., 12, 544, 10.1049/iet-spr.2017.0546
Seshadri, 2009, Perceived loudness of speech based on the characteristics of glottal excitation source, J. Acoust. Soc. Am., 126, 2061, 10.1121/1.3203668
Shen, 2018, Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions, 4779
Shinoda, 2001, MDL-based context-dependent subword modelling for speech recognition, Acoust. Sci. Technol., 21, 79
Tamamori, 2017, Speaker-dependent WaveNet vocoder, 1118
Thati, 2012, Analysis of breathy voice based on excitation characteristics of speech production, 1
Toda, 2007, A speech parameter generation algorithm considering global variance for HMM-based speech synthesis, IEICE Trans. Inform. Syst., E90-D, 816, 10.1093/ietisy/e90-d.5.816
Tokuda, 2013, Speech synthesis based on hidden Markov models, Proc. IEEE, 101, 1234, 10.1109/JPROC.2013.2251852
Wakita, 1976, Residual energy of linear prediction applied to vowel and speaker recognition, IEEE Trans. Acoust. Speech Signal Process., 24, 270, 10.1109/TASSP.1976.1162797
Wen, 2013, Pitch-scaled spectrum based excitation model for HMM-based speech synthesis, J. Signal Process. Syst., 74, 423, 10.1007/s11265-013-0862-z
Yoshimura, 2001, Mixed excitation for HMM-based speech synthesis, 2259
Young, S. J., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., Woodland, P., 2006. The hidden Markov model toolkit (HTK) version 3.4. (Online). Available: http://htk.eng.cam.ac.uk/.
Zen, 2013, Statistical parametric speech synthesis using deep neural networks, 7962
Zen, 2007, Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005, IEICE Trans. Inf. Syst., 90, 325, 10.1093/ietisy/e90-1.1.325
Zen, 2008, The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006, IEICE Trans. Inform. Syst., E91-D, 1764, 10.1093/ietisy/e91-d.6.1764
