Excitation modelling using epoch features for statistical parametric speech synthesis
References
Adiga, 2013, Significance of instants of significant excitation for source modelling, 1677
Airaksinen, 2018, A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis, IEEE/ACM Trans. Audio Speech Lang. Process., 26, 1658, 10.1109/TASLP.2018.2835720
Cabral, 2013, Uniform concatenative excitation model for synthesising speech without voiced/unvoiced classification, 1082
Cabral, 2011, HMM-based speech synthesiser using the LF-model of the glottal source, 4704
Csapó, 2014, Statistical parametric speech synthesis with a novel codebook-based excitation model, Intell. Decis. Technol., 8, 289, 10.3233/IDT-140197
Cui, 2018, A new glottal neural vocoder for speech synthesis, 2017
CMU Arctic Speech Synthesis Databases. (online). Available: http://festvox.org/cmu_arctic/.
Drugman, 2012, The deterministic plus stochastic model of the residual signal and its applications, IEEE Trans. Audio Speech Lang. Process., 20, 968, 10.1109/TASL.2011.2169787
Drugman, 2009, Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis, 3793
Haque, 2017, Modification of energy spectra, epoch parameters and prosody for emotion conversion, Int. J. Speech Technol., 20, 15, 10.1007/s10772-016-9386-9
HMM-based Speech Synthesis System (HTS). (online). Available: http://hts.sp.nitech.ac.jp/.
Hwang, 2018, A unified framework for the generation of glottal signals in deep learning-based parametric speech synthesis systems, 912
Kadiri, 2015, Analysis of excitation source features of speech for emotion recognition, 1324
Kawahara, 1999, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency based f0 extraction: possible role of a repetitive structure in sounds, Speech Commun., 27, 187, 10.1016/S0167-6393(98)00085-5
Kim, 2007, Two-band excitation for HMM-based speech synthesis, IEICE Trans. Inf. Syst., E90-D, 378, 10.1093/ietisy/e90-1.1.378
King, 2011, The Blizzard Challenge 2011
Koishida, 2000, A 16kbit/s wideband CELP-based speech coder using mel-generalized cepstral analysis, IEICE Trans. Inf. Syst., E83-D, 876
Koolagudi, 2012, Recognition of emotions from speech using excitation source features, Procedia Eng., 38, 3409, 10.1016/j.proeng.2012.06.394
Krom, 1993, A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals, J. Speech Hear. Res., 36, 254, 10.1044/jshr.3602.254
Ling, 2015, Deep learning for acoustic modelling in parametric speech generation: a systematic review of existing techniques and future trends, IEEE Signal Process. Mag., 32, 35, 10.1109/MSP.2014.2359987
Maia, 2007, An excitation model for HMM-based speech synthesis based on residual modelling
Murty, 2008, Epoch extraction from speech signals, IEEE Trans. Audio Speech Lang. Process., 16, 1602, 10.1109/TASL.2008.2004526
Narendra, 2017, Parameterization of excitation signal for improving the quality of HMM-based speech synthesis system, Circuits Syst. Signal Process., 36, 3650, 10.1007/s00034-016-0476-3
Perceptual evaluation of speech quality (PESQ), 2000. An objective method for end-to-end speech quality assessment of narrow band telephone networks and speech codecs, ITU-T Draft Recommendation P.862.
Raitio, 2014, Voice source modelling using deep neural networks for statistical parametric speech synthesis, 2290
Raitio, 2011, Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis, 4564
Raitio, 2011, HMM-based speech synthesis utilizing glottal inverse filtering, IEEE Trans. Audio Speech Lang. Process., 19, 153, 10.1109/TASL.2010.2045239
Reddy, 2017, Robust pitch extraction method for the HMM-based speech synthesis system, IEEE Signal Process. Lett., 24, 1133, 10.1109/LSP.2017.2712646
Reddy, 2018, Inverse filter based excitation model for HMM-based speech synthesis system, IET Signal Process., 12, 544, 10.1049/iet-spr.2017.0546
Seshadri, 2009, Perceived loudness of speech based on the characteristics of glottal excitation source, J. Acoust. Soc. Am., 126, 2061, 10.1121/1.3203668
Shen, 2018, Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions, 4779
Shinoda, 2001, MDL-based context-dependent subword modelling for speech recognition, Acoust. Sci. Technol., 21, 79
Tamamori, 2017, Speaker-dependent WaveNet vocoder, 1118
Thati, 2012, Analysis of breathy voice based on excitation characteristics of speech production, 1
Toda, 2007, A speech parameter generation algorithm considering global variance for HMM-based speech synthesis, IEICE Trans. Inf. Syst., E90-D, 816, 10.1093/ietisy/e90-d.5.816
Tokuda, 2013, Speech synthesis based on hidden Markov models, Proc. IEEE, 101, 1234, 10.1109/JPROC.2013.2251852
Wakita, 1976, Residual energy of linear prediction applied to vowel and speaker recognition, IEEE Trans. Acoust. Speech Signal Process., 24, 270, 10.1109/TASSP.1976.1162797
Wen, 2013, Pitch-scaled spectrum based excitation model for HMM-based speech synthesis, J. Signal Process. Syst., 74, 423, 10.1007/s11265-013-0862-z
Yoshimura, 2001, Mixed excitation for HMM-based speech synthesis, 2259
Young, S. J., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., Woodland, P., 2006. The hidden Markov model toolkit (HTK) version 3.4. (online). Available: http://htk.eng.cam.ac.uk/.
Zen, 2013, Statistical parametric speech synthesis using deep neural networks, 7962
Zen, 2007, Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005, IEICE Trans. Inf. Syst., 90, 325, 10.1093/ietisy/e90-1.1.325
Zen, 2008, The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006, IEICE Trans. Inf. Syst., E91-D, 1764, 10.1093/ietisy/e91-d.6.1764