Model architectures to extrapolate emotional expressions in DNN-based text-to-speech

Speech Communication, Volume 126, Pages 35-43, 2021
Katsuki Inoue¹, Sunao Hara¹, Masanobu Abe¹, Nobukatsu Hojo², Yusuke Ijima²
¹Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Japan
²NTT Corporation, Japan
