Factorized WaveNet for voice conversion with limited data
References
Adiga, 2018, On the use of WaveNet as a statistical vocoder, 5674
Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y., 2015. FitNets: Hints for thin deep nets. In: Proc. ICLR.
Augasta, 2013, Pruning algorithms of neural networks—a comparative study, Open Comput. Sci., 3, 105, 10.2478/s13537-013-0109-x
Ba, 2014, Do deep nets really need to be deep?, 2654
Cheng, 2018, Model compression and acceleration for deep neural networks: The principles, progress, and challenges, IEEE Signal Process. Mag., 35, 126, 10.1109/MSP.2017.2765695
Çişman, 2017, Sparse representation of phonetic features for voice conversion with and without parallel data, 677
Du, 2019, WaveNet factorization with singular value decomposition for voice conversion, 152
Du, 2020, Effective WaveNet adaptation for voice conversion with limited data, 7779
Engel, 2017, Neural audio synthesis of musical notes with WaveNet autoencoders, 1068
Ezzine, 2017, A comparative study of voice conversion techniques: A review, 1
Fan, 2015, Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis, 4475
Gibiansky, 2017, Deep Voice 2: Multi-speaker neural text-to-speech, 2962
Han, 2015, Learning both weights and connections for efficient neural network, 1135
Hinton, 2015, Distilling the knowledge in a neural network
Kalchbrenner, 2018, Efficient neural audio synthesis, 2410
Kobayashi, 2017, Statistical voice conversion with WaveNet-based waveform generation, 1138
Kominek, 2004, The CMU ARCTIC speech databases
Krizhevsky, 2012, ImageNet classification with deep convolutional neural networks, 1097
Lee, 2006, MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
Liu, 2018, WaveNet vocoder with limited training data for voice conversion, 1983
Lu, 2019, One-shot voice conversion with global speaker embeddings, 669
Lu, 2019, A compact framework for voice conversion using WaveNet conditioned on phonetic posteriorgrams, 6810
Machado, A.F., Queiroz, M., 2010. Voice conversion: A critical survey. In: Proc. Sound and Music Computing (SMC). pp. 1–8.
Manzelli, 2018, Conditioning deep generative raw audio models for structured automatic music
Mohammadi, 2017, An overview of voice conversion systems, Speech Commun., 88, 65, 10.1016/j.specom.2017.01.008
Molchanov, 2017, Pruning convolutional neural networks for resource efficient inference
Mor, 2018, A universal music translation network
Morise, 2016, WORLD: a vocoder-based high-quality speech synthesis system for real-time applications, IEICE Trans. Inf. Syst., 99, 1877, 10.1587/transinf.2015EDP7457
Niwa, 2018, Statistical voice conversion based on WaveNet, 5289
Paine, 2016
Paul, 1992, The design for the Wall Street Journal-based CSR corpus, 357
Povey, 2018, Semi-orthogonal low-rank matrix factorization for deep neural networks, 3743
Prabhavalkar, 2016, On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition, 5970
Prenger, 2019, WaveGlow: A flow-based generative network for speech synthesis, 3617
Sainath, 2013, Low-rank matrix factorization for deep neural network training with high-dimensional output targets, 6655
Shen, 2018, Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions, 4779
Sisman, 2018, A voice conversion framework with tandem feature sparse representation and speaker-adapted WaveNet vocoder, 1978
Sun, 2016, Phonetic posteriorgrams for many-to-one voice conversion without parallel data training, 1
Tamamori, 2017, Speaker-dependent WaveNet vocoder, 1118
Tian, 2019, A speaker-dependent WaveNet for voice conversion with non-parallel data, 201
Tian, 2018, Average modeling approach to voice conversion with non-parallel data, 227
Tobing, P.L., Wu, Y.-C., Toda, T., 2020. Baseline system of voice conversion challenge 2020 with cyclic variational autoencoder and parallel WaveGAN. In: Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020. pp. 155–159.
Toda, 2007, Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory, IEEE Trans. Audio Speech Lang. Process., 15, 2222, 10.1109/TASL.2007.907344
Toda, 2006, Eigenvoice conversion based on Gaussian mixture model, 2446
Tucker, 2016, Model compression applied to small-footprint keyword spotting, 1878
Valin, 2019, LPCNet: Improving neural speech synthesis through linear prediction, 5891
van den Oord, 2016, WaveNet: A generative model for raw audio, 125
Veaux, 2017
Wu, 2015, A study of speaker adaptation for DNN-based speech synthesis
Wu, 2016, On the use of i-vectors and average voice model for voice conversion without parallel data, 1
Xue, 2013, Restructuring of deep neural network acoustic models with singular value decomposition, 2365
Xue, 2014, Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network, 6359
Yamagishi, 2007, Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training, IEICE Trans. Inf. Syst., 90, 533, 10.1093/ietisy/e90-d.2.533
Yamagishi, 2007, Model adaptation approach to speech synthesis with diverse voices and styles, 4, IV
Yamamoto, 2020, Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram, 6199
Yi, Z., Huang, W.-C., Tian, X., Yamagishi, J., Das, R.K., Kinnunen, T., Ling, Z., Toda, T., 2020. Voice Conversion Challenge 2020—Intra-lingual semi-parallel and cross-lingual voice conversion. In: Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020. pp. 80–98.
Yu, 2016, Multi-scale context aggregation by dilated convolutions
Yu, 2011, Improved bottleneck features using pretrained deep neural networks