ILATalk: a new multilingual text-to-speech synthesizer with machine learning

International Journal of Speech Technology - Volume 19 - Pages 55-64 - 2015
Saleh M. Abu-Soud¹
¹Department of Software Engineering, Princess Sumaya University for Technology, Amman, Jordan

Abstract

This paper presents ILATalk, a new multilingual text-to-speech system based on inductive learning. The system is composed of three phases: analysis, learning, and synthesis. It can accept any language; all that is needed is a data set, stored in data tables and used as input to the system, that contains training examples generated from a representative, selected subset of words of the required language together with the associated phonemes of that language. The system has been thoroughly tested in many sets of experiments with various parameters and sizes, and compared with two well-known approaches: ID3 and neural-network backpropagation. The results show that ILATalk produces correct phonemes with high accuracy and outperforms these algorithms in most cases.
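The abstract does not specify what a training example looks like. As a rough illustration only, the sketch below assumes a fixed-width letter window around each letter, labeled with that letter's phoneme, which is a common encoding for letter-to-phoneme learning and not necessarily the one ILATalk uses. The names `make_examples` and `PAD`, the window width, and the phoneme labels are all hypothetical.

```python
from typing import List, Tuple

PAD = "_"  # hypothetical padding symbol for positions outside the word

Example = Tuple[Tuple[str, ...], str]  # (letter window, phoneme label)


def make_examples(word: str, phonemes: List[str], window: int = 3) -> List[Example]:
    """Turn one aligned (word, phonemes) pair into per-letter training examples.

    Assumption: exactly one phoneme label per letter (a silent letter would
    carry a placeholder label such as "-"). The window width must be odd so
    that the target letter sits in the middle of the window.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if len(word) != len(phonemes):
        raise ValueError("word and phoneme sequence must be aligned one-to-one")

    half = window // 2
    padded = PAD * half + word + PAD * half

    examples: List[Example] = []
    for i, phoneme in enumerate(phonemes):
        # Letters at padded[i : i + window] are centered on word[i].
        context = tuple(padded[i:i + window])
        examples.append((context, phoneme))
    return examples


# Toy data table; the phoneme labels are made up for illustration.
for context, phoneme in make_examples("cat", ["k", "ae", "t"]):
    print(context, "->", phoneme)
```

Each row produced this way is an attribute vector (the letters in the window) with a class label (the phoneme), which is the kind of table that rule-induction learners, decision-tree learners such as ID3, and backpropagation networks can all be trained on.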
