ILATalk: a new multilingual text-to-speech synthesizer with machine learning

International Journal of Speech Technology - Volume 19 - Pages 55-64 - 2015

Saleh M. Abu-Soud¹

¹Department of Software Engineering, Princess Sumaya University for Technology, Amman, Jordan

Abstract

This paper presents ILATalk, a new multilingual text-to-speech system based on inductive learning. The system comprises three phases: analysis, learning, and synthesis. It can accept any language; all that is needed is to store, in data tables used as input to the system, a data set of training examples generated from a selected, representative subset of words of the target language, together with the associated phonemes of that language. The system was tested thoroughly in many sets of experiments with various parameters and data set sizes, and compared with two known approaches: ID3 and neural network (NN) backpropagation. The results show that ILATalk produces correct phonemes with high accuracy and outperforms both algorithms in most cases.
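To make the three-phase pipeline more concrete, the following is a minimal sketch in Python, not the ILATalk implementation. The abstract does not specify how training examples are encoded or how the ILA rule learner works, so everything here is an assumption: a sliding window of letters around each target letter stands in for the analysis phase, a simple "most frequent phoneme per context" table stands in for the inductive learner, and the function names, padding symbol, toy lexicon, and phoneme symbols are all hypothetical.

```python
from collections import Counter, defaultdict

PAD = "_"  # hypothetical padding symbol marking word boundaries


def make_examples(word, phonemes, window=3):
    """Analysis phase (assumed encoding): one example per letter.

    Each example pairs a window of letters centred on the target
    letter with the phoneme aligned to that letter.
    """
    if len(word) != len(phonemes):
        raise ValueError("word and phonemes must be aligned one-to-one")
    half = window // 2
    padded = PAD * half + word + PAD * half
    return [(tuple(padded[i:i + window]), phonemes[i])
            for i in range(len(word))]


def learn_rules(examples):
    """Learning phase (stand-in, NOT the ILA algorithm).

    Keeps the most frequent phoneme for each full context, plus a more
    general fallback rule keyed on the centre letter alone.
    """
    counts = defaultdict(Counter)
    for context, phoneme in examples:
        counts[context][phoneme] += 1
        centre = context[len(context) // 2]
        counts[(centre,)][phoneme] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def transcribe(word, rules, window=3):
    """Synthesis phase (phoneme lookup only; no audio is generated)."""
    half = window // 2
    padded = PAD * half + word + PAD * half
    result = []
    for i in range(len(word)):
        context = tuple(padded[i:i + window])
        # Prefer the specific context rule, then the single-letter rule.
        fallback = rules.get((word[i],), "?")
        result.append(rules.get(context, fallback))
    return result


# Toy, invented lexicon with made-up phoneme symbols (illustration only).
lexicon = [
    ("cat", ["k", "a", "t"]),
    ("city", ["s", "i", "t", "i"]),
    ("cot", ["k", "o", "t"]),
]

examples = [ex for word, ph in lexicon for ex in make_examples(word, ph)]
rules = learn_rules(examples)
print(transcribe("cab", rules))  # ['k', 'a', '?']
```

In this sketch an unseen letter yields the placeholder "?"; a real rule learner such as ILA would instead induce more general rules from the training examples, and a full system would go on to turn the phoneme sequence into speech.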

