Arabic broadcast news transcription system
Abstract
Keywords
References
Alghamdi, M. (2000). Arabic phonetics. Riyadh: Attaoobah.
Alghamdi, M. (2003). KACST Arabic phonetics database. In The fifteenth international congress of phonetics science (pp. 3109–3112). Barcelona.
Alghamdi, M., Elshafei, M., & Almuhtasib, H. (2002). Speech units for Arabic text-to-speech. In The fourth workshop on computer and information sciences (pp. 199–212).
Ali, M., Elshafei, M., Alghamdi, M., Al-Muhtaseb, H., & Al-Najjar, A. (2008). Generation of Arabic phonetic dictionaries for speech recognition. In The 5th international conference on innovations in information technology, United Arab Emirates, December 2008.
Alimi, A. M., & Ben Jemaa, M. (2002). Beta fuzzy neural network application in recognition of spoken isolated Arabic words. International Journal of Control and Intelligent Systems, Special Issue on Speech Processing Techniques and Applications, 30(2).
Alotaibi, Y. A. (2004). Spoken Arabic digits recognizer using recurrent neural networks. In Proceedings of the fourth IEEE international symposium on signal processing and information technology (pp. 195–199), 18–21 Dec. 2004.
Al-Otaibi, F. A. H. (2001). Speaker-dependant continuous Arabic speech recognition. M.Sc. Thesis, King Saud University.
Bahi, H., & Sellami, M. (2003). A hybrid approach for Arabic speech recognition. In ACS/IEEE international conference on computer systems and applications, 14–18 July 2003.
Baker, J. K. (1975). Stochastic modeling for automatic speech understanding. In R. Reddy (Ed.), Speech recognition (pp. 521–542). New York: Academic Press.
Bellegarda, J., & Nahamoo, D. (1988). Tied-mixture continuous parameter models for large vocabulary isolated speech recognition. In Proc. IEEE international conference on acoustics, speech, and signal processing.
Billa, J., Noamany, M., Srivastava, A., Liu, D., Stone, R., Xu, J., Makhoul, J., & Kubala, F. (2002). Audio indexing of Arabic broadcast news. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP ’02) (Vol. 1, pp. I-5–I-8).
Clarkson, P., & Rosenfeld, R. (1997). Statistical language modeling using the CMU-Cambridge toolkit. In Proceedings of the 5th European conference on speech communication and technology, Rhodes, Greece, Sept. 1997.
Digalakis, V., Monaco, P., & Murveit, H. (1996). Genones: Generalized mixture tying in continuous hidden Markov model-based speech recognizers. IEEE Transactions on Speech and Audio Processing, 4(4), 281–289.
El Choubassi, M. M., El Khoury, H. E., Alagha, C. E. J., Skaf, J. A., & Al-Alaoui, M. A. (2003). Arabic speech recognition using recurrent neural networks. In Proceedings of the 3rd IEEE international symposium on signal processing and information technology (ISSPIT) (pp. 543–547), Dec. 2003.
El-Ramly, S. H., Abdel-Kader, N. S., & El-Adawi, R. (2002). Neural networks used for speech recognition. In Proceedings of the nineteenth national radio science conference (NRSC 2002) (pp. 200–207), March 2002.
Elshafei-Ahmed, M. (1991). Toward an Arabic text-to-speech system. The Arabian Journal of Science and Engineering, 16(4B), 565–583.
Elshafei, M., Almuhtasib, H., & Alghamdi, M. (2002). Techniques for high quality text-to-speech. Information Sciences, 140(3–4), 255–267.
Elshafei, M., Al-Muhtaseb, H., & Alghamdi, M. (2006a). Statistical methods for automatic diacritization of Arabic text. In Proceedings 18th national computer conference NCC’18, Riyadh, March 26–29, 2006.
Elshafei, M., Al-Muhtaseb, H., & Alghamdi, M. (2006b). Machine generation of Arabic diacritical marks. In Proceedings of the 2006 international conference on machine learning; models, technologies, and applications (MLMTA’06), June 2006, USA.
Garofolo, J., Voorhees, E., Auzanne, C., Stanford, V., & Lund, B. (1997). Design and preparation of the 1996 HUB-4 broadcast news benchmark test corpora. In Proceedings of the DARPA speech recognition workshop (pp. 15–21). Chantilly: Morgan Kaufmann.
Hagen, S. (2007). The IBM 2006 GALE Arabic ASR system. In ICASSP, 2007.
Huang, X., Alleva, F., Hon, H. W., Hwang, M. Y., & Rosenfeld, R. (1993). The SPHINX-II speech recognition system: an overview. Computer Speech and Language, 7(2), 137–148.
Huang, X., Acero, A., & Hon, H. (2001). Spoken language processing. Englewood Cliffs: Prentice-Hall.
Hwang, M. Y., & Huang, X. (1993). Shared-distribution hidden Markov models for speech recognition. IEEE Transactions on Speech and Audio Processing, 1(4), 414–420.
Hwang, M. Y., Huang, X. D., & Alleva, F. (1993). Predicting unseen triphones with senones. In Proc. IEEE international conference on acoustics, speech, and signal processing.
Jelinek, F. (1976). Continuous speech recognition by statistical methods. Proceedings of the IEEE, 64(4), 532–555.
Jelinek, F. (1998). Statistical methods for speech recognition. Cambridge: MIT Press.
Kirchhoff, K., Bilmes, J., Das, S., Duta, N., Egan, M., Ji, G., He, F., Henderson, J., Liu, D., Noamany, M., Schone, P., Schwartz, R., & Vergyri, D. (2003). Novel approaches to Arabic speech recognition: report from the 2002 Johns Hopkins summer workshop. In ICASSP 2003 (pp. I-344–I-347).
Lamere, P., Kwok, P., Walker, W., Gouvea, E., Singh, R., Raj, B., & Wolf, P. (2003). Design of the CMU Sphinx-4 decoder. In Proceedings of the 8th European conference on speech communication and technology (pp. 1181–1184), Geneva, Switzerland, Sept. 2003.
Lee, K. F. (1988). Large vocabulary speaker-independent continuous speech recognition: the SPHINX system. PhD Thesis, Carnegie Mellon University.
Lee, K. F., Hon, H. W., & Reddy, R. (1990). An overview of the SPHINX speech recognition system. IEEE Transactions on Acoustics, Speech and Signal Processing, 38(1), 35–45.
Ortmanns, S., Eiden, A., & Ney, H. (1998). Improved lexical tree search for large vocabulary speech recognition. In Proc. IEEE int. conf. on acoustics, speech and signal proc.
Placeway, P., Chen, S., Eskenazi, M., Jain, U., Parikh, V., Raj, B., Ravishankar, M., Rosenfeld, R., Seymore, K., Siegler, M., Stern, R., & Thayer, E. (1997). The 1996 HUB-4 Sphinx-3 system. In Proceedings of the DARPA speech recognition workshop. Chantilly: DARPA, Feb. 1997. http://www.nist.gov/speech/publications/darpa97/pdf/placewa1.pdf.
Price, P., Fisher, W. M., Bernstein, J., & Pallett, D. S. (1988). The DARPA 1000-word resource management database for continuous speech recognition. In Proceedings of the international conference on acoustics, speech and signal processing (Vol. 1, pp. 651–654). New York: IEEE.
Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Rabiner, L., & Juang, B. H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.
Ravishankar, M. K. (1996). Efficient algorithms for speech recognition. PhD Thesis (CMU Technical Report CS-96-143), Carnegie Mellon University, Pittsburgh, PA.
Siegler, M., Jain, U., Raj, B., & Stern, R. M. (1997). Automatic segmentation, classification and clustering of broadcast news audio. In Proc. DARPA speech recognition workshop, Feb. 1997.
Singh, R., Raj, B., & Stern, R. M. (1999). Automatic clustering and generation of contextual questions for tied states in hidden Markov models. In Proc. IEEE int. conf. on acoustics, speech and signal proc.
Sphinx-4 trainer design (2003). http://www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/TrainerDesign.
Young, S. (1994). The HTK hidden Markov model toolkit: design and philosophy (Tech. Rep. CUED/F-INFENG/TR152). Cambridge University Engineering Department, UK, Sept. 1994.
Young, S. (1996). A review of large-vocabulary continuous-speech recognition. IEEE Signal Processing Magazine, 13(5), 45–57.
Young, S. J., Kershaw, D., Odell, J. J., Ollason, D., Valtchev, V., & Woodland, P. C. (1999). The HTK book. Entropic.