A hybrid statistical and deep learning based technique for Persian part of speech tagging
Abstract
In part-of-speech (POS) tagging, the main challenge is to predict the correct tags for both in-vocabulary (IV) and out-of-vocabulary (OOV) words. Artificial neural networks such as the multi-layer perceptron (MLP) and long short-term memory (LSTM), which appear well suited to this challenge because of their strong generalization capability, have therefore been applied to POS tagging. In this research, using word vectors as the input to MLP and LSTM neural networks, we perform POS tagging for the Persian language and compare the results of the neural models with a second-order hidden Markov model (HMM), which serves as our benchmark. To investigate the effect of the number of hidden layers, we use both single-layer and two-layer MLP and LSTM networks. We also apply a bidirectional LSTM network to investigate the effect of a bidirectional learning algorithm on Persian POS tagging. The results obtained from the different models in this research show that the neural models perform far better at predicting the correct POS tags for OOV words, which can be attributed to their stronger generalization. We therefore propose a hybrid model, a combination of the HMM and a single-layer bidirectional LSTM, as a novel approach to POS tagging. This hybrid model improves on both the HMM and the neural models, increasing the accuracy to 97.29%.
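To make the benchmark concrete, the second-order (trigram) HMM with Viterbi decoding can be sketched as below. This is a minimal, unsmoothed maximum-likelihood illustration on a toy English corpus, not the authors' Persian implementation; the function names (`train_hmm`, `viterbi`) and the toy data are illustrative assumptions.

```python
from collections import defaultdict
import math

def train_hmm(tagged_sents):
    """MLE estimates of trigram transition and emission probabilities (no smoothing)."""
    trans = defaultdict(lambda: defaultdict(int))  # (tag[i-2], tag[i-1]) -> tag[i]
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word
    for sent in tagged_sents:
        tags = ["<s>", "<s>"] + [t for _, t in sent] + ["</s>"]
        for word, tag in sent:
            emit[tag][word] += 1
        for i in range(2, len(tags)):
            trans[(tags[i - 2], tags[i - 1])][tags[i]] += 1
    def normalize(counts):
        return {k: {x: c / sum(v.values()) for x, c in v.items()} for k, v in counts.items()}
    return normalize(trans), normalize(emit)

def viterbi(words, trans, emit, tagset):
    """Second-order Viterbi decoding over (previous tag, current tag) pair states."""
    # Each state carries (log probability, best tag sequence so far).
    pi = {("<s>", "<s>"): (0.0, [])}
    for w in words:
        nxt = {}
        for (u, v), (score, path) in pi.items():
            for t in tagset:
                p_t = trans.get((u, v), {}).get(t, 0.0)
                p_w = emit.get(t, {}).get(w, 0.0)
                if p_t == 0.0 or p_w == 0.0:
                    continue  # zero-probability path (no smoothing in this sketch)
                s = score + math.log(p_t) + math.log(p_w)
                if (v, t) not in nxt or s > nxt[(v, t)][0]:
                    nxt[(v, t)] = (s, path + [t])
        pi = nxt
    # Finish by scoring the transition into the end-of-sentence marker.
    best = None
    for (u, v), (score, path) in pi.items():
        p_end = trans.get((u, v), {}).get("</s>", 0.0)
        if p_end > 0.0:
            s = score + math.log(p_end)
            if best is None or s > best[0]:
                best = (s, path)
    return best[1] if best else []

# Toy corpus standing in for a real tagged corpus such as Bijankhan.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

if __name__ == "__main__":
    trans, emit = train_hmm(TOY_CORPUS)
    print(viterbi(["the", "dog", "sleeps"], trans, emit, {"DET", "NOUN", "VERB"}))
    # → ['DET', 'NOUN', 'VERB']
```

The hybrid model described above would then defer to such an HMM for IV words while relying on the bidirectional LSTM's predictions where the HMM's lexical statistics are unavailable (OOV words); in practice the HMM also needs smoothing and suffix-based handling of unseen words, which this sketch omits.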