
Using Unsupervised Deep Learning for Automatic Summarization of Arabic Documents

Arabian Journal for Science and Engineering - Volume 43 - Pages 7803-7815 - 2018

Nabil Alami¹, Noureddine En-nahnahi¹, Said Alaoui Ouatik¹, Mohammed Meknassi¹

¹Faculty of Science Dhar EL Mahraz, Laboratory of Informatics and Modeling (LIM), Sidi Mohamed Ben Abdellah University, Fez, Morocco

Abstract

Traditional Arabic text summarization (ATS) systems rely on a bag-of-words representation, which produces sparse, high-dimensional input data. Dimensionality reduction is therefore essential to increase the discriminative power of the features. In this paper, we present a new ATS method that uses a variational auto-encoder (VAE) to learn a feature space from high-dimensional input data. We explore several input representations, namely term frequency (tf) and tf-idf, built over both local and global vocabularies. All sentences are ranked according to the latent representation produced by the VAE. We investigate the impact of using the VAE with two summarization approaches: a graph-based approach and a query-based approach. Experiments on two benchmark datasets designed specifically for ATS show that the VAE using the tf-idf representation of the global vocabulary provides a clearly more discriminative feature space and improves the recall of other models. The experimental results confirm that the proposed method outperforms most state-of-the-art extractive summarization methods in both the graph-based and the query-based settings.
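The abstract names tf and tf-idf inputs built over local and global vocabularies but gives no formulas. As an illustration only, the Python sketch below assumes that a "local" vocabulary is the set of terms in the current document, a "global" vocabulary is the set of terms across the whole collection, each sentence is one row of the matrix, and idf is the common log(N / df) computed over sentences. Tokenization (and any Arabic stemming or stop-word removal) is assumed to have been done already; all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def build_vocab(sentences):
    """Sorted list of distinct terms in a list of tokenized sentences."""
    return sorted({term for sent in sentences for term in sent})


def tf_matrix(sentences, vocab):
    """One raw term-frequency row per sentence, over a fixed vocabulary."""
    rows = []
    for sent in sentences:
        counts = Counter(sent)
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_matrix(sentences, vocab, corpus):
    """tf-idf rows, with idf(t) = log(N / df(t)) computed over the sentences in `corpus`.

    Terms that never occur in `corpus` get idf 0 (an assumption of this sketch).
    """
    n = len(corpus)
    df = Counter(term for sent in corpus for term in set(sent))
    idf = [math.log(n / df[term]) if df[term] else 0.0 for term in vocab]
    return [[tf * w for tf, w in zip(row, idf)] for row in tf_matrix(sentences, vocab)]


# Hypothetical toy data: two already-tokenized "documents".
doc1 = [["a", "b", "a"], ["b", "c"]]
doc2 = [["c", "d"], ["a", "d", "d"]]

# Local vocabulary: terms and idf statistics come from doc1 only.
local_rows = tfidf_matrix(doc1, build_vocab(doc1), doc1)

# Global vocabulary: terms and idf statistics come from the whole collection.
collection = doc1 + doc2
global_rows = tfidf_matrix(doc1, build_vocab(collection), collection)
```

With the global vocabulary every sentence vector has the same length across documents, which is convenient if one model is trained on the whole collection; with a local vocabulary the dimensionality changes from document to document.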
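The abstract does not describe the VAE architecture, so the following is only a minimal, untrained forward pass following the standard VAE formulation (Kingma and Welling), written in plain Python to show the encoder, the reparameterization trick and the loss. The single linear layer per component, the Bernoulli decoder (which assumes inputs scaled to [0, 1]), the layer sizes and all names are hypothetical. Training requires gradient-based optimization, normally done with a deep learning framework, and is omitted here.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(in_dim, out_dim, rng):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def kl_divergence(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


class TinyVAE:
    """Hypothetical VAE with one linear layer per component, for illustration only."""

    def __init__(self, input_dim, latent_dim, seed=0):
        self.rng = random.Random(seed)
        self.enc_mu = init_layer(input_dim, latent_dim, self.rng)
        self.enc_logvar = init_layer(input_dim, latent_dim, self.rng)
        self.dec = init_layer(latent_dim, input_dim, self.rng)

    def encode(self, x):
        """Mean and log-variance of the approximate posterior q(z | x)."""
        return linear(x, *self.enc_mu), linear(x, *self.enc_logvar)

    def reparameterize(self, mu, logvar):
        """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z):
        """Bernoulli means for every input dimension, via a sigmoid."""
        return [1.0 / (1.0 + math.exp(-t)) for t in linear(z, *self.dec)]

    def loss(self, x):
        """Negative ELBO: binary cross-entropy reconstruction term plus the KL term."""
        mu, logvar = self.encode(x)
        x_hat = self.decode(self.reparameterize(mu, logvar))
        eps = 1e-9  # guards against log(0)
        recon = -sum(xi * math.log(p + eps) + (1.0 - xi) * math.log(1.0 - p + eps)
                     for xi, p in zip(x, x_hat))
        return recon + kl_divergence(mu, logvar)


# Hypothetical sentence vector, e.g. a tf-idf row rescaled to [0, 1].
vae = TinyVAE(input_dim=4, latent_dim=2, seed=0)
x = [1.0, 0.0, 0.5, 0.0]
mu, logvar = vae.encode(x)  # after training, `mu` could serve as the sentence's latent code
neg_elbo = vae.loss(x)
```

Using the posterior mean `mu` (rather than a random sample `z`) as the sentence representation gives a deterministic feature vector for ranking; whether the paper does exactly this is not stated in the abstract.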
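The abstract does not say how sentences are scored in each approach. One common choice, assumed here and not confirmed by the source, is PageRank-style centrality over a cosine-similarity graph for the graph-based approach (in the spirit of TextRank or LexRank) and cosine similarity to a query vector for the query-based approach. The input vectors can be any sentence representation, such as latent codes; because latent codes may have negative components, negative similarities are clipped to zero so that the graph weights stay valid. Function names, the damping factor and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def graph_rank(vectors, damping=0.85, iters=100, tol=1e-10):
    """PageRank-style scores over a sentence-similarity graph, by power iteration.

    Sentences with no positive similarity to any other sentence only receive the
    teleport share (1 - damping) / n in this simplified sketch.
    """
    n = len(vectors)
    # Edge weights: cosine similarity, negatives clipped to 0, no self-loops.
    sim = [[max(0.0, cosine(vectors[i], vectors[j])) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n
               + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                               for j in range(n) if out_weight[j] > 0)
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def query_rank(vectors, query_vector):
    """Query-based relevance: cosine similarity of each sentence to the query."""
    return [cosine(v, query_vector) for v in vectors]


def top_k(scores, k):
    """Indices of the k best-scoring sentences, returned in document order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


# Hypothetical 2-D sentence vectors and query vector.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k(graph_rank(vectors), 1))              # [1]: the most central sentence
print(top_k(query_rank(vectors, [0.0, 1.0]), 2))  # [1, 2]: closest to the query
```

In both cases the extractive summary is simply the selected sentences emitted in their original order; the two approaches differ only in how the per-sentence scores are computed.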

Keywords

#Arabic text summarization #Deep learning #Variational auto-encoder #Unsupervised learning model #Term frequency #tf-idf #Graph-based summarization approach #Query-based summarization approach

