ESGify: Automated Classification of Environmental, Social and Corporate Governance Risks

Doklady Mathematics - 2024

A. Kazakov¹, S. Denisova¹, I. Barsola¹, E. Kalugina¹, I. Molchanova¹, I. Egorov¹, A. Kosterina¹, E. Tereshchenko¹, L. Shutikhina¹, I. Doroshchenko¹, N. Sotiriadi¹, S. Budennyy²,¹

¹Sber AI Lab, Moscow, Russia

²Artificial Intelligence Research Institute (AIRI), Moscow, Russia

Abstract

The growing recognition of Environmental, Social and Corporate Governance (ESG) factors in financial decision-making has increased the need for effective and comprehensive ESG risk assessment tools. In this study, we introduce an open-source Natural Language Processing (NLP) model named "ESGify", based on the MPNet-base architecture and designed to classify texts within the framework of ESG risks. We also present a hierarchical and detailed methodology for ESG risk classification that draws on the expertise of ESG professionals and on global best practices. Supported by a manually annotated multi-label dataset of 2000 news articles and by domain adaptation on texts from sustainability reports, ESGify was developed to automate ESG risk classification according to the established methodology. We compare augmentation techniques based on back translation and on Large Language Models (LLMs) to improve model quality, and achieve a weighted F1 score of 0.5 on a dataset with 47 classes. This result outperforms ChatGPT 3.5 with a simple prompt. The model weights and documentation are hosted on GitHub at https://github.com/sb-ai-lab/ESGify under the Apache 2.0 license.
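As an illustration of the reported metric, the following minimal sketch computes a support-weighted F1 score for multi-label predictions: a per-label binary F1 is averaged with weights proportional to how often each label occurs in the ground truth. This is the common definition of weighted F1; whether the authors computed it in exactly this way is an assumption, and the function names and toy data (3 labels instead of 47) are hypothetical.

```python
from typing import Sequence


def f1_per_label(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]],
                 label: int) -> float:
    """Binary F1 for one label column; 0.0 if the label is neither present nor predicted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t[label] == 1 and p[label] == 1)
    fp = sum(1 for t, p in pairs if t[label] == 0 and p[label] == 1)
    fn = sum(1 for t, p in pairs if t[label] == 1 and p[label] == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Average of per-label F1 scores, weighted by each label's support in y_true."""
    n_labels = len(y_true[0])
    supports = [sum(row[j] for row in y_true) for j in range(n_labels)]
    total = sum(supports)
    if total == 0:
        return 0.0
    weighted_sum = sum(f1_per_label(y_true, y_pred, j) * supports[j]
                       for j in range(n_labels))
    return weighted_sum / total


# Hypothetical toy data: 4 documents, 3 risk labels, multi-hot encoded.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0]]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Because the weights follow label frequency, frequent risk classes dominate the score, which matters when a 47-class dataset is imbalanced.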

Keywords

#ESG #Natural Language Processing #Risk classification #Large Language Models #Sustainability

