Word Sense Disambiguation by Machine Learning from Unlabeled Data

Springer Science and Business Media LLC - Volume 19 - Pages 27-38 - 2003

Seong-Bae Park¹, Byoung-Tak Zhang¹, Yung Taek Kim¹

¹Biointelligence Lab, School of Computer Science and Engineering, Seoul National University, Seoul, Korea

Abstract

In this paper, we describe a machine learning method for word sense disambiguation that uses unlabeled data. Our method is based on selective sampling with committees of decision trees. The committee members are trained on a small set of labeled examples, which is then augmented with a large number of unlabeled examples. Using unlabeled examples matters because labeled data is expensive and time-consuming to obtain, whereas a large number of unlabeled examples can be collected easily and cheaply. The idea behind the method is that the labels of unlabeled examples can be estimated by the committees. The additional unlabeled examples therefore improve word sense disambiguation performance and minimize the cost of manual labeling. The effectiveness of the method was tested on a raw text corpus of one million words. By using unlabeled data, we achieved an accuracy improvement of up to 20.2%.
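The committee idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature representation (sets of context words), the bootstrap resampling used to make committee members differ, the agreement threshold, the toy "bank" data, and every function name below are assumptions made for illustration only.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most frequent sense label."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity of a list of sense labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(examples, depth=2):
    """Grow a small decision tree over 'is this context word present?' tests.
    A leaf is a sense string; an inner node is (word, yes_subtree, no_subtree)."""
    labels = [y for _, y in examples]
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for word in sorted({w for x, _ in examples for w in x}):
        yes = [y for x, y in examples if word in x]
        no = [y for x, y in examples if word not in x]
        if not yes or not no:
            continue  # this word does not split the examples
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or score < best[0]:
            best = (score, word)
    if best is None:
        return majority(labels)
    word = best[1]
    return (word,
            build_tree([(x, y) for x, y in examples if word in x], depth - 1),
            build_tree([(x, y) for x, y in examples if word not in x], depth - 1))

def classify(tree, x):
    """Follow the word tests down to a leaf and return its sense."""
    while isinstance(tree, tuple):
        word, yes, no = tree
        tree = yes if word in x else no
    return tree

def augment(labeled, unlabeled, size=5, threshold=0.8, seed=0):
    """Train a committee on bootstrap samples of the labeled set (assumption),
    then label each unlabeled example by vote when agreement is high enough."""
    rng = random.Random(seed)
    committee = [build_tree([rng.choice(labeled) for _ in labeled])
                 for _ in range(size)]
    accepted, uncertain = [], []
    for x in unlabeled:
        votes = Counter(classify(tree, x) for tree in committee)
        sense, count = votes.most_common(1)[0]
        if count / size >= threshold:
            accepted.append((x, sense))   # committee-estimated label
        else:
            uncertain.append(x)           # candidate for manual labeling
    return labeled + accepted, uncertain

# Toy data for the ambiguous word "bank" (hypothetical).
labeled = [({"money", "deposit"}, "finance"), ({"loan", "money"}, "finance"),
           ({"water", "fishing"}, "river"), ({"water", "shore"}, "river")]
unlabeled = [{"money", "account"}, {"water", "boat"}, {"walk", "near"}]
augmented, uncertain = augment(labeled, unlabeled)
print(len(augmented) - len(labeled), "examples labeled by the committee;",
      len(uncertain), "left for manual labeling")
```

In this sketch, examples on which the committee disagrees are set aside rather than guessed, which is one plausible way to keep manual labeling effort focused on the hard cases; the threshold of 0.8 is arbitrary.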

Keywords

#machine learning #word sense disambiguation #unlabeled data #decision trees #selective sampling
