A semantic matching energy function for learning with multi-relational data

Machine Learning - Volume 94 - Pages 233-259 - 2013
Antoine Bordes1, Xavier Glorot2, Jason Weston3, Yoshua Bengio2
1Heudiasyc UMR 7253, Université de Technologie de Compiègne & CNRS, Compiègne, France
2Université de Montréal, Montréal, Canada
3Google, New York, USA

Abstract

Large-scale relational learning is becoming crucial for handling the huge amounts of structured data generated daily in many application domains, ranging from computational biology and information retrieval to natural language processing. In this paper, we present a new neural network architecture designed to embed multi-relational graphs into a flexible continuous vector space in which the original data is kept and enhanced. The network is trained to encode the semantics of these graphs so as to assign high probabilities to plausible components. We show empirically that it reaches competitive performance in link prediction on standard benchmark datasets from the literature as well as on data from a real-world knowledge base (WordNet). In addition, we show how our method can be applied to perform word-sense disambiguation in the context of open-text semantic parsing, where the goal is to learn to assign a structured meaning representation to almost any sentence of free text, demonstrating that it can scale to tens of thousands of nodes and thousands of relation types.
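The core idea described above — embedding entities and relation types as vectors and scoring (head, relation, tail) triples with an energy function trained by ranking plausible triples above corrupted ones — can be illustrated with a minimal toy sketch. This is an assumption-laden stand-in (element-wise matching instead of the paper's learned combination layers, random untrained embeddings, made-up toy vocabulary), not the authors' exact SME parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension

# Toy vocabulary: each entity and relation type gets a d-dim embedding.
# (Names here are illustrative, not from the paper's datasets.)
entities = {name: rng.normal(size=d) for name in ["cat", "dog", "animal"]}
relations = {name: rng.normal(size=d) for name in ["is_a"]}

def energy(head, rel, tail):
    """Lower energy = more plausible triple. Here the 'matching' is a
    negative dot product of a relation-modulated head against the tail;
    the actual model learns the combination functions instead."""
    lhs = entities[head] * relations[rel]  # element-wise combination
    rhs = entities[tail]
    return -float(lhs @ rhs)

def ranking_loss(pos, neg, margin=1.0):
    """Margin ranking criterion: a true triple should score a lower
    energy than a corrupted triple by at least `margin`."""
    return max(0.0, margin + energy(*pos) - energy(*neg))

pos = ("cat", "is_a", "animal")
neg = ("animal", "is_a", "cat")  # corrupted triple (swapped arguments)
print(ranking_loss(pos, neg))
```

In training, gradients of this loss with respect to the embeddings would be followed by stochastic gradient descent, with corrupted triples resampled at each step; link prediction then ranks candidate tails by energy.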

Keywords

#vector embeddings #multi-relational graphs #neural networks #link prediction #word-sense disambiguation
