Learning to rank for why-question answering
Abstract
In this paper, we evaluate a number of machine learning techniques for the task of ranking answers to why-questions. We use TF-IDF together with a set of 36 linguistically motivated features that characterize questions and answers. We experiment with a number of machine learning techniques (including several classification and regression techniques, Ranking SVM, and SVM^map) in various settings. The purpose of the experiments is to assess how the different machine learning approaches cope with our highly imbalanced binary relevance data, with and without hyperparameter tuning. We find that with all machine learning techniques we obtain an MRR score that is significantly above the TF-IDF baseline of 0.25 and not significantly lower than the best score of 0.35. We provide an in-depth analysis of the effect of data imbalance and hyperparameter tuning, and we relate our findings to previous research on learning to rank for Information Retrieval.
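As a minimal illustration of the evaluation metric reported above, the sketch below computes Mean Reciprocal Rank (MRR) over binary relevance judgments, which is how scores such as the 0.25 baseline and 0.35 best result can be obtained from ranked answer lists. It is a generic sketch under the assumption of plain (uncut) MRR with binary labels; the function name and the example data are hypothetical and not taken from the paper.

from typing import List

def mean_reciprocal_rank(ranked_relevance: List[List[int]]) -> float:
    """Compute MRR over a set of questions.

    Each inner list holds binary relevance labels (1 = correct answer,
    0 = incorrect) for one question's candidate answers, in ranked order.
    A question with no correct answer contributes 0 to the mean.
    """
    reciprocal_ranks = []
    for labels in ranked_relevance:
        rr = 0.0
        for rank, label in enumerate(labels, start=1):
            if label == 1:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0

# Hypothetical example: three why-questions whose first correct answers
# appear at ranks 2, 1, and (none in the top ranks shown except) 5
# -> MRR = (0.5 + 1.0 + 0.2) / 3
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0, 0, 1]]))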