Rap4DQ: Learning to recommend relevant API documentation for developer questions

Yi Li¹, Shaohua Wang¹, Wenbo Wang¹, Tien N. Nguyen², Yan Wang³, Xinyue Ye⁴
¹New Jersey Institute of Technology, University Heights, Newark, USA
²The University of Texas at Dallas, Richardson, USA
³Central University of Finance and Economics, Beijing, China
⁴Texas A&M University, College Station, USA

Abstract

Keywords

