Multitask learning for neural generative question answering
Abstract
Neural generative models for question answering (QA) usually employ sequence-to-sequence (Seq2Seq) learning to generate answers from the user's questions, whereas retrieval-based models select the best-matched answer from a repository of predefined QA pairs. One key challenge for neural generative QA models is their tendency to produce high-frequency, generic answers regardless of the question, partly because they optimize a log-likelihood objective. In this paper, we investigate multitask learning (MTL) for neural network-based methods in a QA scenario. We define the main task as generative QA via Seq2Seq learning and the auxiliary task as discriminative QA via binary QA classification. The two tasks are learned jointly with shared representations, which improves generalization and transfers the classification labels as extra evidence to guide the generation of the answer word sequence. Experimental results from both automatic evaluation and human annotation show that the proposed method outperforms the baselines.
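The abstract does not state the training objective, so the following is only a sketch of one common way a generative main task and a binary classification auxiliary task can be combined over shared parameters. All symbols here are illustrative assumptions rather than the paper's notation: $q$ is the question, $a = (y_1, \dots, y_n)$ the answer, $z \in \{0, 1\}$ a label marking whether $(q, a)$ is a matching pair, $\theta_s$ the shared parameters, $\theta_g$ and $\theta_c$ the task-specific parameters, and $\lambda$ a hypothetical weight on the auxiliary loss.

```latex
% Main task: Seq2Seq negative log-likelihood of the answer given the question
\mathcal{L}_{\mathrm{gen}}
  = -\sum_{t=1}^{n} \log p\left(y_t \mid y_{<t},\, q;\ \theta_s, \theta_g\right)

% Auxiliary task: binary cross-entropy on whether (q, a) is a matching QA pair
\mathcal{L}_{\mathrm{cls}}
  = -\Big[\, z \log \hat{z} + (1 - z) \log\left(1 - \hat{z}\right) \Big],
  \qquad \hat{z} = p\left(z = 1 \mid q,\, a;\ \theta_s, \theta_c\right)

% Joint objective (assumed form): both losses update the shared parameters \theta_s
\mathcal{L}
  = \mathcal{L}_{\mathrm{gen}} + \lambda\, \mathcal{L}_{\mathrm{cls}}
```

Under this assumed form, gradients from both losses flow into $\theta_s$, which is one way the classification labels could act as extra evidence for generation. Setting $\lambda = 0$ recovers plain Seq2Seq training.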