Improvement of Machine Translation Evaluation by Simple Linguistically Motivated Features

Springer Science and Business Media LLC - Volume 26 - Pages 57-67 - 2011
Mu-Yun Yang1, Shu-Qi Sun1, Jun-Guo Zhu1, Sheng Li1, Tie-Jun Zhao1, Xiao-Ning Zhu1
1School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Abstract

Adopting the regression SVM framework, this paper proposes a linguistically motivated feature engineering strategy to develop an MT evaluation metric that correlates better with human assessments. In contrast to the current practice of “greedy” combination of all available features, six features are suggested according to human intuition about translation quality. The contribution of the linguistic features is then examined and analyzed via a hill-climbing strategy. Experiments indicate that, compared to either the SVM-ranking model or previous attempts using exhaustive linguistic features, the regression SVM model with six linguistically informed features generalizes better across different datasets, and that augmenting these linguistic features with proper non-linguistic metrics can achieve additional improvements.
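
The hill-climbing analysis mentioned in the abstract can be illustrated with a minimal sketch of greedy forward feature selection. This is not the authors' implementation: the feature names, the toy data, and the scoring function are all hypothetical. In particular, `toy_score` simply correlates the average of the selected feature values with the human scores, as a stand-in for training a regression SVM on the selected features and measuring its correlation with human assessments.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def hill_climb(candidates: List[str],
               score: Callable[[List[str]], float]) -> List[str]:
    """Greedy forward selection: repeatedly add the single feature that
    raises the score the most, and stop when no addition improves it."""
    selected: List[str] = []
    best = float("-inf")
    while True:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            return selected
        top_score, top_feature = max((score(selected + [f]), f) for f in remaining)
        if top_score <= best:
            return selected
        selected.append(top_feature)
        best = top_score


# Hypothetical per-sentence feature values and human scores (toy data).
FEATURES: Dict[str, List[float]] = {
    "feat_a": [1.0, 2.0, 2.0, 4.0, 4.0],
    "feat_b": [0.0, 0.0, 1.0, 0.0, 1.0],
    "feat_noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
HUMAN: List[float] = [1.0, 2.0, 3.0, 4.0, 5.0]


def toy_score(names: List[str]) -> float:
    """Stand-in for 'train a regressor, then correlate with human scores':
    here the 'model' is just the mean of the selected feature columns."""
    combined = [sum(FEATURES[n][i] for n in names) / len(names)
                for i in range(len(HUMAN))]
    return pearson(combined, HUMAN)


selected = hill_climb(list(FEATURES), toy_score)
print(selected, round(toy_score(selected), 3))
```

Swapping `toy_score` for a function that fits a regression model on the selected features and evaluates it on held-out data would bring the sketch closer to the setting the abstract describes. The order in which features enter `selected` gives a rough indication of each feature's marginal contribution.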
