Sentence-level ranking with quality estimation

Machine Translation - Volume 27 - Pages 239-256 - 2013

Eleftherios Avramidis¹

¹Language Technology Lab, German Research Center for Artificial Intelligence (DFKI GmbH), Berlin, Germany

Abstract

Starting from human annotations, we present a machine-learning strategy that performs sentence-level preference ranking of alternative machine translations of the same source. Rankings are decomposed into pairwise comparisons so that they can be learned by binary classifiers, using black-box features derived from linguistic analysis. To recompose a ranking from the pairwise decisions of the classifier, the decisions are weighted with their classification probabilities, which increases the correlation coefficient by 80 %. We also demonstrate several configurations of successful automatic ranking models. The best configurations achieve a correlation with human judgments of 0.27, measured by Kendall’s tau. Although the method does not use reference translations, this correlation is comparable to the one achieved by state-of-the-art reference-aware automatic evaluation metrics such as smoothed BLEU, METEOR and Levenshtein distance.
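The decompose-and-recompose idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (decompose, recompose, kendall_tau), the example probabilities, and the tie handling in the tau computation are all assumptions made for illustration, and the classifier itself is replaced by a hypothetical table of pairwise probabilities.

```python
from itertools import combinations


def decompose(ranks):
    """Turn one ranking (lower rank = better) into pairwise comparisons.

    Returns (i, j, label) triples with label 1 if item i is ranked better
    than item j and 0 if worse. Tied pairs are skipped (an assumption).
    """
    pairs = []
    for i, j in combinations(range(len(ranks)), 2):
        if ranks[i] == ranks[j]:
            continue
        pairs.append((i, j, 1 if ranks[i] < ranks[j] else 0))
    return pairs


def recompose(n, pairwise_probs, weighted=True):
    """Rebuild a ranking of n items from pairwise classifier decisions.

    pairwise_probs maps (i, j) to the (hypothetical) classifier probability
    that item i is better than item j. With weighted=True each item
    accumulates the probability mass in its favour; otherwise each pair
    contributes one hard vote to the predicted winner.
    """
    scores = [0.0] * n
    for (i, j), p in pairwise_probs.items():
        if weighted:
            scores[i] += p
            scores[j] += 1.0 - p
        else:
            scores[i if p >= 0.5 else j] += 1.0
    # Higher score = better. Rank 1 is best; equal scores share a rank.
    order = sorted(set(scores), reverse=True)
    return [order.index(s) + 1 for s in scores]


def kendall_tau(predicted, human):
    """(concordant - discordant) / (concordant + discordant).

    Pairs tied in either ranking are skipped; this tie handling is an
    assumption, since the abstract does not specify it.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(human)), 2):
        a = predicted[i] - predicted[j]
        b = human[i] - human[j]
        if a == 0 or b == 0:
            continue
        if (a > 0) == (b > 0):
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Made-up example: three translations with human ranks 1, 2, 3.
human = [1, 2, 3]
probs = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.45}  # invented probabilities

hard = recompose(3, probs, weighted=False)  # ranks [2, 3, 1]
soft = recompose(3, probs, weighted=True)   # ranks [1, 3, 2]
print(kendall_tau(hard, human), kendall_tau(soft, human))
```

In this invented example, two low-confidence decisions outvote one confident decision when hard votes are counted, whereas summing probabilities lets the confident decision dominate, so the weighted ranking agrees better with the human one. The numbers are illustrative only and say nothing about the size of the effect reported in the abstract.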

