Sentence-level ranking with quality estimation
Abstract
Starting from human annotations, we present a machine-learning strategy that performs sentence-level preference ranking of alternative machine translations of the same source. Rankings are decomposed into pairwise comparisons so that they can be learned by binary classifiers, using black-box features derived from linguistic analysis. To recompose a ranking from the classifier’s pairwise decisions, each decision is weighted by its classification probability, which increases the correlation coefficient by 80%. We also demonstrate several configurations of successful automatic ranking models. The best configurations achieve a correlation with human judgments of 0.27, measured by Kendall’s tau. Although the method does not use reference translations, this correlation is comparable to that achieved by state-of-the-art reference-aware automatic evaluation metrics such as smoothed BLEU, METEOR and Levenshtein distance.
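The decompose/recompose idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`decompose`, `recompose`, `kendall_tau`), the toy probabilities, and the scoring rule (summing each translation's win probabilities) are assumptions made for illustration. The tau shown is a simple variant that skips pairs tied in the human ranking, which may differ from the exact formulation used in the paper.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def decompose(ranks: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Turn one human ranking (lower rank = better) into pairwise labels.

    Returns (i, j, label) triples, where label is 1 if translation i is
    better than translation j and 0 otherwise. Tied pairs are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(ranks)), 2):
        if ranks[i] == ranks[j]:
            continue
        pairs.append((i, j, 1 if ranks[i] < ranks[j] else 0))
    return pairs


def recompose(n: int,
              prob_first_better: Callable[[int, int], float],
              weighted: bool = True) -> List[int]:
    """Rebuild a ranking of n translations from pairwise decisions.

    prob_first_better(i, j) stands in for a binary classifier's estimated
    probability that i is better than j (a hypothetical interface).
    With weighted=True each decision contributes its probability;
    otherwise each decision is a hard 0/1 vote.
    """
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        p = prob_first_better(i, j)
        if weighted:
            scores[i] += p
            scores[j] += 1.0 - p
        else:
            scores[i if p >= 0.5 else j] += 1.0
    # Higher score means better; rank 1 is best, equal scores share a rank.
    distinct = sorted(set(scores), reverse=True)
    return [distinct.index(s) + 1 for s in scores]


def kendall_tau(human: Sequence[int], predicted: Sequence[int]) -> float:
    """(concordant - discordant) / (concordant + discordant) over all
    pairs that are not tied in the human ranking."""
    concordant = discordant = 0
    for i, j in combinations(range(len(human)), 2):
        h = human[i] - human[j]
        if h == 0:
            continue
        if h * (predicted[i] - predicted[j]) > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Toy data (invented): three translations, human ranks 1, 2, 3.
human_ranks = [1, 2, 3]
toy_probs = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.45}


def toy_classifier(i: int, j: int) -> float:
    return toy_probs[(i, j)]


hard_ranks = recompose(3, toy_classifier, weighted=False)
soft_ranks = recompose(3, toy_classifier, weighted=True)
print(hard_ranks, kendall_tau(human_ranks, hard_ranks))
print(soft_ranks, kendall_tau(human_ranks, soft_ranks))
```

In this toy case two low-confidence decisions flip the hard-vote ranking, while the probability-weighted scores keep the confidently preferred translation on top. The numbers are only illustrative and say nothing about the 80% improvement reported above.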