N-gram posterior probability confidence measures for statistical machine translation: an empirical study

Machine Translation - Volume 27 - Pages 85-114 - 2012
Adrià de Gispert (1), Graeme Blackwood (2), Gonzalo Iglesias (1), William Byrne (1)
(1) Machine Intelligence Laboratory, Department of Engineering, Cambridge University, Cambridge, UK
(2) IBM T J Watson Research, Yorktown Heights, USA

Abstract

We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
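
To make the quantities in the abstract concrete, the following is a minimal sketch of n-gram posteriors computed from a k-best list, together with a thresholded reference precision. It is not the lattice algorithm of the paper. It assumes that the posterior of an n-gram is the total posterior mass of the hypotheses that contain it at least once, and that hypothesis scores are log-probabilities normalised over the list. The function names (`ngrams`, `ngram_posteriors`, `reference_precision`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of distinct n-grams of order n (as tuples) in tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_posteriors(kbest, max_order=4):
    """Estimate n-gram posteriors from a k-best list.

    kbest is a list of (tokens, log_score) pairs. Scores are normalised
    over the list, so this approximates the full lattice evidence space
    by its k best paths (an assumption of this sketch).
    """
    top = max(score for _, score in kbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - top) for _, score in kbest]
    total = sum(weights)
    posteriors = defaultdict(float)
    for (tokens, _), weight in zip(kbest, weights):
        for n in range(1, max_order + 1):
            # Each hypothesis contributes once per distinct n-gram it contains.
            for gram in ngrams(tokens, n):
                posteriors[gram] += weight / total
    return dict(posteriors)


def reference_precision(hyp, posteriors, references, n, threshold):
    """Fraction of hypothesis n-grams with posterior >= threshold that occur
    in at least one reference. Returns None if no n-gram passes the threshold."""
    ref_grams = set()
    for ref in references:
        ref_grams |= ngrams(ref, n)
    confident = [g for g in ngrams(hyp, n) if posteriors.get(g, 0.0) >= threshold]
    if not confident:
        return None
    return sum(g in ref_grams for g in confident) / len(confident)


# Toy 3-best list with hypothetical log-probabilities.
kbest = [
    ("the cat sat".split(), math.log(0.5)),
    ("the cat sits".split(), math.log(0.3)),
    ("a cat sat".split(), math.log(0.2)),
]
posteriors = ngram_posteriors(kbest, max_order=2)
references = ["the cat sits".split()]
precision_loose = reference_precision(kbest[0][0], posteriors, references, 2, 0.5)
precision_strict = reference_precision(kbest[0][0], posteriors, references, 2, 0.75)
```

In this toy case, raising the threshold keeps only the bigram that most hypotheses agree on, which is the kind of effect the precision measurements in the study quantify. Adding more entries to `references` corresponds to the multiple-reference condition.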
