N-gram posterior probability confidence measures for statistical machine translation: an empirical study
Abstract
We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
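To make the k-best variant of the measure concrete, here is a minimal sketch that assumes the usual definition of an n-gram posterior: the sum of the normalized posterior probabilities of all hypotheses containing that n-gram. The function name `ngram_posteriors`, the `(tokens, log_score)` input format, and the toy data are illustrative assumptions, not the authors' implementation. The lattice-based algorithm described in the paper operates over the full evidence space and is not reproduced here.

```python
import math
from collections import defaultdict


def ngram_posteriors(kbest, order=2):
    """Estimate n-gram posteriors from a k-best list.

    kbest: list of (tokens, log_score) pairs, where log_score is an
           unnormalized log-probability of the hypothesis (assumed format).
    Returns a dict mapping each n-gram (as a tuple) to its posterior.
    """
    # Normalize hypothesis scores into posteriors, subtracting the
    # maximum log score first for numerical stability.
    max_log = max(score for _, score in kbest)
    weights = [math.exp(score - max_log) for _, score in kbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (tokens, _), weight in zip(kbest, weights):
        # Use a set so an n-gram repeated within one hypothesis
        # contributes that hypothesis's probability only once.
        ngrams = {
            tuple(tokens[i:i + order])
            for i in range(len(tokens) - order + 1)
        }
        for ngram in ngrams:
            posteriors[ngram] += weight / total
    return dict(posteriors)


# Toy 2-best list with equal scores (hypothetical data).
kbest = [
    ("the cat sat".split(), 0.0),
    ("the cat sits".split(), 0.0),
]
bigram_post = ngram_posteriors(kbest, order=2)
```

In this toy list the bigram `("the", "cat")` appears in both hypotheses and so receives the full probability mass, while `("cat", "sat")` appears in only one and receives half. A k-best list covers only a small part of the hypothesis space that a lattice encodes, which is why the study compares the two.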