A hybrid machine translation architecture guided by syntax

Gorka Labaka1, Cristina España-Bonet2, Lluís Màrquez3, Kepa Sarasola1
1IXA Research Group, Department of Computer Languages and Systems, University of the Basque Country (UPV/EHU), Donostia, Spain
2TALP Research Center, Department of Computer Science, Technical University of Catalonia – Barcelona Tech, Barcelona, Spain
3Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar

Abstract

Keywords


References

Aduriz I, Aldezabal I, Alegria I, Artola X, Ezeiza N, Urizar R (1996) EUSLEM: a Lemmatiser / Tagger for Basque. In: Proceedings of the 7th Conference of the European Association for Lexicography (EURALEX’96). Gothenburg, Sweden, pp 17–26

Aduriz I, Aranzabe MJ, Arriola JM, Díaz de Ilarraza A, Gojenola K, Oronoz M, Uria L (2004) A cascaded syntactic analyser for Basque. In: Computational Linguistics and Intelligent Text Processing. Springer, Berlin, pp 124–134

Alegria I, Díaz de Ilarraza A, Labaka G, Lersundi M, Mayor A, Sarasola K (2006) An FST Grammar for verb chain transfer in a Spanish-Basque MT system. In: Yli-Jyrä A, Karttunen L, Karhumäki J (eds) Proceedings of the 5th International Workshop on Finite-State Methods and Natural Language Processing (FSMNLP 2005, Helsinki, Finland), vol 4002. Lecture Notes in Computer Science. Springer, Berlin, pp 87–98

Alegria I, Díaz de Ilarraza A, Labaka G, Lersundi M, Mayor A, Sarasola K (2007) Transfer-Based MT from Spanish into Basque: reusability, standardization and open source. Lecture Notes in Computer Science 4394:374–384. Springer, Berlin

Banerjee S, Lavie A (2005) METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. Ann Arbor, MI, pp 65–72

Bojar O, Buck C, Callison-Burch C, Federmann C, Haddow B, Koehn P, Monz C, Post M, Soricut R, Specia L (2013) Findings of the 2013 workshop on statistical machine translation. In: Proceedings of the eighth workshop on statistical machine translation. Sofia, pp 1–44

Brown PF, Della Pietra SA, Della Pietra VJ (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311

Callison-Burch C, Osborne M, Koehn P (2006) Re-evaluating the role of BLEU in machine translation research. In: 11th Conference of the European Chapter of the Association for Computational Linguistics. Trento, pp 249–256

Carreras X, Chao I, Padró L, Padró M (2004) Freeling: an Open-Source Suite of Language Analyzers. In: Proceedings of the 4th international conference on Language Resources and Evaluation (LREC), Lisbon, pp 239–242

Chen SF, Goodman J (1999) An empirical study of smoothing techniques for language modeling. Comput Speech Lang 13(4):359–393

Chen Y, Eisele A (2010) Hierarchical Hybrid Translation between English and German. In: Hansen V, Yvon F (eds) Proceedings of the 14th annual conference of the European Association for Machine Translation (EAMT 2010), Saint-Raphaël, pp 90–97

Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT ’12. Montréal, pp 427–436

Costa-Jussà MR, Farrús M, Mariño JB, Fonollosa JAR (2012) Study and comparison of rule-based and statistical Catalan–Spanish machine translation systems. Comput Inform 31(2):245–270

Doddington G (2002) Automatic evaluation of machine translation quality using N-gram co-occurrence statistics. In: Proceedings of the 2nd international conference on Human Language Technology (HLT). San Diego, CA, pp 138–145

Dologlou Y, Markantonatou S, Tambouratzis G, Yannoutsou O, Fourla A, Iannou N (2003) Using monolingual corpora for statistical machine translation: the METIS system controlled language translation. In: EAMT-CLAW-03, The joint conference of the 8th international workshop of the european association for machine translation and the 4th controlled language applications workshop. Dublin, pp 61–68

Dove C, Loskutova O, de la Fuente R (2012) What’s your pick: RbMT, SMT or Hybrid? In: Proceedings of the tenth conference of the Association for Machine Translation in the Americas (AMTA 2012). San Diego, CA

Du J, He Y, Penkale S, Way A (2009) MaTrEx: the DCU MT system for WMT 2009. In: Proceedings of the fourth workshop on statistical machine translation, (EACL 2009), Greece, pp 95–99

Eisele A, Federmann C, Saint-Amand H, Jellinghaus M, Herrmann T, Chen Y (2008) Using Moses to integrate multiple rule-based machine translation engines into a hybrid system. In: Proceedings of the third workshop on statistical machine translation, Columbus, OH, pp 179–182

Enache R, España-Bonet C, Ranta A, Màrquez L (2012) A hybrid system for patent translation. In: Proceedings of the 16th annual conference of the European Association for Machine Translation (EAMT12). Trento, pp 269–276

España-Bonet C, Labaka G, Díaz de Ilarraza A, Màrquez L, Sarasola K (2011) Hybrid machine translation guided by a rule-based system. In: MT Summit XIII: the thirteenth machine translation summit, Xiamen, pp 554–561

Federmann C (2011) Results from the ML4HMT shared task on applying machine learning techniques to optimise the division of labour in hybrid machine translation. In: Proceedings of the international workshop on using Linguistic Information for Hybrid Machine Translation (LIHMT 2011) and of the shared task on applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation (ML4HMT-11), Barcelona, pp 110–117

Federmann C (2012) Hybrid machine translation using joint, binarised feature vectors. In: Proceedings of the tenth conference of the Association for Machine Translation in the Americas (AMTA 2012). San Diego, CA, pp 113–118

Federmann C, Chen Y, Hunsicker S, Wang R (2011) DFKI System combination using syntactic information at ML4HMT-2011. In: Proceedings of the international workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT 2011) and of the shared task on applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation (ML4HMT-11), Barcelona, pp 104–109

Federmann C, Eisele A, Chen Y, Hunsicker S, Xu J, Uszkoreit H (2010) Further experiments with shallow hybrid MT systems. In: Proceedings of the joint fifth workshop on statistical Machine Translation and MetricsMATR, Uppsala, pp 77–81

Federmann C, Hunsicker S (2011) Stochastic Parse tree selection for an existing RBMT system. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, pp 351–357

Federmann C, Melero M, Pecina P, van Genabith J (2012) Towards optimal choice selection for improved hybrid machine translation. Prague Bull Math Linguist 97:5–22

Gieselmann P (2008) Architecture of the Lucy translation system. In: Second machine translation marathon, Wandlitz, Berlin, 28 slides

Giménez J, Màrquez L (2007) Linguistic features for automatic evaluation of heterogenous MT systems. In: Proceedings of the second workshop on statistical machine translation, Prague, pp 256–264

Giménez J, Màrquez L (2008) A smorgasbord of features for automatic MT evaluation. In: Proceedings of the third workshop on statistical machine translation, Columbus, OH, pp 195–198

Giménez J, Màrquez L (2010) Asiya: an open toolkit for automatic machine translation (meta-)evaluation. Prague Bull Math Linguist 94:77–86

Groves D, Way A (2005) Hybrid example-based SMT: the best of both worlds? In: Proceedings of ACL 2005 workshop on building and using parallel texts: data-driven machine translation and beyond, Ann Arbor, MI, pp 183–190

Habash N, Dorr B, Monz C (2009) Symbolic-to-statistical hybridization: extending generation-heavy machine translation. Mach Transl 23(1):23–63

Heafield K, Lavie A (2010) Voting on N-grams for machine translation system combination. In: Proceedings of the ninth conference of the Association for Machine Translation in the Americas (AMTA 2010), Denver, CO

Hearne M, Way A (2006) Disambiguation strategies for data-oriented translation. In: Proceedings of the 11th conference of the European association for machine translation, Oslo, pp 59–68

Hunsicker S, Chen Y, Federmann C (2012) Machine learning for hybrid machine translation. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, pp 312–316

Koehn P (2004) Statistical significance tests for machine translation evaluation. In: EMNLP-2004: Proceedings of the 2004 conference on empirical methods in natural language processing, Barcelona

Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the association for computational linguistics companion volume, proceedings of the demo and poster sessions, Prague, pp 177–180

Kumar S, Byrne W (2004) Minimum Bayes-risk decoding for statistical machine translation. In: HLT-NAACL 2004: human language technology conference and North American chapter of the association for computational linguistics annual meeting, Boston, pp 169–176

Labaka G (2010) EUSMT: Incorporating Linguistic Information to SMT for a morphologically rich language. Its use in SMT-RBMT-EBMT hybridization. Ph.D. thesis, University of the Basque Country, Donostia

Labaka G, Sarasola K, Stroppa N, Way A (2007) Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation. In: MT summit XI, Copenhagen, pp 297–304

Li Z, Callison-Burch C, Dyer C, Ganitkevitch J, Khudanpur S, Schwartz L, Thornton W, Weese J, Zaidan O (2009) Joshua: open source toolkit for parsing-based machine translation. In: Third machine translation marathon, Prague

Lin CY, Och FJ (2004) Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In: Proceedings of the 42nd annual meeting of the Association for Computational Linguistics (ACL’04), Main Volume, Barcelona, pp 605–612

Matusov E, Ueffing N, Ney H (2006) Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment. In: 11th conference of the European chapter of the Association for Computational Linguistics, (EACL 2006), Trento, pp 33–40

Mayor A, Alegria I, Labaka G, Lersundi M, Sarasola K (2011) Matxin, an open-source rule-based machine translation system for Basque. Mach Transl 25(1):53–82

Melamed ID, Green R, Turian JP (2003) Precision and recall of machine translation. In: Proceedings of the Joint Conference on Human Language Technology and the North American Chapter of the Association for Computational Linguistics (HLT-NAACL). Edmonton, pp 61–63

Nießen S, Och FJ, Leusch G, Ney H (2000) An evaluation tool for machine translation: fast evaluation for MT research. In: Proceedings of the 2nd international conference on language resources and evaluation, Athens, pp 39–45

Nivre J, Hall J, Nilsson J, Chanev A, Eryigit G, Kübler S, Marinov S, Marsi E (2007) Maltparser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135

Och FJ (2003) Minimum error rate training in statistical machine translation. In: ACL-2003: 41st Annual meeting of the Association for Computational Linguistics, Sapporo, pp 160–167

Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: ACL-2002: 40th Annual meeting of the Association for Computational Linguistics, Philadelphia, pp 295–302

Oflazer K, El-Kahlout ID (2007) Exploring different representation units in English-to-Turkish statistical machine translation. In: Proceedings of the second workshop on statistical machine translation, Prague, pp 25–32

Okita T, Rubino R, van Genabith J (2012) Sentence-level quality estimation for MT system combination. In: Proceedings of the Second workshop on applying machine learning techniques to optimise the division of labour in hybrid MT, COLING’12, Mumbai, pp 55–64

Papineni K, Roukos S, Ward T, Zhu W (2002) BLEU: A method for automatic evaluation of machine translation. In: ACL-2002: 40th Annual meeting of the Association for Computational Linguistics, Philadelphia, pp 311–318

Sánchez-Cartagena VM, Sánchez-Martínez F, Pérez-Ortiz JA (2011) Integrating shallow-transfer rules into phrase-based statistical machine translation. In: Proceedings of the XIII machine translation summit, Xiamen, pp 562–569

Sánchez-Martínez F, Forcada ML (2009) Inferring shallow-transfer machine translation rules from small parallel corpora. J Artific Intell Res 34:605–635

Sánchez-Martínez F, Forcada ML, Way A (2009) Hybrid rule-based example-based MT: feeding apertium with sub-sentential translation units. In: Forcada ML, Way A (eds) Proceedings of the 3rd workshop on example-based machine translation. Dublin, pp 11–18

Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas. Visions for the future of machine translation, Cambridge, MA, pp 223–231

Stolcke A (2002) SRILM - An extensible language modeling toolkit. In: Proceedings of the seventh International Conference of Spoken Language Processing (ICSLP 2002), Denver, CO, pp 901–904

Thurmair G (2009) Comparing different architectures of hybrid machine translation systems. In: MT Summit XII: Proceedings of the twelfth machine translation summit, Ottawa, ON, pp 340–347

Tillmann C, Vogel S, Ney H, Zubiaga A, Sawaf H (1997) Accelerated DP based search for statistical translation. In: Proceedings of the fifth european conference on speech communication and technology, Rhodes, pp 2667–2670

Tyers FM, Sánchez-Martínez F, Forcada ML (2012) Flexible finite-state lexical selection for rule-based machine translation. In: Proceedings of the 16th annual conference of the European association for machine translation, Trento, pp 213–220

Tyers FM, Sánchez-Martínez F, Ortiz-Rojas S, Forcada ML (2010) Free/open-source resources in the Apertium platform for machine translation research and development. Prague Bull Math Linguist 93:67–76

Way A (2010) Machine translation. In: Clark A, Fox C, Lappin S (eds) The handbook of computational linguistics and natural language processing. Wiley Blackwell, Chichester, pp 531–573

Xu J, Uszkoreit H, Kennington C, Vilar D, Zhang X (2011) DFKI hybrid machine translation system for WMT 2011: on the Integration of SMT and RBMT. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, pp 485–489