Bilingual LSA-based adaptation for statistical machine translation

Machine Translation - Volume 21, Issue 4 - Pages 187-207 - 2007
Yik-Cheung Tam, Ian Lane, Tanja Schultz
Carnegie Mellon University, Pittsburgh, USA

References

Bellegarda JR (2000) Large vocabulary speech recognition with multispan statistical language models. IEEE Trans Speech Audio Process 8: 76–84

Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3: 993–1022

Brown PF, Della Pietra SA, Della Pietra VJ, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19: 263–311

Darroch JN, Ratcliff D (1972) Generalized iterative scaling for log-linear models. Ann Math Stat 43: 1470–1480

Deerwester SC, Dumais ST, Landauer TK, Furnas GW, Harshman RA (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41: 391–407

Doddington G (2002) Automatic evaluation of MT quality using n-gram co-occurrence statistics. In: Proceedings of human language technology conference 2002, San Diego, CA, pp 138–145

Griffiths TL, Steyvers M, Blei DM, Tenenbaum JB (2004) Integrating topics and syntax. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems 17, Proceedings of the 2004 conference. MIT Press, Cambridge MA, pp 537–544

Hofmann T (1999) Probabilistic latent semantic analysis. In: UAI ’99, proceedings of the fifteenth conference on uncertainty in artificial intelligence, Stockholm, Sweden, pp 289–296

Hsu B-J(P), Glass J (2006) Style & topic language model adaptation using HMM-LDA. In: EMNLP 2006, 2006 conference on empirical methods in natural language processing, Sydney, Australia, pp 373–381

Iyer R, Ostendorf M (1996) Modeling long distance dependence in language: topic mixtures vs. dynamic cache models. In: ICSLP 96, fourth international conference on spoken language processing, Philadelphia, PA, pp 236–239

Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York

Kim W, Khudanpur S (2003) LM adaptation using cross-lingual information. In: 8th European conference on speech communication and technology (Eurospeech 2003 – Interspeech 2003), Geneva, Switzerland, pp 3129–3132

Kim W, Khudanpur S (2004) Cross-lingual latent semantic analysis for LM. In: 2004 IEEE international conference on acoustics, speech, and signal processing, vol 1. Montreal, Quebec, Canada, pp 257–260

Kneser R, Peters J, Klakow D (1997) Language model adaptation using dynamic marginals. In: Proceedings of Eurospeech ’97, 5th European conference on speech communication and technology, Rhodes, Greece, pp 1971–1974

Mrva D, Woodland PC (2006) Unsupervised language model adaptation for Mandarin broadcast conversation transcription. In: Interspeech 2006 – ICSLP, ninth international conference on spoken language processing, Pittsburgh, Pennsylvania, paper 1549-Thu1A2O.3

Och FJ (2003) Minimum error rate training in statistical machine translation. In: ACL-03, 41st annual meeting of the Association for Computational Linguistics, Sapporo, Japan, pp 160–167

Papineni K, Roukos S, Ward T, Zhu W (2002) BLEU: a method for automatic evaluation of machine translation. In: 40th annual meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, pp 311–318

Paulik M, Fügen C, Schaaf T, Schultz T, Stüker S, Waibel A (2005) Document driven machine translation enhanced automatic speech recognition. In: Proceedings of Interspeech’2005 – Eurospeech, 9th European conference on speech communication and technology, Lisbon, Portugal, pp 2261–2264

Rottmann K, Vogel S (2007) Word reordering in statistical machine translation with a POS-based distortion model. In: TMI 2007, proceedings of the 11th international conference on theoretical and methodological issues in machine translation, Skövde, pp 171–180

Stolcke A (2002) SRILM – an extensible language modeling toolkit. In: Proceedings of the 7th international conference on spoken language processing ICSLP/Interspeech, Denver, Colorado, pp 901–904

Tam YC, Schultz T (2005) Language model adaptation using variational Bayes inference. In: Proceedings of Interspeech’2005 – Eurospeech, 9th European conference on speech communication and technology, Lisbon, Portugal, pp 5–8

Tam YC, Schultz T (2006) Unsupervised language model adaptation using latent semantic marginals. In: Interspeech 2006 – ICSLP, ninth international conference on spoken language processing, Pittsburgh, Pennsylvania, paper 1705-Thu1A2O.2

Tam YC, Schultz T (2007) Correlated latent semantic model for unsupervised language model adaptation. In: Proceedings of ICASSP 2007, international conference on acoustics, speech, and signal processing, vol IV. Honolulu, Hawaii, pp 41–44

Tseng H, Chang P, Andrew G, Jurafsky D, Manning C (2005) A conditional random field word segmenter. In: IJCNLP-05, fourth SIGHAN workshop on Chinese language processing, Jeju Island, Korea, pp 168–171

Vogel S, Zhang Y, Huang F, Tribble A, Venugopal A, Zhao B, Waibel A (2003) The CMU statistical translation system. In: MT summit IX, proceedings of the ninth machine translation summit, New Orleans, pp 402–409

Zhang Y, Vogel S (2004) Measuring confidence intervals for the machine translation evaluation metrics. In: Proceedings of the tenth conference on theoretical and methodological issues in machine translation TMI-04, Baltimore, Maryland, pp 85–94

Zhao B, Xing EP (2006) BiTAM: Bilingual topic admixture models for word alignment. In: Coling · ACL 2006, 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, proceedings of the main conference poster sessions, Sydney, Australia, pp 969–976

Zhao B, Xing EP (2007) HM-BiTAM: Bilingual topic exploration, word alignment, and translation. In: Twenty-second annual conference on neural information processing systems, Vancouver BC, Canada