Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. - pp. 311-314

H.M. Meng¹, Wai-Kit Lo¹, Berlin Chen, K. Tang²

¹Chinese University of Hong Kong, Hong Kong, China

²Princeton University, USA

Abstract

We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition, and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are then mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form the spoken documents, which are indexed by word and syllable recognition. The information retrieval engine performs matching at both the word and syllable scales. The English queries contain many named entities, which tend to be out-of-vocabulary words for machine translation and speech recognition and are therefore omitted in retrieval. However, names are often transliterated across languages and are generally important for retrieval. We present a technique that takes in a name spelling and automatically generates a phonetic cognate in terms of Chinese syllables to be used in retrieval. Experiments show consistent improvement in retrieval performance when named entities are included in this way.
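As a rough illustration of the syllable-scale side of such a pipeline, the sketch below maps translated Chinese words to Mandarin syllables by dictionary lookup and scores a document by shared syllable bigrams. It is not the authors' implementation: the tiny dictionary, the toneless pinyin notation, the use of overlapping bigrams as index terms, the overlap score, and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Toy pronunciation dictionary (hypothetical entries, toneless pinyin).
PRON_DICT = {
    "北京": ["bei", "jing"],
    "新闻": ["xin", "wen"],
    "香港": ["xiang", "gang"],
}


def words_to_syllables(words):
    """Map words to syllables by dictionary lookup.

    Out-of-vocabulary words contribute nothing, which mirrors how
    untranslated names drop out of the query.
    """
    syllables = []
    for word in words:
        syllables.extend(PRON_DICT.get(word, []))
    return syllables


def syllable_bigrams(syllables):
    """Count overlapping syllable bigrams (an assumed subword index term)."""
    return Counter(zip(syllables, syllables[1:]))


def overlap_score(query_terms, doc_terms):
    """Sum of minimum counts over shared terms (a simple stand-in score)."""
    return sum(min(count, doc_terms[term]) for term, count in query_terms.items())


# A translated query and a recognized document, both as syllable bigrams.
query_syllables = words_to_syllables(["北京", "新闻"])
doc_syllables = ["xiang", "gang", "bei", "jing", "xin", "wen"]
score = overlap_score(syllable_bigrams(query_syllables),
                      syllable_bigrams(doc_syllables))

# A word missing from the dictionary yields no syllables at all.
oov_syllables = words_to_syllables(["克林顿"])
```

In this sketch a name absent from the dictionary produces an empty syllable list, so it cannot contribute to matching. A transliteration step that outputs Chinese syllables directly from the name spelling would let those syllables be appended to the query before the bigrams are formed.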

Keywords

#Information retrieval #Natural languages #Speech recognition #Radio broadcasting #Dictionaries #Indexing #Engines #Digital multimedia broadcasting #Audio recording #Broadcast technology

