Internet evolution and progress in full automatic French language modelling

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01), pp. 363-366

D. Vaufreydaz¹, M. Gery¹

¹Laboratoire CLIPS-IMAG, équipe GEOD et MRIM, Grenoble, France

Abstract

The World Wide Web is the largest information space ever assembled: it is distributed all over the world and spans many languages and topics. We first describe how a French subset of this space evolved over the last three years. During this period, the amount of text automatically extracted for language modelling grew by a factor of 6.5, and French lexical coverage grew from 140,000 to 200,000 lexical forms. We thus show that ever more reliable data can be obtained to train our trigram models. Recognition experiments on a French "state of the art" evaluation set show that word accuracy increased from 51% to 62.30% between two models computed automatically from Web corpora: the first corpus was gathered at the beginning of 1999 and the second at the end of 2000.
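To illustrate why more training text yields more reliable trigram models, here is a minimal sketch of unsmoothed maximum-likelihood trigram estimation, where P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2). This is an illustrative assumption, not the authors' training procedure: the abstract does not describe their smoothing or back-off scheme, and the function names and the `<s>`/`</s>` padding markers are hypothetical. Any trigram never seen in the corpus gets probability zero here, so a larger corpus directly reduces the number of unseen events; practical recognizers add smoothing or back-off on top of such counts.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories over tokenized sentences.

    Each sentence is padded with two start markers and one end marker
    (hypothetical marker strings chosen for this sketch).
    """
    trigrams = Counter()
    histories = Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigrams[history + (padded[i],)] += 1
            histories[history] += 1
    return trigrams, histories


def trigram_prob(trigrams, histories, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2).

    Returns 0.0 when the history (w1, w2) was never observed.
    """
    h = histories[(w1, w2)]
    if h == 0:
        return 0.0
    return trigrams[(w1, w2, w3)] / h


if __name__ == "__main__":
    # Tiny toy corpus standing in for Web-extracted French text.
    corpus = [["le", "chat", "dort"], ["le", "chat", "mange"]]
    t, h = train_trigram_counts(corpus)
    print(trigram_prob(t, h, "le", "chat", "dort"))  # 0.5
```

In the toy corpus, the history ("le", "chat") occurs twice and is followed once by "dort", giving 0.5; a continuation such as "le chat boit" would receive zero probability until it appears in the training text.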

Keywords

Internet, Natural languages, Speech recognition, Web server, Robots, HTML, Web sites, Data mining, Crawlers, Stochastic processes
