Using inverted files to compress text

ITI 2002. Proceedings of the 24th International Conference on Information Technology Interfaces (IEEE Cat. No.02EX534) - Trang 443-447 vol.1

S. Ristov¹

¹Rudjer Boskovic Institute, Zagreb, Croatia

Tóm tắt

This is the first report on a new approach to text compression. It consists of representing the text file with compressed inverted file index in conjunction with very compact lexicon, where lexicon includes every word in the text. The index is compressed using standard index compression techniques, and lexicon is compressed with original dictionary compression method that gives better compression results than existing procedures. Compression procedure is complex, but decompression time is linear with the file size, although it requires two passes and hence can not be performed online. First experiments show that this method, when refined, can be competitive for larger texts that only need to be decompressed in the real time.

Từ khóa

#Image coding #Data compression #Indexing #Application software #Computer applications #Time measurement #Image reconstruction #Multidimensional systems #Information technology

Tài liệu tham khảo

10.1109/DCC.2002.1000013 ristov, 1999, Ziv lempel compression of huge natural language data tries using suffix arrays, Proceedings of the 10th Annual Symposium Combinatorial Pattern Matching CPM'99, 196, 10.1007/3-540-48452-3_15 bell, 1990, Text Compression witten, 1999, Managing Gigabytes

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA