A survey of keyword spotting techniques for printed document images

Artificial Intelligence Review - Volume 35 - Pages 119-136 - 2010
Abirami Murugappan¹, Baskaran Ramachandran¹, P. Dhavachelvan¹
¹Anna University, Guindy, Chennai, India

Abstract

This paper surveys past research on the character-based and word-based approaches used to retrieve information from document images. The survey also offers insight into the strengths and weaknesses of current techniques and the relationships among them, along with guidance on choosing the areas that future work on document image retrieval could address.

Keywords

#keyword spotting #document images #information retrieval #character-based methods #word-based methods
