A survey of keyword spotting techniques for printed document images
Abstract
This paper surveys past research on character-based and keyword-based approaches to retrieving information from document images. It also offers insight into the strengths and weaknesses of current techniques, the relationships among them, and guidance on which areas future work on document image retrieval could address.
Keywords
#keyword spotting #document images #information retrieval #character-based approaches #keyword-based approaches