A Survey on handwritten documents word spotting
Abstract
Keywords
References
Doermann D (1998) The indexing and retrieval of document images: a survey. Comput Vis Image Underst 70(3):287–298
Kameshiro T, Hirano T, Okada Y, Yoda F (1999) A document image retrieval method tolerating recognition and segmentation errors of OCR using shape-feature and multiple candidates. In: Proceedings of the fifth international conference on document analysis and recognition, 1999. ICDAR ’99, 681–684
Lavrenko V, Rath TM, Manmatha R (2004) Holistic word recognition for handwritten historical documents. In: Proceedings of the first international workshop on document image analysis for libraries, 278–287
Bai S, Li L, Tan C (2009) Keyword spotting in document images through word shape coding. In: 10th international conference on document analysis and recognition, 331–335
Almazán J, Gordo A, Fornés A, Valveny E (2014) Word spotting and recognition with embedded attributes. IEEE Trans Pattern Anal Mach Intell 36(12):2552–2566
Aghbari ZA, Brook S (2009) HAH manuscripts: a holistic paradigm for classifying and retrieving historical arabic handwritten documents. Expert Syst Appl 36(8):10942–10951
Khayyat M, Lam L, Suen CY (2014) Learning-based word spotting system for arabic handwritten documents. Pattern Recognit 47(3):1021–1030
Konidaris T, Gatos B, Ntzios K, Pratikakis I, Theodoridis S, Perantonis SJ (2007) Keyword-guided word spotting in historical printed documents using synthetic data and user feedback. Int J Doc Anal Recognit 9(2–4):167–177
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407
He X, Cai D, Liu H, Ma W-Y (2004) Locality preserving indexing for document representation. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’04, (New York, NY, USA), 96–103, ACM
Marinai S, Marino E, Soda G (2006) Font adaptive word indexing of modern printed documents. IEEE Trans Pattern Anal Mach Intell 28:1187–1199
Syeda-Mahmood T (1997) Indexing of handwritten document images. Proc Workshop Doc Image Anal 1997:66–73
Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 47th annual IEEE symposium on foundations of computer science, 2006. FOCS ’06, 459–468
Athitsos V, Potamias M, Papapetrou P, Kollios G (2008) Nearest neighbor retrieval using distance-based hashing. In: IEEE 24th international conference on data engineering, 2008. ICDE 2008, 327–336
Rath TM, Manmatha R, Lavrenko V (2004) A search engine for historical manuscript images. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, (Sheffield, United Kingdom), 369–376
Alamri H, Sadri J, Suen CY, Nobile N (2008) A novel comprehensive database for arabic off-line handwriting recognition. In: Proceedings of the 11th international conference on frontiers in handwriting recognition (ICFHR 2008), 664–669
Marti U-V, Bunke H (2002) The IAM-database: an english sentence database for offline handwriting recognition. Int J Doc Anal Recognit 5(1):39–46
Wuthrich M, Liwicki M, Fischer A, Indermuhle E, Bunke H, Viehhauser G, Stolz M (2009) Language model integration for the recognition of handwritten medieval documents. In: 10th international conference on document analysis and recognition, 2009. ICDAR ’09, 211–215
Manmatha R, Croft WB (1997) Word spotting: indexing handwritten manuscripts. In: Maybury MT (ed) Intelligent multimedia information retrieval, MIT Press, Cambridge, pp 43–64
Rath TM, Manmatha R (2003) Features for word spotting in historical manuscripts. In: Proceedings of seventh international conference on document analysis and recognition, 2003, 1, 218–222
Rath TM, Manmatha R (2007) Word spotting for historical documents. Int J Doc Anal Recognit 9(2–4):139–152
Kolcz A, Alspector J, Augusteijn M, Carlson R, Popescu GV (2000) A line-oriented approach to word spotting in handwritten documents. Pattern Anal Appl 3(2):153–168
Sigappi A, Palanivel S, Ramalingam V (2011) Handwritten document retrieval system for tamil language. Int J Comput Appl 31:42–47
Shah M, Suen C (2010) Word spotting in gray scale handwritten pashto documents. Int Conf Front Handwrit Recognit 2010:136–141
Abidi A, Jamil A, Siddiqi I, Khurshid K (2012) Word spotting based retrieval of urdu handwritten documents. In: Proceedings of the 2012 international conference on frontiers in handwriting recognition, ICFHR ’12, (Washington, DC, USA), 331–336, IEEE Computer Society
Wei H, Gao G (2014) A keyword retrieval system for historical mongolian document images. Int J Doc Anal Recognit 17(1):33–45
Kesidis A, Galiotou E, Gatos B, Lampropoulos A, Pratikakis I, Manolessou I, Ralli A (2009) Accessing the content of greek historical documents. In: Proceedings of the third workshop on analytics for noisy unstructured text data, AND ’09, (New York, NY, USA), 55–62, ACM
Cao H, Bhardwaj A, Govindaraju V (2009) A probabilistic method for keyword retrieval in handwritten document images. Pattern Recognit 42(12):3374–3382
Rath TM, Manmatha R (2003) Word image matching using dynamic time warping. In: Proceedings of 2003 IEEE computer society conference on computer vision and pattern recognition, 2003, vol 2, pp II-521–II-527
Srihari S, Srinivasan H, Babu P, Bhole C (2006) Spotting words in handwritten arabic documents. In: Document recognition and retrieval XIII: Proceedings SPIE
Srihari S, Srinivasan H, Babu P, Bhole C (2005) Handwritten arabic word spotting using the cedarabic document analysis system. In: Proceedings 2005 symposium on document image understanding technology
Srihari S, Ball G (2008) Language independent word spotting in scanned documents. In: Digital libraries: universal and ubiquitous access to information, 5362 of lecture notes in computer science. Springer, Berlin
Zhang B, Srihari SN, Huang C (2003) Word image retrieval using binary features. Proc SPIE 5296:45–53
Kefali A, Chemmam C (2011) A semi-automatic approach of old arabic documents indexing. In: CIIA’11, 1
Liang Y, Fairhurst M, Guest R (2012) A synthesised word approach to word retrieval in handwritten documents. Pattern Recognit 45(12):4225–4236
Moghaddam R, Cheriet M (2009) Application of multi-level classifiers and clustering for automatic word spotting in historical document images. In: 10th international conference on document analysis and recognition, 511–515
Llados J, Sanchez G (2007) Indexing historical documents by word shape signatures. In: Ninth international conference on document analysis and recognition, 2007. ICDAR 2007, 1, 362–366
Fornés A, Frinken V, Fischer A, Almazán J, Jackson G, Bunke H (2011) A keyword spotting approach using blurred shape model-based descriptors. In: Proceedings of the 2011 workshop on historical document imaging and processing, HIP ’11, (New York, NY, USA), 83–90, ACM
Lladós J, Roy PP, Rodríguez JA, Sánchez G (2007) Word spotting in archive documents using shape contexts. In: Pattern recognition and image analysis, 4478 of lecture notes in computer science, 290–297
Roy PP, Rayar F, Ramel J-Y (2015) Word spotting in historical documents using primitive codebook and dynamic programming. Image Vis Comput 44:15–28
Giotis A, Sfikas G, Nikou C, Gatos B (2015) Shape-based word spotting in handwritten document images. In: 13th international conference on document analysis and recognition (ICDAR), 2015, 561–565
Adamek T, O’Connor N, Smeaton A (2007) Word matching using single closed contours for indexing handwritten historical documents. Int J Doc Anal Recognit 9(2–4):153–165
Can EF, Duygulu P (2011) A line-based representation for matching words in historical manuscripts. Pattern Recognit Lett 32(8):1126–1138
Casey RG, Lecolinet E (1996) A survey of methods and strategies in character segmentation. IEEE Trans Pattern Anal Mach Intell 18:690–706
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24:509–522
Escalera S, Fornés A, Pujol O, Radeva P, Sánchez G, Lladós J (2009) Blurred shape model for binary and grey-level symbol recognition. Pattern Recognit Lett 30(15):1424–1433
Gatos B, Stamatopoulos N, Louloudis G, Sfikas G, Retsinas G, Papavassiliou V, Simistira F, Katsouros V (2015) GRPOLY-DB: an old greek polytonic document image database. In: 13th international conference on document analysis and recognition (ICDAR), 2015, 646–650
Adamek T, O’Connor N (2004) A multiscale representation method for nonrigid shapes with a single closed contour. IEEE Trans Circuits Syst Video Technol 14:742–753
Agarwal PK, Varadarajan KR (2000) Efficient algorithms for approximating polygonal chains. Discrete Comput Geom 23(2):273–291
Ferrari V, Fevrier L, Jurie F, Schmid C (2008) Groups of adjacent contour segments for object detection. IEEE Trans Pattern Anal Mach Intell 30:36–51
Ataer E, Duygulu P (2007) Matching ottoman words. In: Signal processing and communications applications, 2007. SIU 2007. IEEE 15th, 1–4
Rusiñol M, Aldavert D, Toledo R, Lladós J (2015) Efficient segmentation-free keyword spotting in historical document collections. Pattern Recognit 48(2):545–555
Yalniz IZ, Manmatha R (2012) An efficient framework for searching text in noisy document images. In: 10th IAPR international workshop on document analysis systems, DAS 2012, Gold Coast, Queensland, Australia, March 27-29, 2012, 48–52
Rothacker L, Rusiñol M, Fink G (2013) Bag-of-features HMMs for segmentation-free word spotting in handwritten documents. In: 12th international conference on document analysis and recognition (ICDAR), 2013, 1305–1309
Rusiñol M, Aldavert D, Toledo R, Lladós J (2011) Browsing heterogeneous document collections by a segmentation-free word spotting method. In: International conference on document analysis and recognition, 63–67
Czuni L, Kiss P, Gal M, Lipovits A (2013) Local feature based word spotting in handwritten archive documents. In: 2013 11th international workshop on content-based multimedia indexing (CBMI), 179–184
Rodriguez-Serrano J, Perronnin F (2009) Handwritten word image retrieval with synthesized typed queries. In: 10th international conference on document analysis and recognition, 2009. ICDAR ’09, 351–355
Rodriguez-Serrano JA, Perronnin F (2012) Synthesizing queries for handwritten word image retrieval. Pattern Recognit 45(9):3270–3276
Rodríguez JA, Perronnin F (2008) Local gradient histogram features for word spotting in unconstrained handwritten documents. In: International conference on frontiers in handwriting recognition
Zhang X, Tan C (2013) Segmentation-free keyword spotting for handwritten documents based on heat kernel signature. In: 12th international conference on document analysis and recognition (ICDAR), 2013, 827–831
Khayyat M, Lam L, Suen CY (2012) Arabic handwritten word spotting using language models. In: Proceedings of the 2012 international conference on frontiers in handwriting recognition, ICFHR ’12, (Washington, DC, USA), 43–48, IEEE Computer Society
Almazán J, Gordo A, Fornés A, Valveny E (2014) Segmentation-free word spotting with exemplar SVMs. Pattern Recognit 47(12):3967–3978
Yao S, Wen Y, Lu Y (2015) HOG based two-directional dynamic time warping for handwritten word spotting. In: 13th international conference on document analysis and recognition (ICDAR), 2015, 161–165
Rothacker L, Fink G (2015) Segmentation-free query-by-string word spotting with bag-of-features HMMs. In: 13th international conference on document analysis and recognition, 661–665
Rabaev I, Kedem K, El-Sana J (2016) Keyword retrieval using scale-space pyramid. In: 12th IAPR workshop on document analysis systems (DAS), 2016, 144–149
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE computer society conference on computer vision and pattern recognition, 2:2169–2178
Pratikakis I, Zagoris K, Gatos B, Louloudis G, Stamatopoulos N (2014) ICFHR 2014 competition on handwritten keyword spotting (h-kws 2014). In: 14th international conference on frontiers in handwriting recognition (ICFHR), 2014, 814–819
Sagheer M, Nobile N, He CL, Suen C (2010) A novel handwritten urdu word spotting based on connected components analysis. In: 20th international conference on pattern recognition (ICPR), 2010, 2013–2016
Fischer A, Keller A, Frinken V, Bunke H (2012) Lexicon-free handwritten word spotting using character HMMs. Pattern Recognit Lett 33(7):934–942
Fischer A, Keller A, Frinken V, Bunke H (2010) HMM-based word spotting in handwritten documents using subword models. In: 20th international conference on pattern recognition (ICPR), 2010, 3416–3419
Wshah S, Kumar G, Govindaraju V (2012) Script independent word spotting in offline handwritten documents based on hidden markov models. Int Conf Front Handwrit Recognit 2012:14–19
Wshah S, Kumar G, Govindaraju V (2014) Statistical script independent word spotting in offline handwritten documents. Pattern Recognit 47(3):1039–1050
Rodriguez-Serrano J, Perronnin F (2012) A model-based sequence similarity with application to handwritten word spotting. IEEE Trans Pattern Anal Mach Intell 34:2108–2120
Saykol E, Sinop AK, Gudukbay U, Ulusoy O, Çetin AE (2004) Content-based retrieval of historical ottoman documents stored as textual images. IEEE Trans Image Proc 13(3):314–325
Shahab S, Al-Khatib W, Mahmoud S (2006) Computer aided indexing of historical manuscripts. Int Conf Comput Graph Imaging Vis 2006:287–295
Rodríguez-Serrano JA, Perronnin F (2009) Handwritten word-spotting using hidden markov models and universal vocabularies. Pattern Recognit 42(9):2106–2116
Huang L, Yin F, Chen Q-H, Liu C-L (2013) Keyword spotting in unconstrained handwritten chinese documents using contextual word model. Image Vis Comput 31(12):958–968
Kesidis A, Galiotou E, Gatos B, Pratikakis I (2011) A word spotting framework for historical machine-printed documents. Int J Doc Anal Recognit 14(2):131–144
Gatos B, Konidaris T, Pratikakis I, Perantonis S (2006) A holistic methodology for keyword search in historical typewritten documents. In: Antoniou G, Potamias G, Spyropoulos C, Plexousakis D (eds) Advances in artificial intelligence, 3955 of lecture notes in computer science. Springer, Berlin, pp 490–493
Frinken V, Fischer A, Bunke H (2010) A novel word spotting algorithm using bidirectional long short-term memory neural networks. In: Schwenker F, El Gayar N (eds) Artificial neural networks in pattern recognition, 5998 of lecture notes in computer science. Springer, Berlin, pp 185–196
Frinken V, Fischer A, Manmatha R, Bunke H (2012) A novel word spotting method based on recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 34:211–224
Retsinas G, Louloudis G, Stamatopoulos N, Gatos B (2016) Keyword spotting in handwritten documents using projections of oriented gradients. In: 12th IAPR workshop on document analysis systems (DAS), 411–416
Fischer A, Frinken V, Bunke H, Suen C (2013) Improving HMM-based keyword spotting with character language models. In: 12th international conference on document analysis and recognition (ICDAR), 2013, 506–510
Marti U-V, Bunke H (2001) Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. Int J Pattern Recognit Artif Intell 15(01):65–90
Bunke H, Bengio S, Vinciarelli A (2004) Offline recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Trans Pattern Anal Mach Intell 26(6):709–720
Liu C-L, Yin F, Wang D-H, Wang Q-F (2011) CASIA online and offline chinese handwriting databases. Int Conf Doc Anal Recognit 2011:37–41
Perronnin F, Rodriguez-Serrano JA (2009) Fisher kernels for handwritten word spotting. In: Proceedings of the 2009 10th international conference on document analysis and recognition, ICDAR ’09, (Washington, DC, USA), 106–110, IEEE computer society
Zhang H, Wang D-H, Liu C-L (2010) Keyword spotting from online chinese handwritten documents using one-vs-all trained character classifier. Int Conf Front Handwrit Recognit 2010:271–276
Zhang H, Zhou X-D, Liu C-L (2013) Keyword spotting in online chinese handwritten documents with candidate scoring based on semi-CRF model. In: Document analysis and recognition (ICDAR), 2013 12th international conference on, 567–571
Terasawa K, Nagasaki T, Kawashima T (2005) Eigenspace method for text retrieval in historical document images. Proc Eighth Int Conf Doc Anal Recognit 1:437–441
Terasawa K, Nagasaki T, Kawashima T (2006) Automatic keyword extraction from historical document images. Document analysis systems VII, 3872 of lecture notes in computer science. Springer, Berlin, pp 413–424
Aouadi N, Kacem A (2011) Word spotting for arabic handwritten historical document retrieval using generalized hough transform. Third Int Conf Pervasive Patterns Appl 2011:67–71
Sousa J, Gil J, Pinto J (2007) Word indexing of ancient documents using fuzzy classification. IEEE Trans Fuzzy Syst 15:852–862
Fernández D, Lladós J, Fornés A (2011) Handwritten word spotting in old manuscript images using a pseudo-structural descriptor organized in a hash structure. Pattern recognition and image analysis, 6669 of lecture notes in computer science. Springer, Berlin, pp 628–635
Bilane P, Bres S, Challita K, Emptoz H (2009) Indexation of syriac manuscripts using directional features. In: 16th IEEE international conference on image processing (ICIP), 2009, 1841–1844
Bilane P, Bres S, Emptoz H (2008) Robust directional features for wordspotting in degraded syriac manuscripts. In: International workshop on content-based multimedia indexing, 526–533
Zant T, Schomaker L, Haak K (2008) Handwritten-word spotting using biologically inspired features. IEEE Trans Pattern Anal Mach Intell 30:1945–1957
Leydier Y, Lebourgeois F, Emptoz H (2007) Text search for medieval manuscript images. Pattern Recognit 40(12):3552–3567
Leydier Y, Ouji A, LeBourgeois F, Emptoz H (2009) Towards an omnilingual word retrieval system for ancient manuscripts. Pattern Recognit 42(9):2089–2105
Ghosh SK, Valveny E (2015) Query by string word spotting based on character bi-gram indexing. In: 13th international conference on document analysis and recognition (ICDAR), 2015, 881–885
Bui QA, Visani M, Mullot R (2015) Unsupervised word spotting using a graph representation based on invariants. In: 13th international conference on document analysis and recognition (ICDAR), 2015, 616–620
Riba P, Lladós J, Fornés A (2015) Handwritten word spotting by inexact matching of grapheme graphs. In: 13th international conference on document analysis and recognition (ICDAR), 2015, 781–785
Sharma A, Sankar KP (2015) Adapting off-the-shelf CNNs for word spotting and recognition. In: 13th international conference on document analysis and recognition (ICDAR), 986–990
Sudholt S, Fink GA (2016) PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. https://arxiv.org/pdf/1604.00187.pdf
Zhou X-D, Wang D-H, Tian F, Liu C-L, Nakagawa M (2013) Handwritten Chinese/Japanese text recognition using semi-Markov conditional random fields. IEEE Trans Pattern Anal Mach Intell 35:2413–2426
Glucksman H (1967) Classification of mixed-font alphabets by characteristic loci. In: Proceedings of conference IEEE comput, pp 138–141
Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans Pattern Anal Mach Intell 29:411–426
Serre T, Wolf L, Poggio T (2005) Object recognition with features inspired by visual cortex. In: IEEE computer society conference on computer vision and pattern recognition, 2005. CVPR 2005, vol 2:2, pp 994–1000
Powell MJD (1987) Radial basis functions for multivariable interpolation: a review. In: Mason JC, Cox MG (eds) Algorithms for approximation. Clarendon Press, New York, pp 143–167
NL-HaNA (1903) Archief van het Kabinet der Koningin, Den Haag (Netherlands)