Handwritten document retrieval

G. Russell¹, M.P. Perrone², Yi-min Chee², A. Aiman Ziq¹
¹IBM T.J. Watson Research Center, USA
²IBM T.J. Watson Research Center, USA

Abstract

This paper investigates the use of both typed and handwritten queries to retrieve handwritten documents. The recognition-based approach reported here is novel in that it expands documents in a fashion analogous to query expansion: each document is expanded using N-best lists, which embody additional statistical information from the hidden Markov model (HMM) based handwriting recognizer used to transcribe the handwritten documents. This additional information makes the retrieval methods robust to machine transcription errors, so that documents which would otherwise be unretrievable can still be retrieved. Two sets of experiments indicate that significant improvements in retrieval performance can be achieved: cross-writer experiments on a database of 10,985 words in 108 documents from 108 writers, and within-writer experiments, in a probabilistic framework, on a database of 537,724 words in 3,342 documents from 43 writers. The second database is the largest database of on-line handwritten documents known to us.
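A minimal sketch of the document-expansion idea, under stated assumptions: each handwritten word is represented by an N-best list of (candidate, posterior) pairs, and a document's term weights are expected counts obtained by summing the posteriors of all candidates rather than keeping only the top-1 transcription. The names `expand_document`, `score`, and `rank`, the input format, and the saturated tf-idf scoring are hypothetical illustrations; they are not the paper's ranking function or its probabilistic framework.

```python
import math
from collections import defaultdict

# Hypothetical input format: a document is a list of word slots; each slot is
# an N-best list of (candidate_word, posterior) pairs from a recognizer.
# Posteriors within a slot are assumed to sum to 1.
NBest = list[tuple[str, float]]


def expand_document(slots: list[NBest]) -> dict[str, float]:
    """Expected term counts: sum the posterior of every N-best candidate,
    so alternatives below the top-1 choice still contribute weight."""
    counts: dict[str, float] = defaultdict(float)
    for nbest in slots:
        for word, prob in nbest:
            counts[word.lower()] += prob
    return dict(counts)


def score(query: list[str], doc: dict[str, float],
          df: dict[str, float], n_docs: int) -> float:
    """Saturated tf-idf over expected counts (an illustrative choice)."""
    total = 0.0
    for term in query:
        tf = doc.get(term, 0.0)
        if tf > 0.0:
            idf = math.log((n_docs + 1) / (df.get(term, 0.0) + 0.5))
            total += (tf / (tf + 1.0)) * idf
    return total


def rank(query: list[str], docs: dict[str, list[NBest]]) -> list[tuple[str, float]]:
    """Expand every document, then rank by descending score."""
    expanded = {name: expand_document(slots) for name, slots in docs.items()}
    # Document frequency: a document counts once if any N-best entry has the term.
    df: dict[str, float] = defaultdict(float)
    for counts in expanded.values():
        for term in counts:
            df[term] += 1.0
    q = [t.lower() for t in query]
    scores = [(name, score(q, counts, df, len(docs))) for name, counts in expanded.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (invented for illustration): doc_a's first word is misrecognized
# as "melting" at top-1, but "meeting" appears second in its N-best list.
example_docs = {
    "doc_a": [[("melting", 0.55), ("meeting", 0.35), ("melding", 0.10)],
              [("notes", 0.9), ("votes", 0.1)]],
    "doc_b": [[("budget", 0.8), ("badge", 0.2)]],
}

for name, s in rank(["meeting"], example_docs):
    print(f"{name}: {s:.3f}")
# doc_a: 0.180
# doc_b: 0.000
```

A top-1-only index would give `doc_a` no match for "meeting"; the expanded index still retrieves it, with a weight reduced by the recognizer's uncertainty. A handwritten query could presumably be handled the same way by expanding it into its own N-best candidates, though that step is not shown here.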

Keywords

Handwriting recognition; Databases; Information retrieval; Optical character recognition software; Redundancy; Hidden Markov models; Ink; Robustness; Degradation; Character recognition