Handwritten address recognition with open vocabulary using character n-grams

A. Brakensiek¹, J. Rottland², G. Rigoll³
¹Department of Computer Science, Gerhard Mercator University of Duisburg, Duisburg, Germany
²Siemens Dematic AG, Konstanz, Germany
³Institute for Human-Machine Communication, Technical University Munich, Munich, Germany

Abstract

This paper describes a recognition system for handwritten address words that is based on tied-mixture hidden Markov models and uses a language model consisting of backoff character n-grams. A dictionary-based recognition system requires that the structure of the address (name, street, city) be known. If the individual parts of the address cannot be categorized, the vocabulary in use is unknown and therefore unlimited. The performance of this open-vocabulary recognition using n-grams is compared with that of recognition using dictionaries of different sizes. In particular, the confidence of the recognition results and the possibility of useful post-processing are significant advantages of language models.
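The central idea of open-vocabulary recognition is to score arbitrary letter strings with a backoff character n-gram model rather than restricting the output to a dictionary. The sketch below illustrates that idea under loudly stated assumptions. The bigram order (n = 2), the absolute discount D = 0.5, the add-one unigram used as the backoff distribution, the boundary symbols `<` and `>`, the class name `BackoffCharBigram`, and the toy city list are all hypothetical choices made for brevity. The sketch does not reproduce the paper's n-gram order, its discounting scheme, or the way the language-model scores are combined with the tied-mixture HMM scores during decoding.

```python
import math
import string
from collections import Counter, defaultdict

BOS, EOS = "<", ">"  # hypothetical word-start / word-end symbols


class BackoffCharBigram:
    """Character bigram model with absolute-discounting backoff (illustrative sketch).

    Assumptions, not taken from the paper: n = 2, discount D, an add-one
    unigram as backoff distribution, and the BOS/EOS boundary symbols.
    """

    def __init__(self, words, alphabet, discount=0.5):
        assert 0 < discount <= 1, "D must not exceed the smallest count (1)"
        self.D = discount
        self.symbols = set(alphabet) | {EOS}  # everything the model can predict
        self.bigrams = defaultdict(Counter)   # history char -> counts of next char
        unigrams = Counter()
        for w in words:
            seq = BOS + w + EOS
            for h, c in zip(seq, seq[1:]):
                self.bigrams[h][c] += 1
                unigrams[c] += 1
        total, v = sum(unigrams.values()), len(self.symbols)
        # Add-one unigram: every symbol of the alphabet keeps non-zero probability,
        # which is what keeps the vocabulary open.
        self.p_uni = {s: (unigrams[s] + 1) / (total + v) for s in self.symbols}

    def prob(self, c, h):
        """P(c | h); c must be in self.symbols."""
        nexts = self.bigrams.get(h)
        if not nexts:                   # unseen history: use the unigram directly
            return self.p_uni[c]
        n_h = sum(nexts.values())
        if nexts[c] > 0:                # seen bigram: discounted relative frequency
            return (nexts[c] - self.D) / n_h
        # Backoff: the mass removed by discounting is shared among the unseen
        # successors in proportion to their unigram probabilities.
        freed = self.D * len(nexts) / n_h
        unseen = sum(p for s, p in self.p_uni.items() if s not in nexts)
        return freed * self.p_uni[c] / unseen

    def log_prob(self, word):
        """Log probability of a whole word, including the end-of-word symbol."""
        seq = BOS + word + EOS
        return sum(math.log(self.prob(c, h)) for h, c in zip(seq, seq[1:]))


# Toy training list of city names (hypothetical data, not the paper's corpus).
cities = ["duisburg", "duesseldorf", "dortmund", "konstanz", "muenchen", "bochum"]
lm = BackoffCharBigram(cities, string.ascii_lowercase)
# A string absent from any dictionary still gets a finite, though lower, score.
print(lm.log_prob("duisburg"), lm.log_prob("xqzvb"))
```

A dictionary lookup assigns zero probability to every word it does not list, whereas the backoff model only ranks strings, so a recognizer can still output a plausible word that was never enumerated. One corner case is left open in this sketch: if every alphabet symbol has already been observed after some history, the discounted mass is not redistributed, which a complete implementation would need to handle.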

Keywords

Character recognition, Handwriting recognition, Vocabulary, Dictionaries, Hidden Markov models, Cities and towns, Streaming media, Automation, Writing, Postal services