On the improvement of handwritten text line recognition with octave convolutional recurrent neural networks
Abstract
Offline handwritten text recognition (HTR) is a significant challenge because of variable handwriting styles, background degradation, and unconstrained word sequences. This work tackles handwritten text line recognition using octave convolutional recurrent neural networks (OctCRNN). Our approach requires no word segmentation, preprocessing, or explicit feature extraction, and it leverages octave convolutions to process multiscale features without increasing the number of learnable parameters. We thoroughly investigate the OctCRNN under different settings, including an octave design that efficiently balances computational cost and recognition performance, and we formulate an experimental pipeline with a visualization step that gives intuition about how the model works compared with a counterpart based on traditional convolutions. The system is completed by a language model that adds linguistic knowledge. Finally, we assess the performance of our solution using character and word error rates on established handwritten text recognition benchmarks: IAM, RIMES, and ICFHR 2016 READ. According to the results, our proposal achieves state-of-the-art performance while reducing computational requirements. Our findings suggest that the architecture provides a robust framework for building HTR systems.
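The claim that octave convolutions process multiscale features without adding learnable parameters can be made concrete with a small sketch. The NumPy code below illustrates the generic octave convolution of Chen et al. (2019); it is not the authors' OctCRNN implementation. Assumptions that are ours, not the paper's: channels are split by a ratio `alpha` into a full-resolution high-frequency group and a half-resolution low-frequency group, 2×2 average pooling and nearest-neighbour upsampling move between resolutions, and convolutions use stride 1 with "same" padding. All function names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive stride-1 cross-correlation with zero 'same' padding.

    x: (c_in, H, W) feature map; w: (c_out, c_in, k, k) kernel with odd k.
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Contract over input channels for this kernel offset (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def avg_pool2(x):
    """2x2 average pooling with stride 2 (H and W must be even)."""
    c, h, wd = x.shape
    return x.reshape(c, h // 2, 2, wd // 2, 2).mean(axis=(2, 4))


def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2 along H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def split_channels(c, alpha):
    """Return (high, low) channel counts for a low-frequency ratio alpha."""
    low = int(alpha * c)
    return c - low, low


def octave_conv(x_h, x_l, w, alpha_in, alpha_out):
    """Octave convolution forward pass (hypothetical sketch).

    x_h: (hi_in, H, W) high-frequency map; x_l: (lo_in, H/2, W/2) low-frequency map.
    w: ONE ordinary (c_out, c_in, k, k) kernel, sliced into four blocks, so the
    parameter count equals that of a vanilla convolution with the same shape.
    """
    c_out, c_in = w.shape[:2]
    hi_in, _ = split_channels(c_in, alpha_in)
    hi_out, _ = split_channels(c_out, alpha_out)
    w_hh, w_lh = w[:hi_out, :hi_in], w[:hi_out, hi_in:]  # blocks feeding the high output
    w_hl, w_ll = w[hi_out:, :hi_in], w[hi_out:, hi_in:]  # blocks feeding the low output
    y_h = conv2d_same(x_h, w_hh) + upsample2(conv2d_same(x_l, w_lh))
    y_l = conv2d_same(x_l, w_ll) + conv2d_same(avg_pool2(x_h), w_hl)
    return y_h, y_l


def mult_ratio(c_in, c_out, alpha_in, alpha_out):
    """Multiplications of the octave conv relative to a vanilla conv (same k, H, W)."""
    hi_in, lo_in = split_channels(c_in, alpha_in)
    hi_out, lo_out = split_channels(c_out, alpha_out)
    full_res = hi_out * hi_in                                    # computed on the H x W grid
    half_res = hi_out * lo_in + lo_out * hi_in + lo_out * lo_in  # computed on the (H/2) x (W/2) grid
    return (full_res + half_res / 4) / (c_out * c_in)


# Demo on an arbitrary 16 x 64 text-line-like feature map (sizes are illustrative).
rng = np.random.default_rng(0)
c_in, c_out, k, alpha = 8, 8, 3, 0.5
w = rng.standard_normal((c_out, c_in, k, k))
hi_in, lo_in = split_channels(c_in, alpha)
x_h = rng.standard_normal((hi_in, 16, 64))
x_l = rng.standard_normal((lo_in, 8, 32))
y_h, y_l = octave_conv(x_h, x_l, w, alpha, alpha)
print(y_h.shape, y_l.shape)                  # (4, 16, 64) (4, 8, 32)
print(mult_ratio(c_in, c_out, alpha, alpha))  # 0.4375
```

Because the four weight blocks are slices of a single kernel, the learnable parameters are exactly those of an ordinary convolution of the same shape; the savings come from running three of the four paths on a half-resolution grid. With `alpha = 0.5` for input and output, this sketch performs 7/16 (about 44%) of the multiplications of the vanilla layer. The actual savings in the paper depend on its chosen octave settings, which the abstract does not specify.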