Learning representation hierarchies by sharing visual features: a computational investigation of Persian character recognition with unsupervised deep learning
Abstract
In humans, efficient recognition of written symbols is thought to rely on a hierarchical processing system, in which simple features are progressively combined into more abstract, high-level representations. Here, we present a computational model of Persian character recognition based on deep belief networks, in which increasingly complex visual features emerge in a completely unsupervised manner by fitting a hierarchical generative model to the sensory data. Crucially, the high-level internal representations that emerge from unsupervised deep learning can be easily read out by a linear classifier, achieving state-of-the-art recognition accuracy. Furthermore, we tested the hypothesis that handwritten digits and letters share many visual features: a generative model that captures the statistical structure of the letter distribution should therefore also support the recognition of written digits. To this end, deep networks trained on Persian letters were used to build high-level representations of Persian digits, which were indeed read out with high accuracy. Our simulations show that complex visual features, such as those mediating the identification of Persian symbols, can emerge from unsupervised learning in multilayered neural networks and can support knowledge transfer across related domains.
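The pipeline described in the abstract (unsupervised layer-wise feature learning, a linear read-out, and reuse of letter-trained features for digits) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`train_rbm`, `train_stack`, `encode`, `fit_linear_readout`, `predict`), the use of one-step contrastive divergence on full batches, the least-squares read-out, and all hyperparameters are assumptions chosen for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train one restricted Boltzmann machine with 1-step contrastive divergence.

    Hypothetical helper: full-batch updates, no momentum or weight decay.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden activations driven by the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # Update: data-driven correlations minus reconstruction-driven ones.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid

def train_stack(data, layer_sizes, **rbm_kwargs):
    """Greedy layer-wise training: each layer learns from the one below. No labels used."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b = train_rbm(x, n_hidden, **rbm_kwargs)
        layers.append((W, b))
        x = sigmoid(x @ W + b)
    return layers

def encode(layers, data):
    """Propagate inputs bottom-up to obtain the top-layer representation."""
    x = data
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

def fit_linear_readout(features, labels, n_classes):
    """Least-squares linear classifier (with bias) on fixed features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    targets = np.eye(n_classes)[labels]  # one-hot targets
    return np.linalg.pinv(X) @ targets

def predict(readout, features):
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return (X @ readout).argmax(axis=1)
```

Under these assumptions, the transfer test amounts to calling `train_stack` on letter images only, then `encode` on digit images with the frozen weights, and fitting `fit_linear_readout` on the resulting digit features. Only the read-out ever sees digit labels.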