Using goal-driven deep learning models to understand sensory cortex

Nature Neuroscience - Tập 19 Số 3 - Trang 356-365 - 2016
Daniel Yamins1, James J. DiCarlo1
1Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

Tóm tắt

Từ khóa


Tài liệu tham khảo

DiCarlo, J.J. & Cox, D.D. Untangling invariant object recognition. Trends Cogn. Sci. 11, 333–341 (2007).

DiCarlo, J.J., Zoccolan, D. & Rust, N.C. How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).

Felleman, D.J. & Van Essen, D.C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).

Malach, R., Levy, I. & Hasson, U. The topography of high-order human object areas. Trends Cogn. Sci. 6, 176–184 (2002).

Carandini, M. et al. Do we know what the early visual system does? J. Neurosci. 25, 10577–10597 (2005).

Sharpee, T.O., Kouh, M. & Reynolds, J.H. Trade-off between curvature tuning and position invariance in visual area V4. Proc. Natl. Acad. Sci. USA 110, 11618–11623 (2013).

David, S.V., Hayden, B.Y. & Gallant, J.L. Spectral receptive field properties explain shape selectivity in area V4. J. Neurophysiol. 96, 3492–3505 (2006).

Gallant, J.L., Connor, C.E., Rakshit, S., Lewis, J.W. & Van Essen, D.C. Neural responses to polar, hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. J. Neurophysiol. 76, 2718–2739 (1996).

Rust, N.C., Mante, V., Simoncelli, E.P. & Movshon, J.A. How MT cells analyze the motion of visual patterns. Nat. Neurosci. 9, 1421–1431 (2006).

Hubel, D.H. & Wiesel, T.N. Receptive fields of single neurones in the cat's striate cortex. J. Physiol. (Lond.) 148, 574–591 (1959).

Fukushima, K. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193–202 (1980).

Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).

Serre, T., Oliva, A. & Poggio, T. A feedforward architecture accounts for rapid categorization. Proc. Natl. Acad. Sci. USA 104, 6424–6429 (2007).

Bengio, Y. Learning Deep Architectures for AI (Now Publishers, 2009).

Pinto, N., Doukhan, D., DiCarlo, J.J. & Cox, D.D. A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Comput. Biol. 5, e1000579 (2009).

LeCun, Y. & Bengio, Y. Convolutional networks for images, speech, and time series. in The Handbook of Brain Theory and Neural Networks 255–258 (MIT Press, 1995).

Carandini, M. & Heeger, D.J. Normalization as a canonical neural computation. Nat. Rev. Neurosci. 13, 51–62 (2012).

Yamins, D., Hong, H., Cadieu, C. & Dicarlo, J. Hierarchical modular optimization of convolutional networks achieves representations similar to macaque it and human ventral stream. Adv. Neural Inf. Process. Syst. 26, 3093–3101 (2013).

De Valois, K.K., De Valois, R.L. & Yund, E.W. Responses of striate cortex cells to grating and checkerboard patterns. J. Physiol. (Lond.) 291, 483–505 (1979).

Jones, J.P. & Palmer, L.A. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58, 1233–1258 (1987).

Movshon, J.A., Thompson, I.D. & Tolhurst, D.J. Spatial summation in the receptive fields of simple cells in the cat's striate cortex. J. Physiol. (Lond.) 283, 53–77 (1978).

Klein, D.J., Simon, J.Z., Depireux, D.A. & Shamma, S.A. Stimulus-invariant processing and spectrotemporal reverse correlation in primary auditory cortex. J. Comput. Neurosci. 20, 111–136 (2006).

Barlow, H.B. Possible principles underlying the transformations of sensory messages. in Sensory Communication Vol. 1, 217–234 (1961).

Olshausen, B.A. & Field, D.J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

deCharms, R.C. & Zador, A. Neural representation and the cortical code. Annu. Rev. Neurosci. 23, 613–647 (2000).

Olshausen, B.A., Sallee, P. & Lewicki, M.S. Learning sparse image codes using a wavelet pyramid architecture. Adv. Neural Inf. Process. Syst. 14, 887–893 (2001).

Logothetis, N.K., Pauls, J. & Poggio, T. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5, 552–563 (1995).

Zoccolan, D., Kouh, M., Poggio, T. & DiCarlo, J.J. Trade-off between object selectivity and tolerance in monkey inferotemporal cortex. J. Neurosci. 27, 12292–12307 (2007).

Kriegeskorte, N. Relating population-code representations between man, monkey, and computational models. Front. Neurosci. 3, 363–373 (2009).

Ullman, S. Visual routines. Cognition 18, 97–159 (1984).

Singer, W. & Gray, C.M. Visual feature integration and the temporal correlation hypothesis. Annu. Rev. Neurosci. 18, 555–586 (1995).

Majaj, N.J., Hong, H., Solomon, E.A. & DiCarlo, J.J. Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. J. Neurosci. 35, 13402–13418 (2015).

Yamins, D.L.K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl. Acad. Sci. USA 111, 8619–8624 (2014).

Cadieu, C.F. et al. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput. Biol. 10, e1003963 (2014).

Khaligh-Razavi, S.M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).

Güçlü, U. & van Gerven, M.A. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J. Neurosci. 35, 10005–10014 (2015).

Yau, J.M., Pasupathy, A., Brincat, S.L. & Connor, C.E. Curvature processing dynamics in macaque area V4. Cereb. Cortex 23, 198–209 (2013).

Freeman, J. & Simoncelli, E.P. Metamers of the ventral stream. Nat. Neurosci. 14, 1195–1201 (2011).

Pasupathy, A. & Connor, C.E. Population coding of shape in area V4. Nat. Neurosci. 5, 1332–1338 (2002).

Kell, A., Yamins, D., Norman-Haignere, S. & McDermott, J. Functional organization of auditory cortex revealed by neural networks optimized for auditory tasks. Soc. Neurosci. Abstr. 466.04 (2015).

Razavian, A.S., Azizpour, H., Sullivan, J. & Carlsson, S. CNN features off-the-shelf: an astounding baseline for recognition. in Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Conference on, 512–519 (IEEE, 2014).

Bottou, L. Large-scale machine learning with stochastic gradient descent. in Proc. COMPSTAT 2010, 177–186 (Springer, 2010).

Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).

Choudhary, S. et al. Silicon neurons that compute. in Artificial Neural Networks and Machine Learning–ICANN 2012, 121–128 (Springer, 2012).

Snoek, J., Larochelle, H. & Adams, R.P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 26, 2951–2959 (2012).

Bergstra, J., Yamins, D. & Cox, D. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proc. 30th International Conference on Machine Learning 115–123, http://jmlr.csail.mit.edu/proceedings/papers/v28/ (2013).

Griffin, G., Holub, A. & Perona, P. The Caltech-256 object category dataset. Caltech Technical Report, http://resolver.caltech.edu/CaltechAUTHORS:CNS-TR-2007-001 (2007).

Pinto, N., Cox, D.D. & DiCarlo, J.J. Why is real-world visual object recognition hard? PLoS Comput. Biol. 4, e27 (2008).

Deng, J. et al. ImageNet: a large-scale hierarchical image database. in CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 248–288 (IEEE, 2009).

Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at http://arxiv.org/abs/1409.1556 (2014).

Szegedy, C. et al. Going deeper with convolutions. Preprint at http://arxiv.org/abs/1409.4842 (2014).

Halevy, A., Norvig, P. & Pereira, F. The unreasonable effectiveness of data. IEEE Intell. Syst. 24, 8–12 (2009).

Pillow, J.W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995–999 (2008).

Khorrami, P., Paine, T.L. & Huang, T.S. Do deep neural networks learn facial action units when doing expression recognition? Preprint at http://arxiv.org/abs/1510.02969 (2015).

Hinton, G.E., Dayan, P., Frey, B.J. & Neal, R.M. The “wake-sleep” algorithm for unsupervised neural networks. Science 268, 1158–1161 (1995).

Zhu, L.L., Lin, C., Huang, H., Chen, Y. & Yuille, A. Unsupervised structure learning: hierarchical recursive composition, suspicious coincidence and competitive exclusion. in Computer Vision–ECCV 2008, 759–773 (Springer, 2008).

Bengio, Y. Deep learning of representations for unsupervised and transfer learning. In Unsupervised and Transfer Learning: Challenges in Machine Learning Vol. 7 (eds. Guyon, I., Dror, G & Lemaire, V.) 29–41 (Microtome, 2013).

Mante, V., Sussillo, D., Shenoy, K.V. & Newsome, W.T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).

Stadie, B.C., Levine, S. & Abbeel, P. Incentivizing exploration in reinforcement learning with deep predictive models. Preprint at http://arxiv.org/abs/1507.00814 (2015).

Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

Harvey, C.D., Coen, P. & Tank, D.W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).

Hulbert, J. & Norman, K. Neural differentiation tracks improved recall of competing memories following interleaved study and retrieval practice. Cereb. Cortex 25, 3994–4008 (2015).

Hung, C.P., Kreiman, G., Poggio, T. & DiCarlo, J.J. Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866 (2005).

Rust, N.C. & Dicarlo, J.J. Selectivity and tolerance (“invariance”) both increase as visual information propagates from cortical area V4 to IT. J. Neurosci. 30, 12978–12995 (2010).

Freedman, D.J., Riesenhuber, M., Poggio, T. & Miller, E.K. Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291, 312–316 (2001).

Pagan, M., Urban, L.S., Wohl, M.P. & Rust, N.C. Signals in inferotemporal and perirhinal cortex suggest an untangling of visual target information. Nat. Neurosci. 16, 1132–1139 (2013).

Marder, E. Understanding brains: details, intuition, and big data. PLoS Biol. 13, e1002147 (2015).

Gatys, L.A., Ecker, A.S. & Bethge, M. A neural algorithm of artistic style Preprint at http://arxiv.org/abs/1508.06576 (2015).

Yamane, Y., Carlson, E.T., Bowman, K.C., Wang, Z. & Connor, C.E. A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nat. Neurosci. 11, 1352–1360 (2008).

Afraz, A., Boyden, E.S. & DiCarlo, J.J. Optogenetic and pharmacological suppression of spatial clusters of face neurons reveal their causal role in face gender discrimination. Proc. Natl. Acad. Sci. USA 112, 6730–6735 (2015).

Marr, D., Poggio, T. & Ullman, S. Vision: A Computational Investigation Into the Human Representation and Processing of Visual Information (MIT Press, 2010).

Hoyle, G. The scope of neuroethology. Behav. Brain Sci. 7, 367–381 (1984).

Szegedy, C. et al. Intriguing properties of neural networks. Preprint at http://arxiv.org/abs/1312.6199 (2013).

Goodfellow, I.J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. Preprint at http://arxiv.org/abs/1412.6572 (2014).