Gradient estimation in dendritic reinforcement learning
Abstract
We study synaptic plasticity in a complex neuronal cell model where NMDA-spikes can arise in certain dendritic zones. In the context of reinforcement learning, two kinds of plasticity rules are derived, zone reinforcement (ZR) and cell reinforcement (CR), which both optimize the expected reward by stochastic gradient ascent. For ZR, the synaptic plasticity response to the external reward signal is modulated exclusively by quantities which are local to the NMDA-spike initiation zone in which the synapse is situated. CR, in addition, uses nonlocal feedback from the soma of the cell, provided by mechanisms such as the backpropagating action potential. Simulation results show that, compared to ZR, the use of nonlocal feedback in CR can drastically enhance learning performance. We suggest that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, which have previously been found to be largely equivalent with regard to computational capability.
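The idea of optimizing expected reward by stochastic gradient ascent using only zone-local quantities can be illustrated with a generic score-function (REINFORCE-style) estimator. The sketch below is a toy under stated assumptions, not the ZR or CR rule of this work: each zone is modelled as an independent logistic Bernoulli unit, the somatic stage is omitted, and the names `zone_probs`, `reward`, `local_gradient_estimate`, and the "exactly one zone spikes" task are all hypothetical. It checks that averaging reward × (spike − spike probability) × input over many trials approaches the true gradient of the expected reward.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def zone_probs(weights, x):
    """Spike probability of each zone: logistic function of its local drive (assumption)."""
    return [sigmoid(sum(w * xi for w, xi in zip(wz, x))) for wz in weights]

def reward(spikes):
    """Hypothetical task: reward 1 if exactly one zone spiked, else 0."""
    return 1.0 if sum(spikes) == 1 else 0.0

def expected_reward(weights, x):
    """Exact expected reward, enumerating every zone spike pattern."""
    p = zone_probs(weights, x)
    n = len(p)
    total = 0.0
    for pattern in range(2 ** n):
        s = [(pattern >> z) & 1 for z in range(n)]
        prob = 1.0
        for z in range(n):
            prob *= p[z] if s[z] else 1.0 - p[z]
        total += prob * reward(s)
    return total

def local_gradient_estimate(weights, x, rng):
    """One-trial estimate using only the global reward and zone-local terms."""
    p = zone_probs(weights, x)
    s = [1 if rng.random() < pz else 0 for pz in p]
    r = reward(s)
    return [[r * (s[z] - p[z]) * xi for xi in x] for z in range(len(p))]

def averaged_estimate(weights, x, n_samples, seed=0):
    """Average of many one-trial estimates (seeded for reproducibility)."""
    rng = random.Random(seed)
    acc = [[0.0] * len(x) for _ in weights]
    for _ in range(n_samples):
        g = local_gradient_estimate(weights, x, rng)
        for z, gz in enumerate(g):
            for i, v in enumerate(gz):
                acc[z][i] += v / n_samples
    return acc

def finite_difference_gradient(weights, x, eps=1e-6):
    """Reference gradient of the exact expected reward via central differences."""
    grad = []
    for z in range(len(weights)):
        row = []
        for i in range(len(x)):
            up = [list(w) for w in weights]
            down = [list(w) for w in weights]
            up[z][i] += eps
            down[z][i] -= eps
            diff = expected_reward(up, x) - expected_reward(down, x)
            row.append(diff / (2 * eps))
        grad.append(row)
    return grad

if __name__ == "__main__":
    weights = [[0.5, -0.3], [-0.2, 0.8]]  # two zones, two synapses each (arbitrary)
    x = [1.0, 0.5]                        # presynaptic input (arbitrary)
    print("sampled:", averaged_estimate(weights, x, 20000))
    print("reference:", finite_difference_gradient(weights, x))
```

The single-trial estimate is noisy because every zone is credited with the same global reward; this variance is the kind of cost that additional feedback signals are meant to reduce, although the toy above does not model such feedback.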