Gradient estimation in dendritic reinforcement learning

The Journal of Mathematical Neuroscience - Volume 2 - Pages 1-19 - 2012

Mathieu Schiess¹, Robert Urbanczik¹, Walter Senn¹

¹Department of Physiology, University of Bern, Bern, Switzerland

Abstract

We study synaptic plasticity in a complex neuronal cell model where NMDA-spikes can arise in certain dendritic zones. In the context of reinforcement learning, two kinds of plasticity rules are derived, zone reinforcement (ZR) and cell reinforcement (CR), both of which optimize the expected reward by stochastic gradient ascent. For ZR, the synaptic plasticity response to the external reward signal is modulated exclusively by quantities which are local to the NMDA-spike initiation zone in which the synapse is situated. CR, in addition, uses nonlocal feedback from the soma of the cell, provided by mechanisms such as the backpropagating action potential. Simulation results show that, compared to ZR, the use of nonlocal feedback in CR can drastically enhance learning performance. We suggest that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, even though the two have previously been found to be largely equivalent with regard to computational capability.
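
To make "optimizing the expected reward by stochastic gradient ascent" concrete, the following is a minimal sketch of a reward-modulated, score-function (REINFORCE-style) update for a single stochastic binary unit. It is not the ZR or CR rule from the paper: the logistic unit, the two input patterns, the plus or minus 1 reward, and the names `train` and `prob_correct` are all assumptions chosen for illustration. The only shared idea is that each weight change is the product of a global reward signal and a locally computable term, here the gradient of the log-probability of the sampled output.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def train(patterns, targets, steps=5000, eta=0.1, seed=0):
    """Stochastic gradient ascent on expected reward for one Bernoulli unit.

    Hypothetical toy setup: the unit fires (y = 1) with probability
    p = sigmoid(w . x) and receives reward +1 if y matches the target,
    -1 otherwise.
    """
    rng = random.Random(seed)
    w = [0.0] * len(patterns[0])
    for _ in range(steps):
        i = rng.randrange(len(patterns))
        x, target = patterns[i], targets[i]
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        y = 1 if rng.random() < p else 0            # sample the stochastic output
        reward = 1.0 if y == target else -1.0       # global scalar feedback
        # Score function: d/dw log P(y | x) = (y - p) * x for a logistic unit.
        # reward * score is an unbiased sample of the expected-reward gradient.
        w = [wj + eta * reward * (y - p) * xj for wj, xj in zip(w, x)]
    return w


def prob_correct(w, x, target):
    """Probability that the unit emits the rewarded output for input x."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return p if target == 1 else 1.0 - p


# Two toy input patterns (last component acts as a bias) with opposite targets.
patterns = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
targets = [1, 0]
w = train(patterns, targets)
for x, t in zip(patterns, targets):
    print(x, t, round(prob_correct(w, x, t), 3))
```

In this sketch the only feedback a weight receives is the scalar reward. The abstract's distinction between ZR and CR concerns which additional signals (zone-local quantities only, or also somatic feedback) enter the synapse-specific factor, which this toy example does not model.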

