Policy compression: An information bottleneck in action selection

Psychology of Learning and Motivation - Tập 74 - Trang 195-232 - 2021
Lucy Lai1, Samuel J. Gershman2
1Program in Neuroscience, Harvard University, Cambridge, MA, United States
2Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, United States

Tài liệu tham khảo

Abel, 2019, State abstraction as compression in apprenticeship learning, Proceedings of the AAAI Conference on Artificial Intelligence, 33, 3134, 10.1609/aaai.v33i01.33013134 Amir, 2020, Value-complexity tradeoff explains mouse navigational learning, PLoS Computational Biology, 16, e1008497, 10.1371/journal.pcbi.1008497 Arimoto, 1972, An algorithm for computing the capacity of arbitrary discrete memoryless channels, IEEE Transactions on Information Theory, 18, 14, 10.1109/TIT.1972.1054753 Bar-Gad, 2003, Information processing, dimensionality reduction and reinforcement learning in the basal ganglia, Progress in Neurobiology, 71, 439, 10.1016/j.pneurobio.2003.12.001 Bates, 2020, Efficient data compression in perception and perceptual memory, Psychological Review, 127, 891, 10.1037/rev0000197 Bates, 2019, Adaptive allocation of human visual working memory capacity during statistical and categorical learning, Journal of Vision, 19, 11, 10.1167/19.2.11 Berg, 1948, A simple objective technique for measuring flexibility in thinking, The Journal of General Psychology, 39, 15, 10.1080/00221309.1948.9918159 Berger, 1971 Bertelson, 1965, Serial choice reaction-time as a function of response versus signal-and-response repetition, Nature, 206, 217, 10.1038/206217a0 Bhui, 2018, Decision by sampling implements efficient coding of psychoeconomic functions, Psychological Review, 125, 985, 10.1037/rev0000123 Blahut, 1972, Computation of channel capacity and rate-distortion functions, IEEE Transactions on Information Theory, 18, 460, 10.1109/TIT.1972.1054855 Blum, 2003, PAC-MDL bounds, 344 Blumer, 1987, Occam's razor, Information Processing Letters, 24, 377, 10.1016/0020-0190(87)90114-1 Bo, 2009, Visuospatial working memory capacity predicts the organization of acquired explicit motor sequences, Journal of Neurophysiology, 101, 3116, 10.1152/jn.00006.2009 Botvinick, 2008, Hierarchical models of behavior and prefrontal function, Trends in Cognitive Sciences, 12, 201, 10.1016/j.tics.2008.02.009 Brady, 2009, Compression in visual working memory: Using statistical regularities to form more efficient memory representations, Journal of Experimental Psychology: General, 138, 487, 10.1037/a0016797 Collins, 2018, The tortoise and the hare: Interactions between reinforcement learning and working memory, Journal of Cognitive Neuroscience, 30, 1422, 10.1162/jocn_a_01238 Collins, 2014, Working memory contributions to reinforcement learning impairments in schizophrenia, Journal of Neuroscience, 34, 13747, 10.1523/JNEUROSCI.0989-14.2014 Collins, 2012, How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis, European Journal of Neuroscience, 35, 1024, 10.1111/j.1460-9568.2011.07980.x Culbreth, 2016, Impaired activation in cognitive control regions predicts reversal learning in schizophrenia, Schizophrenia Bulletin, 42, 484, 10.1093/schbul/sbv075 Culbreth, 2018, Effort-based decision-making in schizophrenia, Current Opinion in Behavioral Sciences, 22, 1, 10.1016/j.cobeha.2017.12.003 Dassonville, 1999, Choice and stimulus–response compatibility affect duration of response selection, Cognitive Brain Research, 7, 235, 10.1016/S0926-6410(98)00027-5 Denti, 2019, A note on rational inattention and rate distortion theory, Decisions in Economics and Finance, 1 Dezfouli, 2012, Habits, action sequences and reinforcement learning, European Journal of Neuroscience, 35, 1036, 10.1111/j.1460-9568.2012.08050.x Dickinson, 1985, Actions and habits: The development of behavioural autonomy, Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 308, 67, 10.1098/rstb.1985.0010 Faisal, 2008, Noise in the nervous system, Nature Reviews Neuroscience, 9, 292, 10.1038/nrn2258 Feng, 2014, Multitasking versus multiplexing: Toward a normative account of limitations in the simultaneous execution of control-demanding behaviors, Cognitive, Affective, & Behavioral Neuroscience, 14, 129, 10.3758/s13415-013-0236-9 Forbes, 2009, Working memory in schizophrenia: A meta-analysis, Psychological Medicine, 39, 889, 10.1017/S0033291708004558 Fox, 2016, Taming the noise in reinforcement learning via soft updates Fründ, 2014, Quantifying the effect of intertrial dependence on perceptual decisions, Journal of Vision, 14, 9, 10.1167/14.7.9 Gershman, 2020, Origin of perseveration in the trade-off between reward and complexity, Cognition, 204, 104394, 10.1016/j.cognition.2020.104394 Gershman, 2021, The rational analysis of memory Gershman, 2020, The reward-complexity trade-off in schizophrenia, bioRxiv Grau-Moya, 2018, Soft q-learning with mutual-information regularization Graybiel, 1998, The basal ganglia and chunking of action repertoires, Neurobiology of Learning and Memory, 70, 119, 10.1006/nlme.1998.3843 Hale, 1968, The relation of correct and error responses in a serial choice reaction task, Psychonomic Science, 13, 299, 10.3758/BF03342595 Hassett, 2017, Change in the relative contributions of habit and working memory facilitates serial reversal learning expertise in rhesus monkeys, Animal Cognition, 20, 485, 10.1007/s10071-017-1076-8 Hick, 1952, On the rate of gain of information, Quarterly Journal of Experimental Psychology, 4, 11, 10.1080/17470215208416600 Howarth, 1956, Non-random sequences in visual threshold experiments, Quarterly Journal of Experimental Psychology, 8, 163, 10.1080/17470215608416816 Huffman, 1952, A method for the construction of minimum-redundancy codes, Proceedings of the IRE, 40, 1098, 10.1109/JRPROC.1952.273898 Huys, 2015, Interplay of approximate planning strategies, Proceedings of the National Academy of Sciences of the United States of America, 112, 3098, 10.1073/pnas.1414219112 Hyman, 1953, Stimulus information as a determinant of reaction time, Journal of Experimental Psychology, 45, 188, 10.1037/h0056940 Icard, 2019, Why be random?, Mind Jin, 2010, Start/stop signals emerge in nigrostriatal circuits during sequence learning, Nature, 466, 457, 10.1038/nature09263 Jin, 2014, Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences, Nature Neuroscience, 17, 423, 10.1038/nn.3632 Konda, 2000, Actor-critic algorithms, 1008 Lashley, 1951, The problem of serial order in behavior, 112 Lau, 2005, Dynamic response-by-response models of matching behavior in rhesus monkeys, Journal of the Experimental Analysis of Behavior, 84, 555, 10.1901/jeab.2005.110-04 Lehnert, 2019, Successor features combine elements of model-free and model-based reinforcement learning, bioRxiv Lehnert, 2020, Reward-predictive representations generalize across tasks in reinforcement learning, PLoS Computational Biology, 16, e1008317, 10.1371/journal.pcbi.1008317 Lerch, 2018 Longstreth, 1988, Hick's law: Its limit is 3 bits, Bulletin of the Psychonomic Society, 26, 8, 10.3758/BF03334845 Matějka, 2015, Rational inattention to discrete choices: A new foundation for the multinomial logit model, American Economic Review, 105, 272, 10.1257/aer.20130047 Mathy, 2012, What's magic about magic numbers? Chunking and data compression in short-term memory, Cognition, 122, 346, 10.1016/j.cognition.2011.11.003 McDougle, 2021, Modeling the influence of working memory, reinforcement, and action uncertainty on reaction time and choice during instrumental learning, Psychonomic Bulletin & Review, 28, 20, 10.3758/s13423-020-01774-z McFadden, 2001, Economic choices, American Economic Review, 91, 351, 10.1257/aer.91.3.351 Miller, 1956, The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychological Review, 63, 81, 10.1037/h0043158 Miller, 2019, Habits without values, Psychological Review, 126, 292, 10.1037/rev0000120 Miyapuram, 2006, Hierarchical chunking during learning of visuomotor sequences, 249 Mosteller, 1951, An experimental measurement of utility, Journal of Political Economy, 59, 371, 10.1086/257106 Mowbray, 1959, On the reduction of choice reaction times with practice, Quarterly Journal of Experimental Psychology, 11, 16, 10.1080/17470215908416282 Musslick, 2020, On the rational boundedness of cognitive control: Shared versus separated representations, PsyArXiv Musslick, 2017, Multitasking capability versus learning efficiency in neural network architectures, 829 Nagy, 2020, Optimal forgetting: Semantic compression of episodic memories, PLoS Computational Biology, 16, 1, 10.1371/journal.pcbi.1008367 Nassar, 2018, Chunking as a rational strategy for lossy data compression in visual working memory, Psychological Review, 125, 486, 10.1037/rev0000101 Ngiam, 2019, “Memory compression” effects in visual working memory are contingent on explicit long-term memory, Journal of Experimental Psychology: General, 148, 1373, 10.1037/xge0000649 Nissen, 1987, Attentional requirements of learning: Evidence from performance measures, Cognitive Psychology, 19, 1, 10.1016/0010-0285(87)90002-8 Norman, 1981, Categorization of action slips, Psychological Review, 88, 1, 10.1037/0033-295X.88.1.1 Norris, 2020, Chunking and data compression in verbal short-term memory, Cognition, 208, 104534, 10.1016/j.cognition.2020.104534 Ostlund, 2009, Evidence of action sequence chunking in goal-directed instrumental conditioning and its dependence on the dorsomedial prefrontal cortex, Journal of Neuroscience, 29, 8280, 10.1523/JNEUROSCI.1176-09.2009 Parush, 2011, Dopaminergic balance between reward maximization and policy complexity, Frontiers in Systems Neuroscience, 5, 22, 10.3389/fnsys.2011.00022 Precup, D. (2000). Temporal abstraction in reinforcement learning (Unpublished doctoral dissertation). University of Massachusetts Amherst. Precup, 1998, Theoretical results on reinforcement learning with temporally abstract options, Machine Learning: ECML-98, 382 Proctor, 2018, Hick's law for choice reaction time: A review, Quarterly Journal of Experimental Psychology, 71, 1281, 10.1080/17470218.2017.1322622 Ramkumar, 2016, Chunking as the result of an efficiency computation trade-off, Nature Communications, 7, 1, 10.1038/ncomms12176 Reddy, 2016, Probabilistic reversal learning in schizophrenia: Stability of deficits and potential causal mechanisms, Schizophrenia Bulletin, 42, 942, 10.1093/schbul/sbv226 Robbins, 1951, A stochastic approximation method, The Annals of Mathematical Statistics, 22, 400, 10.1214/aoms/1177729586 Robertson, 2007, The serial reaction time task: Implicit motor skill learning?, The Journal of Neuroscience, 27, 10073, 10.1523/JNEUROSCI.2747-07.2007 Rutledge, 2009, Dopaminergic drugs modulate learning rates and perseveration in Parkinson's patients in a dynamic foraging task, Journal of Neuroscience, 29, 15104, 10.1523/JNEUROSCI.3524-09.2009 Sagiv, 2018, Efficiency of learning vs. processing: Towards a normative theory of multitasking, 1004 Sakai, 2003, Chunking during human visuomotor sequence learning, Experimental Brain Research, 152, 229, 10.1007/s00221-003-1548-8 Schlagenhauf, 2014, Striatal dysfunction during reversal learning in unmedicated schizophrenia patients, Neuroimage, 89, 171, 10.1016/j.neuroimage.2013.11.034 Schulz, 2019, The algorithmic architecture of exploration in the human brain, Current Opinion in Neurobiology, 55, 7, 10.1016/j.conb.2018.11.003 Seibel, 1963, Discrimination reaction time for a 1,023-alternative task, Journal of Experimental Psychology, 66, 215, 10.1037/h0048914 Seidler, 2012, Neurocognitive contributions to motor skill learning: The role of working memory, Journal of Motor Behavior, 44, 445, 10.1080/00222895.2012.672348 Shannon, 1948, A mathematical theory of communication, The Bell System Technical Journal, 27, 379, 10.1002/j.1538-7305.1948.tb01338.x Shima, 2007, Categorization of behavioural sequences in the prefrontal cortex, Nature, 445, 315, 10.1038/nature05470 Sims, 2012, An ideal observer analysis of visual working memory, Psychological Review, 119, 807, 10.1037/a0029856 Sims, 2016, Rate-distortion theory and human perception, Cognition, 152, 181, 10.1016/j.cognition.2016.03.020 Smith, 2013, A dual operator view of habitual behavior reflecting cortical and striatal dynamics, Neuron, 79, 361, 10.1016/j.neuron.2013.05.038 Still, 2012, An information-theoretic approach to curiosity-driven reinforcement learning, Theory in Biosciences, 131, 139, 10.1007/s12064-011-0142-z Sutton, 2018 Teichner, 1974, Laws of visual choice reaction time, Psychological Review, 81, 75, 10.1037/h0035867 Terrace, 1991, Chunking during serial learning by a pigeon: I. Basic evidence, Journal of Experimental Psychology. Animal Behavior Processes, 17, 81, 10.1037/0097-7403.17.1.81 Thorndike, 1911 Tishby, 2011, Information theory of decisions and actions, 601 Tkačik, 2010, Optimal population coding by noisy spiking neurons, Proceedings of the National Academy of Sciences, 107, 14419, 10.1073/pnas.1004906107 Tomov, 2020, Discovery of hierarchical representations for efficient planning, PLoS Computational Biology, 16, e1007594, 10.1371/journal.pcbi.1007594 Verplanck, 1952, Nonindependence of successive responses in measurements of the visual threshold, Journal of Experimental Psychology, 44, 273, 10.1037/h0054948 Verwey, 1999, Evidence for a multistage model of practice in a sequential movement task, Journal of Experimental Psychology. Human Perception and Performance, 25, 1693, 10.1037/0096-1523.25.6.1693 Von Neumann, 1958 Wifall, 2016, The roles of stimulus and response uncertainty in forced-choice performance: An amendment to Hick/Hyman Law, Psychological Research, 80, 555, 10.1007/s00426-015-0675-8 Zelazo, 2006, The dimensional change card sort (DCCS): A method of assessing executive function in children, Nature Protocols, 1, 297, 10.1038/nprot.2006.46