Policy compression: An information bottleneck in action selection
References
Abel, 2019, State abstraction as compression in apprenticeship learning, Proceedings of the AAAI Conference on Artificial Intelligence, 33, 3134, 10.1609/aaai.v33i01.33013134
Amir, 2020, Value-complexity tradeoff explains mouse navigational learning, PLoS Computational Biology, 16, e1008497, 10.1371/journal.pcbi.1008497
Arimoto, 1972, An algorithm for computing the capacity of arbitrary discrete memoryless channels, IEEE Transactions on Information Theory, 18, 14, 10.1109/TIT.1972.1054753
Bar-Gad, 2003, Information processing, dimensionality reduction and reinforcement learning in the basal ganglia, Progress in Neurobiology, 71, 439, 10.1016/j.pneurobio.2003.12.001
Bates, 2020, Efficient data compression in perception and perceptual memory, Psychological Review, 127, 891, 10.1037/rev0000197
Bates, 2019, Adaptive allocation of human visual working memory capacity during statistical and categorical learning, Journal of Vision, 19, 11, 10.1167/19.2.11
Berg, 1948, A simple objective technique for measuring flexibility in thinking, The Journal of General Psychology, 39, 15, 10.1080/00221309.1948.9918159
Berger, 1971
Bertelson, 1965, Serial choice reaction-time as a function of response versus signal-and-response repetition, Nature, 206, 217, 10.1038/206217a0
Bhui, 2018, Decision by sampling implements efficient coding of psychoeconomic functions, Psychological Review, 125, 985, 10.1037/rev0000123
Blahut, 1972, Computation of channel capacity and rate-distortion functions, IEEE Transactions on Information Theory, 18, 460, 10.1109/TIT.1972.1054855
Blum, 2003, PAC-MDL bounds, 344
Blumer, 1987, Occam's razor, Information Processing Letters, 24, 377, 10.1016/0020-0190(87)90114-1
Bo, 2009, Visuospatial working memory capacity predicts the organization of acquired explicit motor sequences, Journal of Neurophysiology, 101, 3116, 10.1152/jn.00006.2009
Botvinick, 2008, Hierarchical models of behavior and prefrontal function, Trends in Cognitive Sciences, 12, 201, 10.1016/j.tics.2008.02.009
Brady, 2009, Compression in visual working memory: Using statistical regularities to form more efficient memory representations, Journal of Experimental Psychology: General, 138, 487, 10.1037/a0016797
Collins, 2018, The tortoise and the hare: Interactions between reinforcement learning and working memory, Journal of Cognitive Neuroscience, 30, 1422, 10.1162/jocn_a_01238
Collins, 2014, Working memory contributions to reinforcement learning impairments in schizophrenia, Journal of Neuroscience, 34, 13747, 10.1523/JNEUROSCI.0989-14.2014
Collins, 2012, How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis, European Journal of Neuroscience, 35, 1024, 10.1111/j.1460-9568.2011.07980.x
Culbreth, 2016, Impaired activation in cognitive control regions predicts reversal learning in schizophrenia, Schizophrenia Bulletin, 42, 484, 10.1093/schbul/sbv075
Culbreth, 2018, Effort-based decision-making in schizophrenia, Current Opinion in Behavioral Sciences, 22, 1, 10.1016/j.cobeha.2017.12.003
Dassonville, 1999, Choice and stimulus–response compatibility affect duration of response selection, Cognitive Brain Research, 7, 235, 10.1016/S0926-6410(98)00027-5
Denti, 2019, A note on rational inattention and rate distortion theory, Decisions in Economics and Finance, 1
Dezfouli, 2012, Habits, action sequences and reinforcement learning, European Journal of Neuroscience, 35, 1036, 10.1111/j.1460-9568.2012.08050.x
Dickinson, 1985, Actions and habits: The development of behavioural autonomy, Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 308, 67, 10.1098/rstb.1985.0010
Faisal, 2008, Noise in the nervous system, Nature Reviews Neuroscience, 9, 292, 10.1038/nrn2258
Feng, 2014, Multitasking versus multiplexing: Toward a normative account of limitations in the simultaneous execution of control-demanding behaviors, Cognitive, Affective, & Behavioral Neuroscience, 14, 129, 10.3758/s13415-013-0236-9
Forbes, 2009, Working memory in schizophrenia: A meta-analysis, Psychological Medicine, 39, 889, 10.1017/S0033291708004558
Fox, 2016, Taming the noise in reinforcement learning via soft updates
Fründ, 2014, Quantifying the effect of intertrial dependence on perceptual decisions, Journal of Vision, 14, 9, 10.1167/14.7.9
Gershman, 2020, Origin of perseveration in the trade-off between reward and complexity, Cognition, 204, 104394, 10.1016/j.cognition.2020.104394
Gershman, 2021, The rational analysis of memory
Gershman, 2020, The reward-complexity trade-off in schizophrenia, bioRxiv
Grau-Moya, 2018, Soft q-learning with mutual-information regularization
Graybiel, 1998, The basal ganglia and chunking of action repertoires, Neurobiology of Learning and Memory, 70, 119, 10.1006/nlme.1998.3843
Hale, 1968, The relation of correct and error responses in a serial choice reaction task, Psychonomic Science, 13, 299, 10.3758/BF03342595
Hassett, 2017, Change in the relative contributions of habit and working memory facilitates serial reversal learning expertise in rhesus monkeys, Animal Cognition, 20, 485, 10.1007/s10071-017-1076-8
Hick, 1952, On the rate of gain of information, Quarterly Journal of Experimental Psychology, 4, 11, 10.1080/17470215208416600
Howarth, 1956, Non-random sequences in visual threshold experiments, Quarterly Journal of Experimental Psychology, 8, 163, 10.1080/17470215608416816
Huffman, 1952, A method for the construction of minimum-redundancy codes, Proceedings of the IRE, 40, 1098, 10.1109/JRPROC.1952.273898
Huys, 2015, Interplay of approximate planning strategies, Proceedings of the National Academy of Sciences of the United States of America, 112, 3098, 10.1073/pnas.1414219112
Hyman, 1953, Stimulus information as a determinant of reaction time, Journal of Experimental Psychology, 45, 188, 10.1037/h0056940
Icard, 2019, Why be random?, Mind
Jin, 2010, Start/stop signals emerge in nigrostriatal circuits during sequence learning, Nature, 466, 457, 10.1038/nature09263
Jin, 2014, Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences, Nature Neuroscience, 17, 423, 10.1038/nn.3632
Konda, 2000, Actor-critic algorithms, 1008
Lashley, 1951, The problem of serial order in behavior, 112
Lau, 2005, Dynamic response-by-response models of matching behavior in rhesus monkeys, Journal of the Experimental Analysis of Behavior, 84, 555, 10.1901/jeab.2005.110-04
Lehnert, 2019, Successor features combine elements of model-free and model-based reinforcement learning, bioRxiv
Lehnert, 2020, Reward-predictive representations generalize across tasks in reinforcement learning, PLoS Computational Biology, 16, e1008317, 10.1371/journal.pcbi.1008317
Lerch, 2018
Longstreth, 1988, Hick's law: Its limit is 3 bits, Bulletin of the Psychonomic Society, 26, 8, 10.3758/BF03334845
Matějka, 2015, Rational inattention to discrete choices: A new foundation for the multinomial logit model, American Economic Review, 105, 272, 10.1257/aer.20130047
Mathy, 2012, What's magic about magic numbers? Chunking and data compression in short-term memory, Cognition, 122, 346, 10.1016/j.cognition.2011.11.003
McDougle, 2021, Modeling the influence of working memory, reinforcement, and action uncertainty on reaction time and choice during instrumental learning, Psychonomic Bulletin & Review, 28, 20, 10.3758/s13423-020-01774-z
McFadden, 2001, Economic choices, American Economic Review, 91, 351, 10.1257/aer.91.3.351
Miller, 1956, The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychological Review, 63, 81, 10.1037/h0043158
Miller, 2019, Habits without values, Psychological Review, 126, 292, 10.1037/rev0000120
Miyapuram, 2006, Hierarchical chunking during learning of visuomotor sequences, 249
Mosteller, 1951, An experimental measurement of utility, Journal of Political Economy, 59, 371, 10.1086/257106
Mowbray, 1959, On the reduction of choice reaction times with practice, Quarterly Journal of Experimental Psychology, 11, 16, 10.1080/17470215908416282
Musslick, 2020, On the rational boundedness of cognitive control: Shared versus separated representations, PsyArXiv
Musslick, 2017, Multitasking capability versus learning efficiency in neural network architectures, 829
Nagy, 2020, Optimal forgetting: Semantic compression of episodic memories, PLoS Computational Biology, 16, 1, 10.1371/journal.pcbi.1008367
Nassar, 2018, Chunking as a rational strategy for lossy data compression in visual working memory, Psychological Review, 125, 486, 10.1037/rev0000101
Ngiam, 2019, “Memory compression” effects in visual working memory are contingent on explicit long-term memory, Journal of Experimental Psychology: General, 148, 1373, 10.1037/xge0000649
Nissen, 1987, Attentional requirements of learning: Evidence from performance measures, Cognitive Psychology, 19, 1, 10.1016/0010-0285(87)90002-8
Norman, 1981, Categorization of action slips, Psychological Review, 88, 1, 10.1037/0033-295X.88.1.1
Norris, 2020, Chunking and data compression in verbal short-term memory, Cognition, 208, 104534, 10.1016/j.cognition.2020.104534
Ostlund, 2009, Evidence of action sequence chunking in goal-directed instrumental conditioning and its dependence on the dorsomedial prefrontal cortex, Journal of Neuroscience, 29, 8280, 10.1523/JNEUROSCI.1176-09.2009
Parush, 2011, Dopaminergic balance between reward maximization and policy complexity, Frontiers in Systems Neuroscience, 5, 22, 10.3389/fnsys.2011.00022
Precup, 2000, Temporal abstraction in reinforcement learning (Unpublished doctoral dissertation), University of Massachusetts Amherst
Precup, 1998, Theoretical results on reinforcement learning with temporally abstract options, Machine Learning: ECML-98, 382
Proctor, 2018, Hick's law for choice reaction time: A review, Quarterly Journal of Experimental Psychology, 71, 1281, 10.1080/17470218.2017.1322622
Ramkumar, 2016, Chunking as the result of an efficiency computation trade-off, Nature Communications, 7, 1, 10.1038/ncomms12176
Reddy, 2016, Probabilistic reversal learning in schizophrenia: Stability of deficits and potential causal mechanisms, Schizophrenia Bulletin, 42, 942, 10.1093/schbul/sbv226
Robbins, 1951, A stochastic approximation method, The Annals of Mathematical Statistics, 22, 400, 10.1214/aoms/1177729586
Robertson, 2007, The serial reaction time task: Implicit motor skill learning?, The Journal of Neuroscience, 27, 10073, 10.1523/JNEUROSCI.2747-07.2007
Rutledge, 2009, Dopaminergic drugs modulate learning rates and perseveration in Parkinson's patients in a dynamic foraging task, Journal of Neuroscience, 29, 15104, 10.1523/JNEUROSCI.3524-09.2009
Sagiv, 2018, Efficiency of learning vs. processing: Towards a normative theory of multitasking, 1004
Sakai, 2003, Chunking during human visuomotor sequence learning, Experimental Brain Research, 152, 229, 10.1007/s00221-003-1548-8
Schlagenhauf, 2014, Striatal dysfunction during reversal learning in unmedicated schizophrenia patients, Neuroimage, 89, 171, 10.1016/j.neuroimage.2013.11.034
Schulz, 2019, The algorithmic architecture of exploration in the human brain, Current Opinion in Neurobiology, 55, 7, 10.1016/j.conb.2018.11.003
Seibel, 1963, Discrimination reaction time for a 1,023-alternative task, Journal of Experimental Psychology, 66, 215, 10.1037/h0048914
Seidler, 2012, Neurocognitive contributions to motor skill learning: The role of working memory, Journal of Motor Behavior, 44, 445, 10.1080/00222895.2012.672348
Shannon, 1948, A mathematical theory of communication, The Bell System Technical Journal, 27, 379, 10.1002/j.1538-7305.1948.tb01338.x
Shima, 2007, Categorization of behavioural sequences in the prefrontal cortex, Nature, 445, 315, 10.1038/nature05470
Sims, 2012, An ideal observer analysis of visual working memory, Psychological Review, 119, 807, 10.1037/a0029856
Sims, 2016, Rate-distortion theory and human perception, Cognition, 152, 181, 10.1016/j.cognition.2016.03.020
Smith, 2013, A dual operator view of habitual behavior reflecting cortical and striatal dynamics, Neuron, 79, 361, 10.1016/j.neuron.2013.05.038
Still, 2012, An information-theoretic approach to curiosity-driven reinforcement learning, Theory in Biosciences, 131, 139, 10.1007/s12064-011-0142-z
Sutton, 2018
Teichner, 1974, Laws of visual choice reaction time, Psychological Review, 81, 75, 10.1037/h0035867
Terrace, 1991, Chunking during serial learning by a pigeon: I. Basic evidence, Journal of Experimental Psychology. Animal Behavior Processes, 17, 81, 10.1037/0097-7403.17.1.81
Thorndike, 1911
Tishby, 2011, Information theory of decisions and actions, 601
Tkačik, 2010, Optimal population coding by noisy spiking neurons, Proceedings of the National Academy of Sciences, 107, 14419, 10.1073/pnas.1004906107
Tomov, 2020, Discovery of hierarchical representations for efficient planning, PLoS Computational Biology, 16, e1007594, 10.1371/journal.pcbi.1007594
Verplanck, 1952, Nonindependence of successive responses in measurements of the visual threshold, Journal of Experimental Psychology, 44, 273, 10.1037/h0054948
Verwey, 1999, Evidence for a multistage model of practice in a sequential movement task, Journal of Experimental Psychology. Human Perception and Performance, 25, 1693, 10.1037/0096-1523.25.6.1693
Von Neumann, 1958
Wifall, 2016, The roles of stimulus and response uncertainty in forced-choice performance: An amendment to Hick/Hyman Law, Psychological Research, 80, 555, 10.1007/s00426-015-0675-8
Zelazo, 2006, The dimensional change card sort (DCCS): A method of assessing executive function in children, Nature Protocols, 1, 297, 10.1038/nprot.2006.46