Hierarchical Reinforcement Learning Explains Task Interleaving Behavior

Christoph Gebhardt1, Antti Oulasvirta2, Otmar Hilliges1
1Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
2Aalto University, Espoo, Finland

Abstract

How do people decide how long to continue in a task, when to switch, and to which other task? It is known that task interleaving adapts situationally, showing sensitivity to changes in expected rewards, costs, and task boundaries. However, the mechanisms that underpin the decision to stay in a task versus switch away are not thoroughly understood. Previous work has explained task interleaving by greedy heuristics and a policy that maximizes the marginal rate of return. However, it is unclear how such a strategy would allow for adaptation to environments that offer multiple tasks with complex switch costs and delayed rewards. Here, we develop a hierarchical model of supervisory control driven by reinforcement learning (RL). The core assumption is that the supervisory level learns to switch using task-specific approximate utility estimates, which are computed on the lower level. We show that a hierarchically optimal value function decomposition can be learned from experience, even in conditions with multiple tasks and arbitrary and uncertain reward and cost structures. The model also reproduces well-known key phenomena of task interleaving, such as the sensitivity to costs of resumption and immediate as well as delayed in-task rewards. In a demanding task interleaving study with 211 human participants and realistic tasks (reading, mathematics, question-answering, recognition), the model yielded better predictions of individual-level data than a flat (non-hierarchical) RL model and an omniscient-myopic baseline. Corroborating emerging evidence from cognitive neuroscience, our results suggest hierarchical RL as a plausible model of supervisory control in task interleaving.
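
The core assumption stated above, a supervisory level that chooses among tasks using utility estimates computed on the lower level, can be illustrated with a toy sketch. This is not the authors' model or implementation. The reward sequences, the fixed resumption costs, the tabular TD(0) updates, and all names (`train`, `supervisor_choice`, `greedy_order`) are assumptions made purely for illustration.

```python
import random


def supervisor_choice(v_low, cost_est, pos, current, open_tasks):
    """Pick the open task with the highest estimated utility.

    Utility = lower-level value estimate at the task's current position,
    minus the estimated resumption cost when this would be a switch.
    """
    def utility(i):
        penalty = 0.0 if i == current else cost_est[i]
        return v_low[i][pos[i]] - penalty
    return max(open_tasks, key=utility)


def train(task_rewards, switch_costs, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=0.1, seed=0):
    """Learn per-task value tables and resumption-cost estimates (toy setup)."""
    rng = random.Random(seed)
    n = len(task_rewards)
    # Lower level: one value table per task, indexed by in-task position.
    v_low = [[0.0] * (len(r) + 1) for r in task_rewards]
    # Supervisory level: running estimate of the cost of entering each task.
    cost_est = [0.0] * n
    for _ in range(episodes):
        pos, current = [0] * n, None
        while True:
            open_tasks = [i for i in range(n) if pos[i] < len(task_rewards[i])]
            if not open_tasks:
                break
            if rng.random() < epsilon:  # occasional exploration
                choice = rng.choice(open_tasks)
            else:
                choice = supervisor_choice(v_low, cost_est, pos, current,
                                           open_tasks)
            if choice != current:  # a switch: observe its cost
                cost_est[choice] += alpha * (switch_costs[choice]
                                             - cost_est[choice])
                current = choice
            p = pos[choice]
            # TD(0) update of the in-task value estimate.
            target = task_rewards[choice][p] + gamma * v_low[choice][p + 1]
            v_low[choice][p] += alpha * (target - v_low[choice][p])
            pos[choice] = p + 1
    return v_low, cost_est


def greedy_order(task_rewards, v_low, cost_est):
    """Sequence of tasks worked on when the supervisor acts greedily."""
    n = len(task_rewards)
    pos, current, order = [0] * n, None, []
    while True:
        open_tasks = [i for i in range(n) if pos[i] < len(task_rewards[i])]
        if not open_tasks:
            return order
        current = supervisor_choice(v_low, cost_est, pos, current, open_tasks)
        pos[current] += 1
        order.append(current)


if __name__ == "__main__":
    rewards = [[1, 1, 1, 1], [5, 5, 5, 5]]  # hypothetical in-task rewards
    v_low, cost_est = train(rewards, switch_costs=[0.5, 0.5])
    print(greedy_order(rewards, v_low, cost_est))
```

In this sketch the supervisor never sees raw task states, only each task's scalar value estimate and a learned entry cost. That is one simple way to read the two-level decomposition described in the abstract. The full model additionally handles uncertain rewards and a hierarchically optimal decomposition, which this toy does not attempt.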
