Transfer of Learned Opponent Models in Zero Sum Games
Abstract
Human learning transfer abilities take advantage of important cognitive building blocks such as an abstract representation of concepts underlying tasks and causal models of the environment. One way to build abstract representations of the environment when the task involves interactions with others is to build a model of the opponent that may inform what actions they are likely to take next. In this study, we explore opponent modelling and its transfer in games where human agents play against computer agents with human-like limited degrees of iterated reasoning. In two experiments, we find that participants deviate from Nash equilibrium play and learn to adapt to their opponent’s strategy to exploit it. Moreover, we show that participants transfer their learning to new games. Computational modelling shows that players start each game with a model-based learning strategy that facilitates between-game transfer of their opponent’s strategy, but then switch to behaviour that is consistent with a model-free learning strategy in the latter stages of the interaction.