A kernel based learning method for non-stationary two-player repeated games

Knowledge-Based Systems - Volume 196 - Page 105820 - 2020
Renan Motta Goulart1, Saul C. Leite2, Raul Fonseca Neto3
1Postgraduate Program in Computational Modeling - Universidade Federal de Juiz de Fora, Brazil
2Center for Mathematics, Computation and Cognition - Federal University of ABC, Brazil
3Computer Science Department - Federal University of Juiz de Fora, Brazil

References

Nash, 1951, Non-cooperative games, Ann. Math., 54, 286, 10.2307/1969529
Wright, 2019, Level-0 models for predicting human behavior in games, J. Artificial Intelligence Res., 64, 357, 10.1613/jair.1.11361
Axelrod, 1984
Littman, 2003, A polynomial-time Nash equilibrium algorithm for repeated games, 48
Brown, 1951, Iterative solutions of games by fictitious play, 374
Brandt, 2007, From external to internal regret, J. Mach. Learn. Res., 8, 1307
Robinson, 1951, An iterative method of solving a game, Ann. Math., 51, 296, 10.2307/1969530
Daskalakis, 2015, Near-optimal no-regret algorithms for zero-sum games, Games Econ. Behav., 92, 327, 10.1016/j.geb.2014.01.003
Zinkevich, 2008, Regret minimization in games with incomplete information, 1729
Johanson, 2012, Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization, 837
R. Arora, O. Dekel, A. Tewari, Online bandit learning against an adaptive adversary: From regret to policy regret, in: Proceedings of the 29th International Conference on Machine Learning, 2012
Crandall, 2014, Towards minimizing disappointment in repeated games, J. Artificial Intelligence Res., 49, 111, 10.1613/jair.4202
Cesa-Bianchi, 2013, Online learning with switching costs and other adaptive adversaries, 1160
Bowling, 2002, Multiagent learning using a variable learning rate, Artificial Intelligence, 136, 215, 10.1016/S0004-3702(02)00121-2
Crandall, 2011, Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning, Mach. Learn., 82, 281, 10.1007/s10994-010-5192-9
Hernandez-Leal, 2017, An exploration strategy for non-stationary opponents, Auton. Agents Multi-Agent Syst., 31, 971, 10.1007/s10458-016-9347-3
Brafman, 2003, R-max - a general polynomial time algorithm for near-optimal reinforcement learning, J. Mach. Learn. Res., 3, 213
Jensen, 2005, Rapid on-line temporal sequence prediction by an adaptive agent, 67
Jensen, 2005, Non-stationary policy learning in 2-player zero sum games
Mealing, 2013, Opponent modelling by sequence prediction and lookahead in two-player games, 385
Sepahvand, 2014, Sequential decisions: A computational comparison of observational and reinforcement accounts, PLoS One, 9, 1
Mertens, 1989, Repeated games, 205
von Neumann, 1944
Mohri, 2012
Shawe-Taylor, 2004
Lodhi, 2002, Text classification using string kernels, J. Mach. Learn. Res., 2, 419
Knuth, 1998
Armijo, 1966, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16, 1, 10.2140/pjm.1966.16.1
E. Piccolo, G. Squillero, Adaptive opponent modelling for the iterated prisoner's dilemma, in: Proceedings of the IEEE Congress on Evolutionary Computation, CEC, New Orleans, LA, USA, pp. 836–841