A kernel based learning method for non-stationary two-player repeated games
References
Nash, 1951, Non-cooperative games, Ann. Math., 54, 286, 10.2307/1969529
Wright, 2019, Level-0 models for predicting human behavior in games, J. Artificial Intelligence Res., 64, 357, 10.1613/jair.1.11361
Axelrod, 1984
Littman, 2003, A polynomial-time Nash equilibrium algorithm for repeated games, 48
Brown, 1951, Iterative solutions of games by fictitious play, 374
Brandt, 2007, From external to internal regret, J. Mach. Learn. Res., 8, 1307
Robinson, 1951, An iterative method of solving a game, Ann. Math., 54, 296, 10.2307/1969530
Daskalakis, 2015, Near-optimal no-regret algorithms for zero-sum games, Games Econ. Behav., 92, 327, 10.1016/j.geb.2014.01.003
Zinkevich, 2008, Regret minimization in games with incomplete information, 1729
Johanson, 2012, Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization, 837
R. Arora, O. Dekel, A. Tewari, Online bandit learning against an adaptive adversary: From regret to policy regret, in: Proceedings of the 29th International Conference on Machine Learning, 2012.
Crandall, 2014, Towards minimizing disappointment in repeated games, J. Artificial Intelligence Res., 49, 111, 10.1613/jair.4202
Cesa-Bianchi, 2013, Online learning with switching costs and other adaptive adversaries, 1160
Bowling, 2002, Multiagent learning using a variable learning rate, Artificial Intelligence, 136, 215, 10.1016/S0004-3702(02)00121-2
Crandall, 2011, Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning, Mach. Learn., 82, 281, 10.1007/s10994-010-5192-9
Hernandez-Leal, 2017, An exploration strategy for non-stationary opponents, Auton. Agents Multi-Agent Syst., 31, 971, 10.1007/s10458-016-9347-3
Brafman, 2003, R-max – a general polynomial time algorithm for near-optimal reinforcement learning, J. Mach. Learn. Res., 3, 213
Jensen, 2005, Rapid on-line temporal sequence prediction by an adaptive agent, 67
Jensen, 2005, Non-stationary policy learning in 2-player zero sum games
Mealing, 2013, Opponent modelling by sequence prediction and lookahead in two-player games, 385
Sepahvand, 2014, Sequential decisions: A computational comparison of observational and reinforcement accounts, PLoS One, 9, 1
Mertens, 1989, Repeated games, 205
von Neumann, 1944
Mohri, 2012
Shawe-Taylor, 2004
Lodhi, 2002, Text classification using string kernels, J. Mach. Learn. Res., 2, 419
Knuth, 1998
Armijo, 1966, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16, 1, 10.2140/pjm.1966.16.1
E. Piccolo, G. Squillero, Adaptive opponent modelling for the iterated prisoner’s dilemma, in: Proceedings of the IEEE Congress on Evolutionary Computation, CEC, New Orleans, LA, USA, 2011, pp. 836–841.