A kernel based learning method for non-stationary two-player repeated games

Knowledge-Based Systems - Volume 196 - Page 105820 - 2020
Renan Motta Goulart1, Saul C. Leite2, Raul Fonseca Neto3
1Postgraduate Program in Computational Modeling - Universidade Federal de Juiz de Fora, Brazil
2Center for Mathematics, Computation and Cognition - Federal University of ABC, Brazil
3Computer Science Department - Federal University of Juiz de Fora, Brazil

References

Nash, 1951, Non-cooperative games, Ann. Math., 54, 286, 10.2307/1969529
Wright, 2019, Level-0 models for predicting human behavior in games, J. Artificial Intelligence Res., 64, 357, 10.1613/jair.1.11361
Axelrod, 1984
Littman, 2003, A polynomial-time Nash equilibrium algorithm for repeated games, 48
Brown, 1951, Iterative solutions of games by fictitious play, 374
Brandt, 2007, From external to internal regret, J. Mach. Learn. Res., 8, 1307
Robinson, 1951, An iterative method of solving a game, Ann. Math., 51, 296, 10.2307/1969530
Daskalakis, 2015, Near-optimal no-regret algorithms for zero-sum games, Games Econ. Behav., 92, 327, 10.1016/j.geb.2014.01.003
Zinkevich, 2008, Regret minimization in games with incomplete information, 1729
Johanson, 2012, Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization, 837
R. Arora, O. Dekel, A. Tewari, Online bandit learning against an adaptive adversary: From regret to policy regret, in: Proceedings of the 29th International Conference on Machine Learning, 2012
Crandall, 2014, Towards minimizing disappointment in repeated games, J. Artificial Intelligence Res., 49, 111, 10.1613/jair.4202
Cesa-Bianchi, 2013, Online learning with switching costs and other adaptive adversaries, 1160
Bowling, 2002, Multiagent learning using a variable learning rate, Artificial Intelligence, 136, 215, 10.1016/S0004-3702(02)00121-2
Crandall, 2011, Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning, Mach. Learn., 82, 281, 10.1007/s10994-010-5192-9
Hernandez-Leal, 2017, An exploration strategy for non-stationary opponents, Auton. Agents Multi-Agent Syst., 31, 971, 10.1007/s10458-016-9347-3
Brafman, 2003, R-max - a general polynomial time algorithm for near-optimal reinforcement learning, J. Mach. Learn. Res., 3, 213
Jensen, 2005, Rapid on-line temporal sequence prediction by an adaptive agent, 67
Jensen, 2005, Non-stationary policy learning in 2-player zero sum games
Mealing, 2013, Opponent modelling by sequence prediction and lookahead in two-player games, 385
Sepahvand, 2014, Sequential decisions: A computational comparison of observational and reinforcement accounts, PLoS One, 9, 1
Mertens, 1989, Repeated games, 205
von Neumann, 1944
Mohri, 2012
Shawe-Taylor, 2004
Lodhi, 2002, Text classification using string kernels, J. Mach. Learn. Res., 2, 419
Knuth, 1998
Armijo, 1966, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16, 1, 10.2140/pjm.1966.16.1
E. Piccolo, G. Squillero, Adaptive opponent modelling for the iterated prisoner's dilemma, in: Proceedings of the IEEE Congress on Evolutionary Computation, CEC, New Orleans, LA, USA, pp. 836–841