Approachability in Stackelberg Stochastic Games with Vector Costs
Abstract
The notion of approachability was introduced by Blackwell (Pac J Math 6(1):1–8, 1956) in the context of vector-valued repeated games. Blackwell's celebrated approachability theorem prescribes a strategy for approachability, i.e., for 'steering' the average vector cost of a given agent toward a given target set, irrespective of the strategies of the other agents. In this paper, motivated by multi-objective optimization and decision-making problems in dynamically changing environments, we address the approachability problem in Stackelberg stochastic games with vector-valued cost functions. We make two main contributions. First, we give a simple and computationally tractable strategy for approachability in Stackelberg stochastic games, along the lines of Blackwell's. Second, we give a reinforcement learning algorithm for learning this strategy when the transition kernel is unknown. As a by-product, we recover Blackwell's necessary and sufficient conditions for approachability of convex sets in this setup, and hence a complete characterization; for non-convex sets we give sufficient conditions.
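To make the notion concrete, the following is a minimal illustrative sketch (not taken from the paper) of Blackwell's approachability strategy in a repeated two-player game with two-dimensional vector costs, with the nonpositive orthant as the target set. The cost matrix, horizon, grid resolution, and tolerance below are all illustrative assumptions: at each stage the agent projects the running average cost onto the target set and plays a mixed action that pushes the expected cost to the target side of the separating hyperplane.

```python
# Hypothetical sketch of Blackwell's approachability strategy (repeated game,
# 2-dimensional vector costs, target set = nonpositive orthant {x : x <= 0}).
# The cost matrix c, horizon T, and grid size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# c[i, j] is the vector cost when the agent plays i and the opponent plays j.
c = np.array([[[ 1.0, -1.0], [-1.0,  1.0]],
              [[-1.0,  1.0], [ 1.0, -1.0]]])

def blackwell_mixed_action(lam, grid=101):
    """Pick p = (p0, 1 - p0) minimizing max_j lam . E[c | p, j] (grid search)."""
    best_p, best_val = 0.5, np.inf
    for p0 in np.linspace(0.0, 1.0, grid):
        p = np.array([p0, 1.0 - p0])
        # worst case over the opponent's actions for this mixed action
        val = max(lam @ (p @ c[:, j, :]) for j in range(2))
        if val < best_val:
            best_p, best_val = p0, val
    return np.array([best_p, 1.0 - best_p])

T = 5000
xbar = np.zeros(2)                      # running average of vector costs
for t in range(1, T + 1):
    lam = np.maximum(xbar, 0.0)         # direction from the target set to xbar
    if np.linalg.norm(lam) < 1e-12:
        p = np.array([0.5, 0.5])        # average already in the target set
    else:
        p = blackwell_mixed_action(lam)
    # adversarial opponent: worst response to the agent's mixed action
    j = max(range(2), key=lambda a: lam @ (p @ c[:, a, :]))
    i = rng.choice(2, p=p)
    xbar += (c[i, j] - xbar) / t        # update the running average

dist = np.linalg.norm(np.maximum(xbar, 0.0))
print(f"distance of average cost to the nonpositive orthant: {dist:.4f}")
```

Under these assumptions the distance of the average cost to the target set shrinks at roughly the O(1/sqrt(T)) rate guaranteed by Blackwell's theorem; the paper's contribution is to extend this kind of guarantee from repeated games to Stackelberg stochastic games with unknown transition kernels.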
References
Abounadi J, Bertsekas D, Borkar VS (2001) Learning algorithms for Markov decision processes with average cost. SIAM J Control Optim 40(3):681–698
Altman E (1999) Constrained Markov decision processes, vol 7. CRC Press, Boca Raton
Aubin JP, Cellina A (1984) Differential inclusions: set-valued maps and viability theory. Springer, New York
Bardi M, Capuzzo-Dolcetta I (2008) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer, Berlin
Barwell AD (2011) Omega-limit sets of discrete dynamical systems. Ph.D. dissertation, University of Birmingham
Benaim M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J Control Optim 44(1):328–348
Benaim M, Hofbauer J, Sorin S (2006) Stochastic approximations and differential inclusions, Part II: applications. Math Oper Res 31(4):673–695
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6(1):1–8
Borkar VS (1991) Topics in controlled Markov chains. Longman Scientific & Technical, Harlow
Borkar VS (1998) Asynchronous stochastic approximation. SIAM J Control Optim 36(3):840–851
Borkar VS (2005) An actor-critic algorithm for constrained Markov decision processes. Syst Control Lett 54(3):207–213
Borkar VS (2008) Stochastic approximation: a dynamical systems viewpoint. Hindustan Book Agency, New Delhi; Cambridge University Press, Cambridge
Borkar VS, Meyn SP (2000) The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J Control Optim 38(2):447–469
Even-Dar E, Kakade S, Mansour Y (2009) Online Markov decision processes. Math Oper Res 34(3):726–736
Filar J, Vrieze K (1996) Competitive Markov decision processes. Springer, New York
Kamal S (2010) A vector minmax problem for controlled Markov chains. arXiv preprint, arXiv:1011.0675v1
Mannor S, Shimkin N (2003) The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Math Oper Res 28(2):327–345
Mannor S, Shimkin N (2004) A geometric approach to multi-criterion reinforcement learning. J Mach Learn Res 5:325–360
Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom Behav 56(1):135–147
Patek SD (1997) Stochastic shortest path games: theory and algorithms. Ph.D. dissertation, Lab. for Information and Decision Systems, MIT
Perchet V (2014) Approachability, regret and calibration; implications and equivalences. J Dyn Games 1(2):181–254
Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Shimkin N, Shwartz A (1993) Guaranteed performance regions in Markovian systems with competing decision makers. IEEE Trans Autom Control 38(1):84–95
Steuer RE (1989) Multiple criteria optimization: theory, computation, and application. Wiley, New York
Tucker W (1999) The Lorenz attractor exists. Comptes Rendus Acad Sci Ser I Math 328(12):1197–1202
Wagner DH (1977) Survey of measurable selection theorems. SIAM J Control Optim 15(5):859–903
Yu JY, Mannor S, Shimkin N (2009) Markov decision processes with arbitrary reward processes. Math Oper Res 34(3):737–757
Yu H, Bertsekas DP (2013) On boundedness of Q-learning iterates for stochastic shortest path problems. Math Oper Res 38(2):209–227