An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems

Automatica - Volume 129 - Page 109673 - 2021
Hyeong Soo Chang1
1Department of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea

References

Audibert, J.-Y., Bubeck, S., & Munos, R. (2010). Best arm identification in multi-armed bandits. In Proc. of the 23rd international conference on learning theory (pp. 41–53).
Auer, 2002, Finite-time analysis of the multiarmed bandit problem, Machine Learning, 47, 235, 10.1023/A:1013689704352
Bather, 1980, Randomized allocation of treatments in sequential trials, Advances in Applied Probability, 12, 174, 10.2307/1426500
Bather, 1981, Randomized allocation of treatments in sequential experiments, Journal of the Royal Statistical Society. Series B, 43, 265
Berry, 1985
Cesa-Bianchi, 2006
Chang, 2020, An asymptotically optimal strategy for constrained multi-armed bandit problems, Mathematical Methods of Operations Research, 91, 545, 10.1007/s00186-019-00697-3
Hoeffding, 1963, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, 58, 13, 10.1080/01621459.1963.10500830
Kaufmann, 2016, On the complexity of best-arm identification in multi-armed bandit models, Journal of Machine Learning Research, 17, 1
Kleywegt, 2001, The sample average approximation method for stochastic discrete optimization, SIAM Journal on Optimization, 12, 479, 10.1137/S1052623499363220
Lai, 1985, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, 6, 4, 10.1016/0196-8858(85)90002-8
Locatelli, A., Gutzeit, M., & Carpentier, A. (2016). An optimal algorithm for the thresholding bandit problem. In Proc. of the 33rd international conference on machine learning (pp. 1690–1698).
Pasupathy, 2014, Stochastically constrained ranking and selection via SCORE, ACM Transactions on Modeling and Computer Simulation, 25, 1, 10.1145/2630066
Robbins, 1952, Some aspects of the sequential design of experiments, American Mathematical Society. Bulletin, 58, 527, 10.1090/S0002-9904-1952-09620-8
Russo, 2020, Simple Bayesian algorithms for best-arm identification, Operations Research, 68, 1625, 10.1287/opre.2019.1911
Wang, 2008, Sample average approximation of expected value constrained stochastic systems, Operations Research Letters, 36, 515, 10.1016/j.orl.2008.05.003
Yang, 2020, An optimal algorithm for the stochastic bandits while knowing the near-optimal mean reward, IEEE Transactions on Neural Networks and Learning Systems