An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems
References
Audibert, 2010, Best arm identification in multi-armed bandits, Proceedings of the 23rd International Conference on Learning Theory, 41
Auer, 2002, Finite-time analysis of the multiarmed bandit problem, Machine Learning, 47, 235, 10.1023/A:1013689704352
Bather, 1980, Randomized allocation of treatments in sequential trials, Advances in Applied Probability, 12, 174, 10.2307/1426500
Bather, 1981, Randomized allocation of treatments in sequential experiments, Journal of the Royal Statistical Society: Series B, 43, 265
Berry, 1985
Cesa-Bianchi, 2006
Chang, 2020, An asymptotically optimal strategy for constrained multi-armed bandit problems, Mathematical Methods of Operations Research, 91, 545, 10.1007/s00186-019-00697-3
Hoeffding, 1963, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, 58, 13, 10.1080/01621459.1963.10500830
Kaufmann, 2016, On the complexity of best-arm identification in multi-armed bandit models, Journal of Machine Learning Research, 17, 1
Kleywegt, 2001, The sample average approximation method for stochastic discrete optimization, SIAM Journal on Optimization, 12, 479, 10.1137/S1052623499363220
Lai, 1985, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, 6, 4, 10.1016/0196-8858(85)90002-8
Locatelli, 2016, An optimal algorithm for the thresholding bandit problem, Proceedings of the 33rd International Conference on Machine Learning, 1690
Pasupathy, 2014, Stochastically constrained ranking and selection via SCORE, ACM Transactions on Modeling and Computer Simulation, 25, 1, 10.1145/2630066
Robbins, 1952, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, 58, 527, 10.1090/S0002-9904-1952-09620-8
Russo, 2020, Simple Bayesian algorithms for best-arm identification, Operations Research, 68, 1625, 10.1287/opre.2019.1911
Wang, 2008, Sample average approximation of expected value constrained stochastic systems, Operations Research Letters, 36, 515, 10.1016/j.orl.2008.05.003
Yang, 2020, An optimal algorithm for the stochastic bandits while knowing the near-optimal mean reward, IEEE Transactions on Neural Networks and Learning Systems