An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems

Automatica - Volume 129 - Page 109673 - 2021

Hyeong Soo Chang¹

¹Department of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea

