Gaussian Two-Armed Bandit and Optimization of Batch Data Processing

Problems of Information Transmission - Volume 54, Issue 1 - Pages 84-100 - 2018
А. В. Колногоров (1)
(1) Department of Applied Mathematics and Information Science, Yaroslav-the-Wise Novgorod State University, Moscow, Russia

References

Berry, D.A. and Fristedt, B., Bandit Problems: Sequential Allocation of Experiments, London: Chapman & Hall, 1985.

Presman, E.L. and Sonin, I.M., Posledovatel’noe upravlenie po nepolnym dannym. Baiesovskii podkhod, Moscow: Nauka, 1982. Translated under the title Sequential Control with Incomplete Information, New York: Academic, 1990.

Tsetlin, M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem, Moscow: Nauka, 1969. Translated under the title Automaton Theory and Modeling of Biological Systems, New York: Academic, 1973.

Varshavsky, V.I., Kollektivnoe povedenie avtomatov (Collective Behavior of Automata), Moscow: Nauka, 1973. Translated under the title Kollektives Verhalten von Automaten, Warschawski, W.I., Berlin: Akademie, 1978.

Sragovich, V.G., Adaptivnoe upravlenie (Adaptive Control), Moscow: Nauka, 1981. Translated under the title Mathematical Theory of Adaptive Control, Singapore: World Sci., 2006.

Nazin, A.V. and Poznyak, A.S., Adaptivnyi vybor variantov: rekurrentnye algoritmy (Adaptive Choice between Alternatives: Recursive Algorithms), Moscow: Nauka, 1986.

Robbins, H., Some Aspects of the Sequential Design of Experiments, Bull. Amer. Math. Soc., 1952, vol. 58, no. 5, pp. 527–535.

Fabius, J. and van Zwet, W.R., Some Remarks on the Two-Armed Bandit, Ann. Math. Statist., 1970, vol. 41, no. 6, pp. 1906–1916.

Vogel, W., An Asymptotic Minimax Theorem for the Two Armed Bandit Problem, Ann. Math. Statist., 1960, vol. 31, no. 2, pp. 444–451.

Bather, J.A., The Minimax Risk for the Two-Armed Bandit Problem, Mathematical Learning Models—Theory and Algorithms, Herkenrath, U., Kalin, D., and Vogel, W., Eds., Lect. Notes Statist., vol. 20, New York: Springer, 1983, pp. 1–11.

Lai, T.L., Levin, B., Robbins, H., and Siegmund, D., Sequential Medical Trials (Stopping Rules/Asymptotic Optimality), Proc. Natl. Acad. Sci. USA, 1980, vol. 77, no. 6, Part 1, pp. 3135–3138.

Cesa-Bianchi, N. and Lugosi, G., Prediction, Learning, and Games, Cambridge: Cambridge Univ. Press, 2006.

Juditsky, A., Nazin, A.V., Tsybakov, A.B., and Vayatis, N., Gap-Free Bounds for Stochastic Multi-Armed Bandit, in Proc. 17th IFAC World Congr., Seoul, Korea, July 6–11, 2008, pp. 11560–11563. Available at http://www.ifac-papersonline.net/Detailed/37644.html.

Gasnikov, A.V., Nesterov, Yu.E., and Spokoiny, V.G., On the Efficiency of a Randomized Mirror Descent Algorithm in Online Optimization Problems, Zh. Vychisl. Mat. Mat. Fiz., 2015, vol. 55, no. 4, pp. 582–598 [Comput. Math. Math. Phys. (Engl. Transl.), 2015, vol. 55, no. 4, pp. 580–596].

Kolnogorov, A.V., Determination of Minimax Strategies and Risk in a Random Environment (the Two-Armed Bandit Problem), Avtomat. i Telemekh., 2011, no. 5, pp. 127–138 [Autom. Remote Control (Engl. Transl.), 2011, vol. 72, no. 5, pp. 1017–1027].

Kolnogorov, A.V., One-Armed Bandit Problem for Parallel Data Processing Systems, Probl. Peredachi Inf., 2015, vol. 51, no. 2, pp. 99–113 [Probl. Inf. Trans. (Engl. Transl.), 2015, vol. 51, no. 2, pp. 177–191].

Oleynikov, A.O., Numerical Optimization of Parallel Processing in a Stationary Environment, Trans. Karelian Res. Centre Russ. Acad. Sci., 2013, no. 1, pp. 73–78.