Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes
Abstract
We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, the parallel rollout approach targets the class of problems where multiple heuristic policies are available such that each policy performs near-optimally for a different set of system paths. Parallel rollout automatically combines the given multiple policies to create a new policy that adapts to the different system paths and improves upon the performance of each policy in the set. We formally prove this claim for two criteria: total expected reward and infinite horizon discounted reward. The parallel rollout approach also resolves the key issue of selecting which policy to roll out among multiple heuristic policies whose performances cannot be predicted in advance. We present two example problems to illustrate the effectiveness of the parallel rollout approach: a buffer management problem and a multiclass scheduling problem.
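The decision rule described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: at each decision epoch, the value of each candidate action is estimated by Monte Carlo simulation under every base policy, the maximum over policies is taken, and the action with the best such estimate is chosen. The function names `simulate_step` and `reward`, and all parameter defaults, are assumptions for the example.

```python
def parallel_rollout_action(state, actions, policies, simulate_step, reward,
                            horizon=20, num_trajectories=30):
    """One step of parallel rollout (illustrative sketch): for each candidate
    action, estimate its value under each base policy by simulation, take the
    max over policies, and return the action with the best estimate."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        # Max over base policies of the estimated return from taking `a` now
        # and following that policy thereafter.
        q = max(
            estimate_value(state, a, pi, simulate_step, reward,
                           horizon, num_trajectories)
            for pi in policies
        )
        if q > best_value:
            best_action, best_value = a, q
    return best_action


def estimate_value(state, first_action, policy, simulate_step, reward,
                   horizon, num_trajectories):
    """Average total reward over sampled trajectories that take `first_action`
    and then follow `policy` for `horizon` steps."""
    total = 0.0
    for _ in range(num_trajectories):
        s, a, ret = state, first_action, 0.0
        for _ in range(horizon):
            s_next = simulate_step(s, a)   # sample the next state
            ret += reward(s, a)
            s, a = s_next, policy(s_next)  # hand control to the base policy
        total += ret
    return total / num_trajectories
```

Because the maximization over policies happens inside the action-value estimate, the resulting policy can follow whichever base policy is best for the system path actually being realized, which is the source of the improvement claimed in the abstract.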
References
Andersen, A. T., Jensen, A., and Nielsen, B. F. 1995. Modelling and performance study of packet-traffic with self-similar characteristics over several timescales with Markovian arrival processes (MAP). Proc. 12th Nordic Teletraffic Seminar, 269–283.
Andersen, A. T., and Nielsen, B. F. 1997. An application of superpositions of two state Markovian sources to the modelling of self-similar behaviour. Proc. IEEE INFOCOM, 196–204.
Asmussen, S., Nerman, O., and Olsson, M. 1996. Fitting phase-type distributions via the EM algorithm. Scand. J. Statist. 23:419–441.
Bertsekas, D. P. 1995. Dynamic Programming and Optimal Control. Athena Scientific.
Bertsekas, D. P. 1997. Differential training of rollout policies. Proc. 35th Allerton Conference on Communication, Control, and Computing, Allerton Park, IL.
Bertsekas, D. P., and Castanon, D. A. 1999. Rollout algorithms for stochastic scheduling problems. J. of Heuristics 5:89–108.
Bertsekas, D. P., and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Nashua, NH: Athena Scientific.
Blondia, C. 1993. A discrete-time batch Markovian arrival process as B-ISDN traffic model. Belgian J. of Operations Research, Statistics and Computer Science 32.
Bonald, T., May, M., and Bolot, J.-C. 2000. Analytic evaluation of RED performance. Proc. IEEE INFOCOM, 1415–1424.
Chang, H. S. 2001. On-line sampling-based control for network queueing problems. Ph.D. thesis, Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN.
Chang, H. S., Givan, R., and Chong, E. K. P. 2000. On-line scheduling via sampling. Proc. 5th Int. Conf. on Artificial Intelligence Planning and Scheduling, 62–71.
Chen, D. T., and Rieders, M. 1996. Cyclic Markov modulated Poisson processes in traffic characterization. Stochastic Models 12(4): 585–610.
Duffield, N. G., and Whitt, W. 1998. A source traffic model and its transient analysis for network control. Stochastic Models 14:51–78.
Firoiu, V., and Borden, M., 2000. A study of active queue management for congestion control. Proc. IEEE INFOCOM, 1435–1445.
Fischer, W., and Meier-Hellstern, K. 1992. The Markov-modulated Poisson process (MMPP) cookbook. Performance Evaluation 18:149–171.
Floyd, S., and Jacobson, V. 1993. Random early detection gateways for congestion avoidance. IEEE/ACM Trans. Net. 1(4): 397–413.
Givan, R., Chong, E. K. P., and Chang, H. S. 2002. Scheduling multiclass packet streams to minimize weighted loss. Queueing Systems 41(3): 241–270.
Hashem, E. 1989. Analysis of random drop for gateway congestion control. Tech. Rep. LCS/TR-465, Massachusetts Institute of Technology.
Ho, Y. C., and Cao, X. R. 1991. Perturbation Analysis of Discrete Event Dynamic Systems. Norwell, Massachusetts: Kluwer Academic Publishers.
Kearns, M., Mansour, Y., and Ng, A. Y. 1999. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49: 193–208.
Kearns, M., Mansour, Y., and Ng, A. Y. 2000. Approximate planning in large POMDPs via reusable trajectories. Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller (eds), Cambridge, MA: MIT Press.
Kitaev, M. Y., and Rykov, V. V. 1995. Controlled Queueing Systems. CRC Press.
Kulkarni, V. G., and Tedijanto, T. E. 1998. Optimal admission control of Markov-modulated batch arrivals to a finite-capacity buffer. Stochastic Models 14(1): 95–122.
Mayne, D. Q., and Michalska, H. 1990. Receding horizon control of nonlinear systems. IEEE Trans. Auto. Contr. 35(7): 814–824.
Misra, V. M., and Gong, W. B. 1998. A hierarchical model for teletraffic. Proc. IEEE CDC 2: 1674–1679.
Neuts, M. F. 1979. A versatile Markovian point process. J. Appl. Prob. 16: 764–779.
Peha, J. M., and Tobagi, F. A. 1990. Evaluating scheduling algorithms for traffic with heterogeneous performance objectives. Proc. IEEE GLOBECOM, 21–27.
Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley.
Romanow, A., and Floyd, S. 1995. Dynamics of TCP traffic over ATM networks. IEEE J. of Select. Areas Commun. 13(4): 633–641.
Ross, S. M. 1997. Simulation. San Diego, CA: Academic Press.
Sen, P., Maglaris, B., Rikli, N., and Anastassiou, D. 1989. Models for packet switching of variable-bit-rate video sources. IEEE J. of Select. Areas Commun. 7: 865–869.
Turin, J. W. 1996. Fitting stochastic automata via the EM algorithm. Stochastic Models 12: 405–424.