Feature-based methods for large scale dynamic programming

Machine Learning - Tập 22 - Trang 59-94 - 1996

John N. Tsitsiklis¹, Benjamin van Roy¹

¹Laboratory for Information and Decision Systems, Massachusetts Institute of Technology

Tóm tắt

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.

Tài liệu tham khảo

Bakshi, B. R. & Stephanopoulos, G., (1993). "Wave-Net: A Multiresolution, Hierarchical Neural Network with Localized Learning", AIChE Journal, vol, 39, no 1, pp. 57–81. Barto, A. G., Bradtke, S. J. & Singh, S. P., (1995). "Real-time Learning and Control Using Asynchronous Dynamic Programming", Artificial Intelligence, vol. 72, pp. 81–138. Bellman, R. E. & Dreyfus, S. E., (1959). "Functional Approximation and Dynamic Programming" Math. Tables and Other Aids Comp. Vol. 13, pp. 247–251. Bortsekas, D. P., (1995).Dynamic Programming and Optimal Control, Athena Scientific, Bellmont, MA. Bertsekas, D. P. (1994), "A Counter-Example to Temporal Differences Learning", Neural Computation, vol. 7, pp. 270–279. Bertsekas, D. P. & Castañon, D. A., (1989), "Adaptive Aggregation for Infinite Horizon Dynamic Programming", IEEE Transactions on Automatic Control, Vol. 34, No. 6, pp. 589–598. Bertsekas, D. P. & Tsitsiklis, J. N., (1989).Parallel and Distributed Computation: Numerical Methods, Prentice Hall, Englewood Cliffs, NJ. Dayan, P. D., (1992). "The Convergence of TD(λ) for General λ", Machine Learning, vol. 8, pp. 341–362. Gordon, G. J., (1995). "Stable Function Approximation in Dynamic Programming", Technical Report: CMUCS-95-103, Carnegie Mellon University. Jaakola, T., Jordan M. I., & Singh, S. P., (1994). "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms," Neural Computation, Vol. 6, No. 6. Jaakola T., Singh, S. P., & Jordan, M. I., (1995). "Reinforcement Learning Algorithms for Partially Observable Markovian Decision Processes," inAdvances in Neural Information Processing Systems 7, J. D. Cowan, G. Tesauro, and D. Touretzky, editors, Morgan Kaufmann. Korf, R. E. (1987). "Planning as Search: A Quantitative Approach", Artificial Intelligence, vol. 33, pp. 65–88. Lippman, R. P., Kukolich, L. & Singer, E., (1993). "LNKnet: Neural Network, Machine-Learning, and Statistical Software for Pattern Classification", The Lincoln Laboratory Journal, vol. 6, no. 2, pp. 249–268. Morin, T. L., (1987). "Computational Advances in Dynamic Programming", inDynamic Programming and Its Applications, edited by Puterman, M.L., pp. 53–90. Poggio, T. & Girosi, F., (1990). "Networks for Approximation and Learning", Proceedings of the IEEE, vol. 78, no. 9, pp. 1481–1497. Reetz, D., (1977). "Approximate Solutions of a Discounted Markovian Decision Process", Bonner Mathematische Schriften, vol. 98: Dynamische Optimierung, pp. 77–92. Schweitzer, P. J., & Seidmann, A., (1985). "Generalized Polynomial Approximations in Markovian Decision Processes", Journal of Mathematical Analysis and Applications, vol. 110, pp. 568–582. Sutton, R. S., (1988). "Learning to Predict by the Method of Temporal Differences", Machine Learning, vol. 3, pp. 9–44. Tesauro, G., (1992). "Practical Issues in Temporal Difference Learning", Machine Learning, vol. 8, pp. 257–277. Tsitsiklis, J. N., (1994). "Asynchronous Stochastic Approximation and Q-Learning", Machine Learning, vol. 16, pp. 185–202. Watkina, C. J. C. H., Dayan, P., (1992). "Q-learning", Machine Learning, vol. 8, pp. 279–292. Whitt, W., (1978). Approximations of Dynamic Programs I.Mathematics of Operations Research, vol. 3, pp. 231–243.

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA