Feature-based methods for large scale dynamic programming

Machine Learning - Tập 22 - Trang 59-94 - 1996
John N. Tsitsiklis1, Benjamin van Roy1
1Laboratory for Information and Decision Systems, Massachusetts Institute of Technology

Tóm tắt

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.

Tài liệu tham khảo

Bakshi, B. R. & Stephanopoulos, G., (1993). "Wave-Net: A Multiresolution, Hierarchical Neural Network with Localized Learning", AIChE Journal, vol, 39, no 1, pp. 57–81. Barto, A. G., Bradtke, S. J. & Singh, S. P., (1995). "Real-time Learning and Control Using Asynchronous Dynamic Programming", Artificial Intelligence, vol. 72, pp. 81–138. Bellman, R. E. & Dreyfus, S. E., (1959). "Functional Approximation and Dynamic Programming" Math. Tables and Other Aids Comp. Vol. 13, pp. 247–251. Bortsekas, D. P., (1995).Dynamic Programming and Optimal Control, Athena Scientific, Bellmont, MA. Bertsekas, D. P. (1994), "A Counter-Example to Temporal Differences Learning", Neural Computation, vol. 7, pp. 270–279. Bertsekas, D. P. & Castañon, D. A., (1989), "Adaptive Aggregation for Infinite Horizon Dynamic Programming", IEEE Transactions on Automatic Control, Vol. 34, No. 6, pp. 589–598. Bertsekas, D. P. & Tsitsiklis, J. N., (1989).Parallel and Distributed Computation: Numerical Methods, Prentice Hall, Englewood Cliffs, NJ. Dayan, P. D., (1992). "The Convergence of TD(λ) for General λ", Machine Learning, vol. 8, pp. 341–362. Gordon, G. J., (1995). "Stable Function Approximation in Dynamic Programming", Technical Report: CMUCS-95-103, Carnegie Mellon University. Jaakola, T., Jordan M. I., & Singh, S. P., (1994). "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms," Neural Computation, Vol. 6, No. 6. Jaakola T., Singh, S. P., & Jordan, M. I., (1995). "Reinforcement Learning Algorithms for Partially Observable Markovian Decision Processes," inAdvances in Neural Information Processing Systems 7, J. D. Cowan, G. Tesauro, and D. Touretzky, editors, Morgan Kaufmann. Korf, R. E. (1987). "Planning as Search: A Quantitative Approach", Artificial Intelligence, vol. 33, pp. 65–88. Lippman, R. P., Kukolich, L. & Singer, E., (1993). "LNKnet: Neural Network, Machine-Learning, and Statistical Software for Pattern Classification", The Lincoln Laboratory Journal, vol. 6, no. 2, pp. 249–268. Morin, T. L., (1987). "Computational Advances in Dynamic Programming", inDynamic Programming and Its Applications, edited by Puterman, M.L., pp. 53–90. Poggio, T. & Girosi, F., (1990). "Networks for Approximation and Learning", Proceedings of the IEEE, vol. 78, no. 9, pp. 1481–1497. Reetz, D., (1977). "Approximate Solutions of a Discounted Markovian Decision Process", Bonner Mathematische Schriften, vol. 98: Dynamische Optimierung, pp. 77–92. Schweitzer, P. J., & Seidmann, A., (1985). "Generalized Polynomial Approximations in Markovian Decision Processes", Journal of Mathematical Analysis and Applications, vol. 110, pp. 568–582. Sutton, R. S., (1988). "Learning to Predict by the Method of Temporal Differences", Machine Learning, vol. 3, pp. 9–44. Tesauro, G., (1992). "Practical Issues in Temporal Difference Learning", Machine Learning, vol. 8, pp. 257–277. Tsitsiklis, J. N., (1994). "Asynchronous Stochastic Approximation and Q-Learning", Machine Learning, vol. 16, pp. 185–202. Watkina, C. J. C. H., Dayan, P., (1992). "Q-learning", Machine Learning, vol. 8, pp. 279–292. Whitt, W., (1978). Approximations of Dynamic Programs I.Mathematics of Operations Research, vol. 3, pp. 231–243.