A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning
Abstract
Keywords
References
Crites R, Barto A 1996. Improving elevator performance using reinforcement learning, Adv Neural Inf Process Syst, 8:1017–1023.
Bellman R, Dreyfus S 1959. Functional approximations and dynamic programming, Math Tables Other Aids Comput, 13:247–251.
Benveniste A, Métivier M, Priouret P 1991. Adaptive Algorithms and Stochastic Approximations. Berlin Heidelberg New York: Springer-Verlag.
Bertsekas DP 1995a. Nonlinear Programming. Athena Scientific.
Bertsekas DP 1995b. Dynamic Programming and Optimal Control. Athena Scientific.
Singh S, Bertsekas DP 1997. Reinforcement learning for dynamic channel allocation in cellular telephone systems. Adv Neural Inf Process Syst, vol. 9. MIT Press, p. 974.
Bertsekas DP, Tsitsiklis JN 1995. Neuro-Dynamic Programming. Athena Scientific.
Boyan J 1999. Least-squares temporal difference learning. Proceedings of the Sixteenth International Conference on Machine Learning (ICML), pp. 49–56.
Boyan J 2002. Technical update: least-squares temporal difference learning, Mach Learn, 49(2):233–246.
Bradtke SJ, Barto AG 1996. Linear least-squares algorithms for temporal-difference learning, Mach Learn, 22:33–57.
Choi DS, Van Roy B 2001. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Proceedings of the International Joint Conference on Machine Learning.
Dayan P 1992. The convergence of TD(λ) for general λ, Mach Learn, 8:341–362.
de Farias DP, Van Roy B 2000. On the existence of fixed points for approximate value iteration and temporal-difference learning, J Optim Theory Appl, 105(3).
Gurvits L, Lin LJ, Hanson SJ 1994. Incremental learning of evaluation functions for absorbing Markov chains: new methods and theorems, preprint.
Lagoudakis M, Parr R 2001. Model-free least-squares policy iteration. Adv Neural Inf Process Syst (NIPS-14).
Nedic A, Bertsekas DP 2001. Policy evaluation algorithms with linear function approximation. Tech. Rep. LIDS-P-2537, MIT Laboratory for Information and Decision Systems, December 2001.
Sutton RS 1988. Learning to predict by the method of temporal differences, Mach Learn, 3:9–44.
Tadić V 2001. On the convergence of temporal-difference learning with linear function approximation, Mach Learn, 42:241–267.
Tsitsiklis JN, Van Roy B 1997. An analysis of temporal-difference learning with function approximation, IEEE Trans Automat Contr, 42:674–690.
Tsitsiklis JN, Van Roy B 1999. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, IEEE Trans Automat Contr, 44(10):1840–1851.
Van Roy B 1998. Learning and value function approximation in complex decision processes, Ph.D. dissertation, MIT.
Van Roy B, Bertsekas DP, Lee Y, Tsitsiklis JN 1999. A neuro-dynamic programming approach to retailer inventory management. Proc. of the IEEE Conf Decis Contr.
Varaiya P, Walrand J, Buyukkoc C 1985. Extensions of the multiarmed bandit problem: the discounted case, IEEE Trans Automat Contr, 30(5).
Warmuth M, Forster J 2000. Relative loss bounds for temporal-difference learning. Proc. of the Seventeenth International Conference on Machine Learning, pp. 295–302.
Warmuth M, Schapire R 1997. On the worst-case analysis of temporal-difference learning algorithms, Mach Learn, 22(1–3):95–121.
Zhang W, Dietterich TG 1995. A reinforcement learning approach to job-shop scheduling. Proc. of the International Joint Conference on Artificial Intelligence.