Linear Least-Squares Algorithms for Temporal Difference Learning

Machine Learning - 1996

Steven J. Bradtke¹, Andrew G. Barto²

¹One E Telecom Pkwy, GTE Data Services, DC B2H, 33637, Temple Terrace, FL

²Dept. of Computer Science, University of Massachusetts, 01003-4610, Amherst, MA
