Greedy feature replacement for online value function approximation
Abstract
Keywords
References
Albus, J.S., 1971. A theory of cerebellar function. Math. Biosci., 10(1–2):25–61. [doi:10.1016/0025-5564(71)90051-4]
Barto, A.G., Bradtke, S.J., Singh, S.P., 1995. Learning to act using real-time dynamic programming. Artif. Intell., 72(1–2):81–138. [doi:10.1016/0004-3702(94)00011-O]
Buro, M., 1999. From simple features to sophisticated evaluation functions. Proc. 1st Int. Conf. on Computers and Games, p.126–145. [doi:10.1007/3-540-48957-6_8]
de Hauwere, Y.M., Vrancx, P., Nowé, A., 2010. Generalized learning automata for multi-agent reinforcement learning. AI Commun., 23(4):311–324. [doi:10.3233/AIC-2010-0476]
Geramifard, A., Doshi, F., Redding, J., et al., 2011. Online discovery of feature dependencies. Proc. 28th Int. Conf. on Machine Learning, p.881–888.
Geramifard, A., Dann, C., How, J.P., 2013. Off-policy learning combined with automatic feature expansion for solving large MDPs. Proc. 1st Multidisciplinary Conf. on Reinforcement Learning and Decision Making, p.29–33.
Kaelbling, L.P., Littman, M.L., Moore, A.W., 1996. Reinforcement learning: a survey. J. Artif. Intell. Res., 4:237–285. [doi:10.1613/jair.301]
Kolter, J.Z., Ng, A.Y., 2009. Near-Bayesian exploration in polynomial time. Proc. 26th Annual Int. Conf. on Machine Learning, p.513–520. [doi:10.1145/1553374.1553441]
Lagoudakis, M.G., Parr, R., 2003. Least-squares policy iteration. J. Mach. Learn. Res., 4(6):1107–1149.
Pazis, J., Lagoudakis, M.G., 2009. Binary action search for learning continuous-action control policies. Proc. 26th Annual Int. Conf. on Machine Learning, p.793–800. [doi:10.1145/1553374.1553476]
Puterman, M.L., 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY, p.139–161.
Ratitch, B., Precup, D., 2004. Sparse distributed memories for on-line value-based reinforcement learning. Proc. 15th European Conf. on Machine Learning, p.347–358. [doi:10.1007/978-3-540-30115-8_33]
Rummery, G.A., Niranjan, M., 1994. On-line Q-learning Using Connectionist Systems. Technical Report No. CUED/F-INFENG/TR166, Engineering Department, Cambridge University.
Singh, S., Jaakkola, T., Littman, M.L., et al., 2000. Convergence results for single-step on-policy reinforcement-learning algorithms. Mach. Learn., 38(3):287–308. [doi:10.1023/A:1007678930559]
Singh, S.P., Sutton, R.S., 1996. Reinforcement learning with replacing eligibility traces. Mach. Learn., 22(1–3):123–158. [doi:10.1023/A:1018012322525]
Singh, S.P., Yee, R.C., 1994. An upper bound on the loss from approximate optimal-value functions. Mach. Learn., 16(3):227–233. [doi:10.1007/BF00993308]
Sprague, N., Ballard, D., 2003. Multiple-goal reinforcement learning with modular sarsa(0). Proc. 18th Int. Joint Conf. on Artificial Intelligence, p.1445–1447.
Sturtevant, N.R., White, A.M., 2006. Feature construction for reinforcement learning in hearts. Proc. 5th Int. Conf. on Computers and Games, p.122–134. [doi:10.1007/978-3-540-75538-8_11]
Sutton, R.S., 1996. Generalization in reinforcement learning: successful examples using sparse coarse coding. Adv. Neur. Inform. Process. Syst., 8:1038–1044.
Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, p.3–25.
Tsitsiklis, J.N., 1994. Asynchronous stochastic approximation and Q-learning. Mach. Learn., 16(3):185–202. [doi:10.1007/BF00993306]
Tsitsiklis, J.N., van Roy, B., 1997. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Contr., 42(5):674–690. [doi:10.1109/9.580874]
Watkins, C.J.C.H., Dayan, P., 1992. Q-learning. Mach. Learn., 8(3–4):279–292. [doi:10.1007/BF00992698]
Whiteson, S., Taylor, M.E., Stone, P., 2007. Adaptive Tile Coding for Value Function Approximation. Technical Report No. AI-TR-07-339, University of Texas at Austin.