Reachability-based model reduction for Markov decision process

Springer Science and Business Media LLC - 2015

Felipe Martins dos Santos¹, Leliane Nunes de Barros¹, Felipe W. Trevizan¹

¹Institute of Mathematics and Statistics - University of São Paulo, Rua do Matão, São Paulo, 1010, Brazil

Abstract

Keywords

References

Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, New York, NY, USA.

Hoey J, St-Aubin R, Hu A, Boutilier C (1999) SPUDD: Stochastic planning using decision diagrams. In: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 279–288. Morgan Kaufmann, San Francisco, CA, USA.

Feng Z, Hansen EA, Zilberstein S (2003) Symbolic generalization for on-line planning. In: Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, 109–116. Morgan Kaufmann, San Francisco, CA, USA.

Barto AG, Bradtke SJ, Singh SP (1993) Learning to act using real-time dynamic programming. Artif Intell 72: 81–138.

Bonet B, Geffner H (2003) Labeled RTDP: improving the convergence of real-time dynamic programming. In: Proceedings of the 13th International Conference on Automated Planning and Scheduling, 12–21. AAAI Press, ICAPS, Trento, Italy.

Givan R, Greig M, Dean T (2003) Equivalence notions and model minimization in Markov decision processes. Artif Intell 147: 163–223.

Bertsekas D, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Cambridge, MA, USA.

Dean T, Kanazawa K (1990) A model for reasoning about persistence and causation. Comput Intell 5: 142–150.

Bahar RI, Frohm EA, Gaona CM, Hachtel GD, Macii E, Pardo A, Somenzi F (1993) Algebraic decision diagrams and their applications. In: Proceedings of the 1993 IEEE/ACM International Conference on Computer-aided Design, 188–191. IEEE Computer Society Press, Los Alamitos, CA, USA.

Bryant RE (1986) Graph-based algorithms for Boolean function manipulation. IEEE Trans Comput 35: 677–691.

Dai P, Goldsmith J (2007) Topological value iteration algorithm for Markov decision processes. In: IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence, 1860–1865. Morgan Kaufmann, San Francisco, CA, USA.

Bertsekas DP (1995) Dynamic programming and optimal control. Vol. 1. Athena Scientific, Cambridge, MA, USA.

Li L, Walsh TJ, Littman ML (2006) Towards a unified theory of state abstraction for MDPs. In: Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics, 531–539, Fort Lauderdale, Florida, USA.

Dean T, Givan R, Leach S (1997) Model reduction techniques for computing approximately optimal solutions for Markov decision processes. In: Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence, 124–131. Morgan Kaufmann, San Francisco, CA, USA.

Givan R, Leach S, Dean T (2000) Bounded-parameter Markov decision processes. Artif Intell 122: 71–109.

Ravindran B, Barto AG (2002) Model minimization in hierarchical reinforcement learning. Lecture Notes Comput Sci 2371: 196–211.

Ravindran B, Barto AG (2004) Approximate homomorphisms: a framework for non-exact minimization in Markov decision processes. In: Proceedings of the 5th International Conference on Knowledge Based Computer Systems, Hyderabad, India.

Boutilier C, Dearden R, Goldszmidt M (1995) Exploiting structure in policy construction. In: IJCAI-95, 1104–1111. University of British Columbia, Vancouver, BC, Canada.

Kim KE, Dean T (2002) Solving factored MDPs with large action space using algebraic decision diagrams. In: Proceedings of the 7th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence, 80–89. Springer-Verlag, London, UK.

Guo W, Leong TY (2010) An analytic characterization of model minimization in factored Markov decision processes. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence, 1077–1082. AAAI Press, Atlanta, Georgia.

Russell S, Norvig P (2003) Inteligência Artificial: Uma Abordagem Moderna. 2nd edn. Campus/Elsevier, Rio de Janeiro.

Pednault EPD (1994) ADL and the state-transition model of action. J Logic Comput 4(5): 1077–1082.

dos Santos FM, de Barros LN, Holguin MG (2013) Stochastic bisimulation for MDPs using reachability analysis. In: 2013 Brazilian Conference on Intelligent Systems (BRACIS), 213–218, Fortaleza, Ceará, Brazil.

Sanner S (2010) Relational dynamic influence diagram language (RDDL): language description. http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf
