Reachability-based model reduction for Markov decision process

Felipe Martins dos Santos¹, Leliane Nunes de Barros¹, Felipe W. Trevizan¹
¹Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, São Paulo, Brazil

Abstract

Keywords

