A survey of point-based POMDP solvers

Guy Shani1, Joëlle Pineau2, Robert Kaplow2
1Information Systems Engineering, Ben Gurion University, Beer-Sheva, Israel
2School of Computer Science, McGill University, Montreal, Canada

Abstract

Keywords

