Solving H-horizon, stationary Markov decision problems in time proportional to log(H)

Operations Research Letters - Tập 9 - Trang 287-297 - 1990
Paul Tseng1
1Laboratory for Information and Decision Systems, Massachussetts Institute of Technology, Cambridge, MA 02139, USA

Tài liệu tham khảo

Bertsekas, 1987 Bertsekas, 1989 Cook, 1981, Towards a complexity theory of synchronous parallel computation, Enseign. Math., 2, 99 Federgruen, 1978, Discounted and undiscounted value-iteration in Markov decision problems: A survey, 23 Fox, 1989 Garey, 1979 Householder, 1964 Howard, 1960 Karmarkar, 1984, A new polynomial-time algorithm for linear programming, Combinatorica, 4, 373, 10.1007/BF02579150 Lewis, 1981 1989 Orlin, 1981, The complexity of dynamic languages and dynamic optimization problems, 218 Papadimitriou, 1985, Games against nature, J. Comput. System Sci., 31, 288, 10.1016/0022-0000(85)90045-5 Papadimitriou, 1982 Papadimitriou, 1987, The complexity of Markov decision processes, Math. Oper. Res., 12, 441, 10.1287/moor.12.3.441 Parberry, 1987 Veinott, 1969, Discrete dynamic programming with sensitive discount optimality criterion, Ann. Math. Stat., 40, 1635, 10.1214/aoms/1177697379 White, 1978, Elimination of nonoptimal actions in Markov decision processes, 131