Optimally solving Markov decision processes with total expected discounted reward function: Linear programming revisited

Computers & Industrial Engineering - Volume 87 - Pages 311-316 - 2015
Oguzhan Alagoz (1), Mehmet U.S. Ayvaci (2), Jeffrey T. Linderoth (1)
(1) Department of Industrial and Systems Engineering, University of Wisconsin, Madison, WI, United States
(2) Jindal School of Management, University of Texas at Dallas, Dallas, TX, United States

References

Agrawal, P., Signh, J. P., Alpcan, T., & Sharma, V. (2007). In 2007 IEEE international symposium world of wireless, mobile and multimedia networks.
Akselrod, D., & Kirubarajan, T. (2008). Modified value iteration algorithm and dynamic element matching based MDP for distributed data fusion and sensor management. In 2008 International conference on information fusion.
Alagoz, 2004, The optimal timing of living-donor liver transplantation, Management Science, 50, 1420, 10.1287/mnsc.1040.0287
Alagoz, 2007, Choosing among cadaveric and living-donor livers, Management Science, 53, 1702, 10.1287/mnsc.1070.0726
Alagoz, 2007, Determining the acceptance of cadaveric livers using an implicit model of the waiting list, Operations Research, 55, 24, 10.1287/opre.1060.0329
Al-Zubaidy, H., Talim, J., & Lambadaris, I. (2007). Dynamic scheduling in high speed downlink packet access networks: Heuristic approach. In 2007 Military communications conference.
Al-Zubaidy, 2010, Optimal scheduling in high-speed downlink packet access networks, ACM Transactions on Modeling and Computer Simulation, 21, 3:1, 10.1145/1870085.1870088
Arruda, 2011, Approximate dynamic programming via direct search space of value function approximations, European Journal of Operational Research, 211, 343, 10.1016/j.ejor.2010.11.019
Asadian, A., Kermani, M. R., & Patel, R. V. (2010). Accelerated needle steering using partitioned value iteration. In 2010 American control conference.
Bello, D., & Riano, G. (2006). Linear programming solvers for Markov decision processes. In 2006 IEEE systems and information engineering design symposium.
Bixby, 2002, Solving real-world linear programs: A decade and more of progress, Operations Research, 50, 3, 10.1287/opre.50.1.3.17780
Buongiorno, 2011, Further generalization of Faustmann’s formula for stochastic interest rates, Journal of Forest Economics, 17, 248, 10.1016/j.jfe.2011.03.002
Chamberland, J. F., Ko, Y. M., & Gautam, N. (2007). Optimal policies for control of peers in online multimedia services. In 2007 IEEE conference on decision and control.
Chang, H. S., & Chong, E. K. P. (2005). On solving controlled Markov set-chains via multi-policy improvement. In 2005 IEEE conference on decision and control, European control conference.
Chen, M., & Cheng, C. (2007). Sensitivity analysis for the optimal minimal repair/replacement policies under the framework of Markov decision process. In 2007 IEEM international conference on industrial engineering and engineering management.
Chen, 2011, Indirect reciprocity game modelling for cooperation stimulation in cognitive networks, IEEE Transactions on Communications, 59, 159, 10.1109/TCOMM.2010.110310.100143
Demmel, 1999, A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications, 20, 720, 10.1137/S0895479895291765
D’Epenoux, 1963, A probabilistic production and inventory problem, Management Science, 10, 98, 10.1287/mnsc.10.1.98
Erenay, 2014, Optimizing colonoscopy screening for colorectal cancer prevention and surveillance, Manufacturing and Service Operations Management, 16, 381, 10.1287/msom.2014.0484
Farran, 2009, Comparative analysis of life-cycle costing for rehabilitating infrastructure systems, Journal of Performance of Constructed Facilities, 23, 320, 10.1061/(ASCE)CF.1943-5509.0000038
Farrokh, 2009, Optimal adaptive modulation and coding with switching costs, IEEE Transactions on Communications, 57, 697, 10.1109/TCOMM.2009.03.070115
Flapper, 2012, Control of a production-inventory system with returns under imperfect advance return information, European Journal of Operational Research, 218, 392, 10.1016/j.ejor.2011.10.051
Glazebrook, 2005, Index policies for the maintenance of a collection of machines by a set of repairmen, European Journal of Operational Research, 165, 267, 10.1016/j.ejor.2004.01.036
Grizzle, 2008, Shortest path stochastic control for hybrid electric vehicles, International Journal of Robust and Nonlinear Control, 18, 1409, 10.1002/rnc.1288
Idoumghar, L., & Schott, R. (2006). A new hybrid GA-MDP algorithm for the frequency assignment problem. In 2006 IEEE international conference on tools with artificial intelligence.
Kallenberg, 1983
Kuppuswamy, 2005, On subscription admission control for network service provision, IEEE Communications Letters, 9, 66, 10.1109/LCOMM.2005.1375244
Kurt, 2011, The structure of optimal statin initiation policies for patients with Type 2 diabetes, IIE Transactions on Healthcare Systems Engineering, 1, 49, 10.1080/19488300.2010.550180
Kurt, 2010, Optimally maintaining a Markovian deteriorating system with limited imperfect repairs, European Journal of Operational Research, 205, 368, 10.1016/j.ejor.2010.01.009
Le Ny, J., & Feron, E. (2006). Restless bandits with switching costs: Linear programming relaxations, performance bounds and limited lookahead policies. In 2006 American control conference.
Littman, M. L., Dean, T. L., & Kaelbling, L. P. (1995). On the complexity of solving Markov decision problems. In Proceedings of the eleventh conference on uncertainty in artificial intelligence (pp. 394–402). Citeseer.
Min, 2010, An elective surgery scheduling problem considering patient priority, Computers and Operations Research, 37, 1091, 10.1016/j.cor.2009.09.016
Morton, 1971, On the asymptotic convergence rate of cost differences for Markovian decision processes, Operations Research, 19, 244, 10.1287/opre.19.1.244
Mosharaf, 2005, Optimal resource allocation and fairness control in all-optical WDM networks, IEEE Journal on Selected Areas in Communications, 23, 1496, 10.1109/JSAC.2005.851791
Powell, 2007
Puterman, 1994
Puterman, 1978, Modified policy iteration algorithms for discounted Markov decision problems, Management Science, 24, 1127, 10.1287/mnsc.24.11.1127
Rezaei Yousefi, 2012, Optimal intervention strategies for therapeutic methods with fixed-length duration of drug effectiveness, IEEE Transactions on Signal Processing, PP
Sandıkçı, 2008, Estimating the patient's price of privacy in liver transplantation, Operations Research, 56, 1393, 10.1287/opre.1080.0648
Schaefer, 2004, Modeling medical treatment using Markov decision processes, 597
Sharna, S. A., Amin, M. R., & Murshed, M. (2011). Call admission control policy for multiclass traffic in heterogeneous wireless networks. In 2011 International symposium on communications and information technologies.
Shechter, 2008, The optimal time to initiate HIV therapy under ordered health states, Operations Research, 56, 20, 10.1287/opre.1070.0480
Stevens-Navarro, 2008, An MDP-based vertical handoff decision algorithm for heterogeneous wireless networks, IEEE Transactions on Vehicular Technology, 57, 1243, 10.1109/TVT.2007.907072
Sun, 2011, A constrained MDP-based vertical handoff decision algorithm for 4G heterogeneous wireless networks, Wireless Networks, 17, 1063, 10.1007/s11276-011-0335-x
Viet, 2012, Using Markov decision processes to define an adaptive strategy to control the spread of an animal disease, Computers and Electronics in Agriculture, 80, 71, 10.1016/j.compag.2011.10.015
Wang, L., & Schonfeld, D. (2010). Game theoretic model for control of gene regulatory networks. In 2010 International conference on acoustics speech and signal processing.
White, 1993, Markov decision processes: Discounted expected reward or average expected reward?, Journal of Mathematical Analysis and Applications, 172, 375, 10.1006/jmaa.1993.1031
Ye, Y. (2015). The simplex method is strongly polynomial for the Markov decision problem with a fixed discount rate. Working paper, <http://www.stanford.edu/yyye/simplexmdp.pdf> Accessed 10.02.15.
Zobel, 2005, An empirical study of policy convergence in Markov decision process value iteration, Computers and Operations Research, 32, 127, 10.1016/S0305-0548(03)00207-7