Basic Ideas for Event-Based Optimization of Markov Systems
References
Barto, A., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning, special issue on reinforcement learning. Discret. Event Dyn. Syst. Theory Appl. 13: 41–77.
Baxter, J., and Bartlett, P. L. 2001. Infinite-horizon policy-gradient estimation. J. Artif. Intell. Res. 15: 319–350.
Baxter, J., Bartlett, P. L., and Weaver, L. 2001. Experiments with infinite-horizon policy-gradient estimation. J. Artif. Intell. Res. 15: 351–381.
Bertsekas, D. P. 1995. Dynamic Programming and Optimal Control, Volumes I and II. Belmont, MA: Athena Scientific.
Cao, X. R. 1994. Realization Probabilities: The Dynamics of Queueing Systems. New York: Springer-Verlag.
Cao, X. R. 1998. The relation among potentials, perturbation analysis, Markov decision processes, and other topics. J. Discret. Event Dyn. Syst. 8: 71–87.
Cao, X. R. 1999. Single sample path based optimization of Markov chains. J. Optim. Theory Appl. 100(3): 527–548.
Cao, X. R. 2000. A unified approach to Markov decision problems and performance sensitivity analysis. Automatica 36: 771–774.
Cao, X. R. 2004a. The potential structure of sample paths and performance sensitivities of Markov systems. IEEE Trans. Automat. Contr. 49: 2129–2142.
Cao, X. R. 2004b. A basic formula for on-line policy gradient algorithms. IEEE Trans. Automat. Contr., to appear.
Cao, X. R. 2004c. Event-based optimization of Markov systems. Manuscript to be submitted.
Cao, X. R., and Chen, H. F. 1997. Perturbation realization, potentials and sensitivity analysis of Markov processes. IEEE Trans. Automat. Contr. 42: 1382–1393.
Cao, X. R., and Guo, X. 2004. A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: Multichain cases. Automatica 40: 1749–1759.
Cao, X. R., and Wan, Y. W. 1998. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Trans. Control Syst. Technol. 6: 482–494.
Cao, X. R., Yuan, X. M., and Qiu, L. 1996. A single sample path-based performance sensitivity formula for Markov chains. IEEE Trans. Automat. Contr. 41: 1814–1817.
Cao, X. R., Ren, Z. Y., Bhatnagar, S., Fu, M., and Marcus, S. 2002. A time aggregation approach to Markov decision processes. Automatica 38: 929–943.
Chong, E. K. P., and Ramadge, P. J. 1994. Stochastic optimization of regenerative systems using infinitesimal perturbation analysis. IEEE Trans. Automat. Contr. 39: 1400–1410.
Cooper, W. L., Henderson, S. G., and Lewis, M. E. 2003. Convergence of simulation-based policy iteration. Probab. Eng. Inf. Sci. 17: 213–234.
Dijk, N. M. van. 1993. Queueing Networks and Product Forms: A Systems Approach. Chichester: John Wiley and Sons.
Fang, H. T., and Cao, X. R. 2004. Potential-based on-line policy iteration algorithms for Markov decision processes. IEEE Trans. Automat. Contr. 49: 493–505.
Ho, Y. C., and Cao, X. R. 1983. Perturbation analysis and optimization of queueing networks. J. Optim. Theory Appl. 40(4): 559–582.
Ho, Y. C., and Cao, X. R. 1991. Perturbation Analysis of Discrete-Event Dynamic Systems. Boston: Kluwer Academic Publishers.
Ho, Y. C., Zhao, Q. C., and Pepyne, D. L. 2003. The no free lunch theorem, complexity and computer security. IEEE Trans. Automat. Contr. 48: 783–793.
Marbach, P., and Tsitsiklis, J. N. 2001. Simulation-based optimization of Markov reward processes. IEEE Trans. Automat. Contr. 46: 191–209.
Meuleau, N., Peshkin, L., Kim, K.-E., and Kaelbling, L. P. 1999. Learning finite-state controllers for partially observable environments. Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence.
Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley.
Suri, R., and Leung, Y. T. 1989. Single run optimization of discrete event simulations: An empirical study using the M/M/1 queue. IIE Trans. 21: 35–49.
Theocharous, G., and Kaelbling, L. P. 2004. Approximate planning in POMDPs with macro-actions. Advances in Neural Information Processing Systems 16 (NIPS-03). Cambridge, MA: MIT Press, 775–782.
Watkins, C., and Dayan, P. 1992. Q-learning. Mach. Learn. 8: 279–292.