Accelerating Reinforcement Learning with Suboptimal Guidance
References
Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017). Hindsight Experience Replay. In 31st Conference on Neural Information Processing Systems (NIPS 2017).
Fujimoto, S., van Hoof, H., and Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning.
Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017). Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates. In 2017 IEEE International Conference on Robotics and Automation (ICRA).
Hill, A., Raffin, A., Ernestus, M., Traore, R., Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., and Wu, Y. (2018). Stable Baselines. https://github.com/hill-a/stable-baselines.
Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016). Continuous Control with Deep Reinforcement Learning. In 4th International Conference on Learning Representations (ICLR 2016).
Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2018). Overcoming Exploration in Reinforcement Learning with Demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA).
Ng, A.Y. and Russell, S.J. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, 663-670. San Francisco, CA, USA.
OpenAI, Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., de Oliveira Pinto, H.P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., and Zhang, S. (2019). Dota 2 with Large Scale Deep Reinforcement Learning. arXiv:1912.06680.
Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., Kumar, V., and Zaremba, W. (2018). Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research. arXiv:1802.09464.
Ross, S., Gordon, G.J., and Bagnell, J.A. (2011). A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011).
Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015). Universal Value Function Approximators. In Proceedings of the 32nd International Conference on Machine Learning.
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489. doi:10.1038/nature16961.
Sun, W., Venkatraman, A., Gordon, G.J., Boots, B., and Bagnell, J.A. (2017). Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction. In Proceedings of the 34th International Conference on Machine Learning.
Sutton, R.S. and Barto, A.G. (2018). Reinforcement Learning: An Introduction. MIT Press, 2nd edition.
Todorov, E., Erez, T., and Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 5026-5033.
Vecerik, M., Hester, T., Scholz, J., Wang, F., Pietquin, O., Piot, B., Heess, N., Rothörl, T., Lampe, T., and Riedmiller, M. (2018). Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards. arXiv:1707.08817.
Xie, L., Wang, S., Rosa, S., Markham, A., and Trigoni, N. (2018). Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), 6276-6283.