Accelerating Reinforcement Learning with Suboptimal Guidance

IFAC-PapersOnLine - Volume 53 - Pages 8090-8096 - 2020
Eivind Bøhn1, Signe Moe1,2, Tor Arne Johansen2
1SINTEF Digital, Oslo, Norway
2Centre for Autonomous Marine Operations and Systems, Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway

References

Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017). Hindsight Experience Replay. In 31st Conference on Neural Information Processing Systems (NIPS 2017).

Fujimoto, S., van Hoof, H., and Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning.

Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2016). Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates. In 2017 IEEE International Conference on Robotics and Automation (ICRA).

Hill, A., Raffin, A., Ernestus, M., Traore, R., Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., and Wu, Y. (2018). Stable Baselines. https://github.com/hill-a/stable-baselines.

Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. In 4th International Conference on Learning Representations (ICLR 2016).

Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2017). Overcoming Exploration in Reinforcement Learning with Demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA).

Ng, A.Y. and Russell, S.J. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML '00), 663-670. San Francisco, CA, USA.

OpenAI, Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., de Oliveira Pinto, H.P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., and Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. arXiv:1912.06680.
Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., Kumar, V., and Zaremba, W. (2018). Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research. arXiv:1802.09464.

Ross, S., Gordon, G.J., and Bagnell, J.A. (2011). A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011).

Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015). Universal Value Function Approximators. In Proceedings of the 32nd International Conference on Machine Learning.

Silver, D. et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-489. doi:10.1038/nature16961.

Sun, W., Venkatraman, A., Gordon, G.J., Boots, B., and Bagnell, J.A. (2017). Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction. In Proceedings of the 34th International Conference on Machine Learning.

Sutton, R.S. and Barto, A.G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.

Todorov, E., Erez, T., and Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 5026-5033.

Vecerik, M., Hester, T., Scholz, J., Wang, F., Pietquin, O., Piot, B., Heess, N., Rothörl, T., Lampe, T., and Riedmiller, M. (2018). Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards. arXiv:1707.08817 [cs].

Xie, L., Wang, S., Rosa, S., Markham, A., and Trigoni, N. (2018). Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), 6276-6283.