Deep Reinforcement Learning Agent with Varying Actions Strategy for Solving the Eco-Approach and Departure Problem at Signalized Intersections

Transportation Research Record - Volume 2674, Issue 8, pp. 119-131 - 2020
Saleh Mousa1, Sherif Ishak2, Ragab M. Mousa3,4, Julius Codjoe5, Mohammed Elhenawy6
1Booz Allen Hamilton, Washington, DC
2Department of Civil and Environmental Engineering, Old Dominion University, Norfolk, VA
3Faculty of Engineering, Cairo University, Cairo, Egypt
4Ministry of Transport and Communications, Muscat, Sultanate of Oman
5Louisiana Department of Transportation and Development, LTRC, Baton Rouge, LA
6Centre for Accident Research and Road Safety, Queensland University of Technology, Kelvin Grove, Australia

Abstract

Eco-approach and departure is a complex control problem in which a driver's actions are guided over a period of time or distance to optimize fuel consumption. Reinforcement learning (RL) is a machine learning paradigm that mimics human learning behavior: an agent attempts to solve a given control problem by interacting with the environment and developing an optimal policy. Unlike the methods implemented in previous studies of the eco-driving problem, RL does not require prior knowledge of the environment to be learned and processed. This paper develops a deep reinforcement learning (DRL) agent that solves the eco-approach and departure problem in the vicinity of signalized intersections to minimize fuel consumption. The DRL algorithm uses a deep neural network as the function approximator for the RL agent. Novel strategies such as varying actions, prioritized experience replay, a target network, and double learning were implemented to overcome the instabilities expected during training. The results revealed the effectiveness of the DRL algorithm in reducing fuel consumption: the agent successfully learned the environment and guided vehicles through the intersection with no red-light-running violations, providing average fuel savings of about 13.02%.
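The paper itself does not include source code, so the following is a minimal illustrative sketch of the training machinery the abstract names: a double DQN with a target network and proportional prioritized experience replay. The state layout, action set, network sizes, and all hyperparameters below are assumptions made for illustration, and the paper's varying-actions strategy is not reproduced here.

# Minimal sketch (not the authors' code): double DQN with a target network and
# proportional prioritized experience replay for an eco-approach agent.
# State layout, action set, reward, and hyperparameters are illustrative
# assumptions; the paper's "varying actions" strategy is not reproduced.
import numpy as np
import torch
import torch.nn as nn

STATE_DIM = 4    # assumed state: distance to stop line, speed, phase, time to change
N_ACTIONS = 7    # assumed discrete acceleration levels, e.g., -3..+3 m/s^2
GAMMA = 0.99
ALPHA, BETA = 0.6, 0.4   # prioritization / importance-sampling exponents

def make_qnet():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online, target = make_qnet(), make_qnet()
target.load_state_dict(online.state_dict())
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

buffer, priorities = [], []   # proportional prioritized replay

def store(transition, td_error=1.0):
    # transition = (state, action, reward, next_state, done)
    buffer.append(transition)
    priorities.append((abs(td_error) + 1e-3) ** ALPHA)

def train_step(batch_size=32):
    probs = np.array(priorities) / sum(priorities)
    idx = np.random.choice(len(buffer), batch_size, p=probs)
    s, a, r, s2, done = map(np.array, zip(*[buffer[i] for i in idx]))
    s = torch.as_tensor(s, dtype=torch.float32)
    s2 = torch.as_tensor(s2, dtype=torch.float32)
    r = torch.as_tensor(r, dtype=torch.float32)
    done = torch.as_tensor(done, dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64)

    # Double learning: the online net selects the next action,
    # the frozen target net evaluates it.
    with torch.no_grad():
        a_star = online(s2).argmax(1)
        q_next = target(s2).gather(1, a_star.unsqueeze(1)).squeeze(1)
        y = r + GAMMA * (1.0 - done) * q_next

    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td = y - q

    # Importance-sampling weights correct the bias of prioritized sampling.
    w = (len(buffer) * probs[idx]) ** (-BETA)
    w = torch.as_tensor(w / w.max(), dtype=torch.float32)
    loss = (w * td.pow(2)).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Refresh the priorities of the sampled transitions with new TD errors.
    for i, e in zip(idx, td.detach().abs().numpy()):
        priorities[i] = (e + 1e-3) ** ALPHA

def sync_target():
    # Periodic hard update of the target net stabilizes bootstrapped targets.
    target.load_state_dict(online.state_dict())

Separating action selection (online network) from action evaluation (target network) counters the overestimation bias of plain Q-learning, while the periodic hard sync and the importance-sampled loss are the standard stabilizers the abstract refers to.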

Keywords

