Deep Reinforcement Learning for Autonomous Driving: A Survey

IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 6, pp. 4909-4926, 2022
Bangalore Ravi Kiran1, Ibrahim Sobh2, Victor Talpaert3,4, Patrick Mannion5, Ahmad A. Al Sallab2, Senthil Yogamani6, Patrick Pérez7
1Navya, Courbevoie, France
2Valeo Cairo AI Team, Giza, Egypt
3AKKA Technologies, Guyancourt, France
4ENSTA Paris, Institut Polytechnique de Paris, Palaiseau Cedex, France
5School of Computer Science, National University of Ireland, Galway, Ireland
6Valeo Vision Systems, Tuam, Ireland
7valeo.ai, Paris, France

Abstract

Keywords

