Application and Experimental Verification of a Delay-Aware Deep Reinforcement Learning Method in End-to-End Autonomous Driving Control
Kazuya Emura and Ryuzo Hayashi
Tokyo University of Science
6-3-1 Niijuku, Katsushika-ku, Tokyo 125-8585, Japan
This study investigates the effect of sensor-to-actuation delay on end-to-end autonomous driving with deep reinforcement learning (DRL). Although DRL-based methods have succeeded at tasks such as lane keeping and obstacle avoidance, real-world deployment remains challenging. A key issue is that real-world latency can violate the Markov decision process (MDP) assumption, degrading performance. To address this problem, a method is introduced in which past actions are appended to the current state, preserving the Markov property even under delayed control signals. The efficacy of this approach was evaluated by comparing three scenarios in simulation: no delay, delay without compensation, and delay compensated by including past actions. The scenario without delay compensation failed to learn effectively. The trained policy was then deployed on a 1/10-scale experimental vehicle, demonstrating that explicitly modeling the delay substantially improves stability and reliability both in simulation and in physical trials. When a longer delay was imposed, learning slowed and the action-value estimation became less stable, yet the simulated vehicle still completed its task. Although vehicle tests under the extended delay exhibited some instability, the approach still compensated for the delay to a useful extent, confirming its effectiveness against latency in real-world environments.
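To make the delay-compensation idea concrete, the following is a minimal sketch, not the authors' implementation: a wrapper around a generic simulator with a gym-style reset()/step() interface that emulates a k-step actuation delay and appends the k pending actions to the observation, so the policy sees the augmented state (s_t, a_{t-k}, ..., a_{t-1}). The class name DelayAwareWrapper, the zero-action initialization at reset, and the flat-vector observation format are illustrative assumptions.

```python
import numpy as np
from collections import deque

class DelayAwareWrapper:
    """Hypothetical sketch (not the authors' code): emulates a k-step
    actuation delay by applying the action chosen k steps earlier, and
    appends the k pending actions to the observation so the augmented
    state remains Markovian under the delay."""

    def __init__(self, env, delay_steps, action_dim):
        self.env = env
        self.delay_steps = delay_steps
        self.action_dim = action_dim
        self.action_buffer = deque(maxlen=delay_steps)

    def reset(self):
        obs = self.env.reset()
        # Start each episode with neutral (zero) actions in the pipeline.
        self.action_buffer.clear()
        for _ in range(self.delay_steps):
            self.action_buffer.append(np.zeros(self.action_dim))
        return self._augment(obs)

    def step(self, action):
        # Apply the action selected delay_steps ago; queue the new one.
        delayed_action = self.action_buffer.popleft()
        self.action_buffer.append(np.asarray(action, dtype=np.float64))
        obs, reward, done, info = self.env.step(delayed_action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Concatenate the raw observation with all pending actions.
        return np.concatenate([np.asarray(obs).ravel(), *self.action_buffer])
```

Because the pending actions are part of the state, the learned action-value function can anticipate their effect once they reach the actuators, which is consistent with the paper's finding that the uncompensated variant fails to learn while the augmented one succeeds.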

[Figure: Comparison of simulation and experiment]
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.