Imitating with Sequential Masks: Alleviating Causal Confusion in Autonomous Driving

Huanghui Zhang; Zhi Zheng

doi:10.20965/jaciii.2024.p0882

JACIII Vol.28 No.4 pp. 882-892

(2024)

Research Paper:

Huanghui Zhang^* and Zhi Zheng^*,**,†

^*College of Computer and Cyber Security, Fujian Normal University
No.8 Xuefu South Road, Shangjie, Minhou, Fuzhou, Fujian 350117, China

^**College of Control Science and Engineering, Zhejiang University
No.38 Zheda Road, West Lake District, Hangzhou, Zhejiang 310027, China

^†Corresponding author

Received: December 15, 2023

Accepted: March 21, 2024

Published: July 20, 2024

Keywords: causal confusion, invariant feature learning, imitation learning

Abstract

Imitation learning, which uses only expert demonstrations, is suitable for safety-critical tasks such as autonomous driving. However, it suffers from causal confusion: when offered more features, an agent may perform even worse. We therefore aim to improve agents’ imitation ability in driving scenarios in a sequential setting with a novel method, sequential masking imitation learning (SEMI). Inspired by Granger causality, SEMI improves the imitator’s performance by randomly masking the encoded features in a sequential setting. This design forces the imitator to focus on critical features, leading to a more robust model. We demonstrate that the method can alleviate causal confusion in driving simulations by deploying it in the CARLA simulator and comparing it with other methods. The experimental results show that SEMI can effectively reduce causal confusion during autonomous driving.

Alleviating causal confusion
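The abstract does not give the details of the masking operation, so the following is only a minimal sketch of the general idea of randomly masking encoded features over a sequence of time steps. The function name `mask_feature_sequence`, the `keep_prob` parameter, and the choice to share one binary mask across all time steps are assumptions made for illustration; they are not taken from the paper and may differ from how SEMI is actually implemented.

```python
import random


def mask_feature_sequence(features, keep_prob=0.5, rng=None):
    """Randomly zero out feature dimensions of an encoded sequence.

    features:  list of T feature vectors (lists of floats), one per time step.
    keep_prob: probability of keeping each feature dimension
               (hypothetical hyperparameter, not from the paper).
    rng:       optional random.Random instance for reproducibility.

    Assumption: one binary mask over feature dimensions is sampled and
    applied to every time step of the sequence.
    """
    rng = rng or random.Random()
    dim = len(features[0])
    # Sample a binary mask: 1 keeps the dimension, 0 masks it out.
    mask = [1 if rng.random() < keep_prob else 0 for _ in range(dim)]
    # Apply the same mask to the feature vector at each time step.
    masked = [[x * m for x, m in zip(step, mask)] for step in features]
    return masked, mask


# Toy sequence: 3 time steps, 4 encoded feature dimensions.
seq = [
    [0.2, -1.0, 0.5, 3.0],
    [0.1, -0.9, 0.4, 2.5],
    [0.0, -0.8, 0.6, 2.0],
]
masked_seq, mask = mask_feature_sequence(seq, keep_prob=0.5, rng=random.Random(0))
```

In a training loop, a policy would be fitted on `masked_seq` rather than `seq`, so that it cannot rely on any single feature dimension always being available; how the mask is chosen and used during training in SEMI is described in the full paper, not here.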

Cite this article as:

H. Zhang and Z. Zheng, “Imitating with Sequential Masks: Alleviating Causal Confusion in Autonomous Driving,” J. Adv. Comput. Intell. Intell. Inform., Vol.28 No.4, pp. 882-892, 2024.


This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
