
JACIII Vol.27 No.3, pp. 386-393, 2023
doi: 10.20965/jaciii.2023.p0386

Research Paper:

Exploring Self-Attention for Visual Intersection Classification

Haruki Nakata, Kanji Tanaka, and Koji Takeda

Graduate School of Engineering, University of Fukui
3-9-1 Bunkyo, Fukui 910-8507, Japan

Received:
July 9, 2022
Accepted:
January 10, 2023
Published:
May 20, 2023
Keywords:
visual intersection classification, non-local contexts, self-attention, first-person-view net, third-person-view net
Abstract

Self-attention has recently emerged as a technique for capturing non-local contexts in robot vision. In this study, we introduced a self-attention mechanism into an intersection recognition system to capture the non-local contexts underlying a scene. Such a mechanism is well suited to intersection classification because the local patterns (e.g., road edges, buildings, and sky) are largely similar across intersection types; non-local context (e.g., the angle between two diagonal corners around an intersection) is therefore more discriminative. This study makes three major contributions to the existing literature. First, we propose a self-attention-based approach for intersection classification. Second, we integrate the self-attention-based classifier into a unified intersection classification framework to improve the overall recognition performance. Finally, experiments on the public KITTI dataset show that the proposed self-attention-based system outperforms conventional recognition based on local patterns and recognition based on convolution operations.
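To illustrate the mechanism the abstract refers to, the following is a minimal sketch of a 2D self-attention block applied to CNN feature maps, written in PyTorch. It is not the authors' architecture: the class name SelfAttention2d, the channel-reduction factor, and the zero-initialized residual weight gamma are illustrative assumptions. The sketch only shows how pairwise attention lets distant image regions (e.g., two diagonal corners of an intersection) influence each other.

# Minimal self-attention block over CNN feature maps (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        inner = max(channels // reduction, 1)
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c')
        k = self.key(x).flatten(2)                    # (b, c', hw)
        v = self.value(x).flatten(2).transpose(1, 2)  # (b, hw, c)
        # Attention weights between every pair of spatial positions.
        attn = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (b, hw, hw)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out  # residual connection

# Toy usage on a hypothetical 8-channel feature map from a backbone network.
if __name__ == "__main__":
    feat = torch.randn(1, 8, 16, 16)
    print(SelfAttention2d(8)(feat).shape)  # torch.Size([1, 8, 16, 16])

Because gamma starts at zero, the block initially acts as an identity mapping, so it could in principle be dropped into an existing CNN classifier and trained end to end; this loosely mirrors how the abstract describes integrating the self-attention-based classifier into a unified classification framework.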

Self-attention for enhancing third-person views

Cite this article as:
H. Nakata, K. Tanaka, and K. Takeda, “Exploring Self-Attention for Visual Intersection Classification,” J. Adv. Comput. Intell. Intell. Inform., Vol.27 No.3, pp. 386-393, 2023.
