
JACIII Vol.27 No.2 pp. 198-206
doi: 10.20965/jaciii.2023.p0198
(2023)

Research Paper:

3D Street Object Detection from Monocular Images Using Deep Learning and Depth Information

Wei Liu*,**,***, Tao Zhang*,**,***, Yun Ma*,**,***,†, and Longsheng Wei*,**,***

*School of Automation, China University of Geosciences
No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China

**Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems
No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China

***Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education
No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China

†Corresponding author

Received: April 20, 2022
Accepted: November 2, 2022
Published: March 20, 2023
Keywords: 3D detection, monocular image, deep learning, street object
Abstract

In this study, we present a three-dimensional (3D) object detection algorithm for monocular images, built as an end-to-end network that incorporates depth information. The network consists of three parts. The first part is a basic object detection network that serves as the main body and uses a region proposal network to obtain two-dimensional (2D) region proposals for objects. The second part is a depth estimation branch, which estimates the depth of the object pixels and computes the corresponding 3D point cloud. In the third part, the features obtained from the first two parts are concatenated and fed into fully connected layers, which output the 2D and 3D detection results. Compared with several existing methods, the proposed approach achieves higher detection accuracy.
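The step from per-pixel depth to a 3D point cloud can be illustrated with a minimal sketch. The abstract does not state the camera model, so this assumes a standard pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` (as provided, for example, by KITTI calibration files) and back-projects only the pixels inside one 2D region proposal. The `fuse_features` helper is a hypothetical stand-in for the paper's feature concatenation: it simply appends the point-cloud centroid to a 2D RoI feature vector, whereas the actual network presumably concatenates learned features.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_box(
    depth: Sequence[Sequence[float]],
    box: Tuple[int, int, int, int],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift the depth pixels inside a 2D proposal box to 3D camera coordinates.

    Assumes a pinhole camera. depth[v][u] is the metric depth Z at pixel
    column u, row v. box is (u_min, v_min, u_max, v_max) with u_max and
    v_max exclusive. Pixels with non-positive depth are skipped as invalid.
    """
    u_min, v_min, u_max, v_max = box
    points: List[Point3D] = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u]
            if z <= 0:
                continue  # no valid depth estimate for this pixel
            # Invert the projection u = fx * X / Z + cx (and likewise for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def fuse_features(roi_feature: Sequence[float], points: List[Point3D]) -> List[float]:
    """Hypothetical fusion: append the point-cloud centroid to the 2D RoI feature."""
    n = len(points)
    if n == 0:
        centroid = [0.0, 0.0, 0.0]
    else:
        centroid = [sum(p[i] for p in points) / n for i in range(3)]
    return list(roi_feature) + centroid


# Usage with a tiny 2 x 3 depth map and made-up intrinsics.
depth = [[10.0, 10.0, 10.0], [10.0, 0.0, 20.0]]
points = backproject_box(depth, (0, 0, 3, 2), fx=100.0, fy=100.0, cx=1.0, cy=1.0)
fused = fuse_features([1.0, 2.0], points)
```

Here the zero-depth pixel is dropped, so five points remain, and the fused vector holds the two RoI values followed by the centroid (x, y, z). In a full detector, a vector of this kind would be what the fully connected layers receive.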

Cite this article as:
W. Liu, T. Zhang, Y. Ma, and L. Wei, “3D Street Object Detection from Monocular Images Using Deep Learning and Depth Information,” J. Adv. Comput. Intell. Intell. Inform., Vol.27 No.2, pp. 198-206, 2023.
