
JACIII Vol.29 No.3, pp. 519-531 (2025)
doi: 10.20965/jaciii.2025.p0519

Research Paper:

Monocular 3D Object Detection Based on Reparametrized Cross-Dimension Focusing

Ruikai Li*, Chao Wang**,†, and Guopeng Tan*

*Information and Electrical Engineering School, Hebei University of Engineering
19 Taiji Street, Congtai District, Handan, Hebei 056038, China

**Hebei Key Laboratory of Security & Protection Information Sensing and Processing
19 Taiji Street, Congtai District, Handan, Hebei 056038, China

†Corresponding author

Received:
June 10, 2024
Accepted:
February 10, 2025
Published:
May 20, 2025
Keywords:
3D object detection, structural reparameterization, cross-dimension focusing
Abstract

Deploying monocular 3D object detection networks on the visual sensors of intelligent transportation assistance devices is a cost-effective and practical solution. Despite the progress made by existing monocular 3D object detection methods, a gap in detection accuracy remains compared to 3D object detection methods based on point cloud data from LiDAR (light detection and ranging) sensors. In addition, these methods incur relatively high computational costs. To address these issues, this paper proposes an improved monocular 3D object detection network that optimizes the overall structure of the model through structural reparameterization, effectively alleviating the computational burden on computing devices. Simultaneously, we focus on the differences between 2D and 3D features and propose a cross-dimension focusing method to enhance the network's ability to extract 3D object features. On the KITTI benchmark, our framework achieves significantly superior 3D object detection performance compared to other methods.
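The structural reparameterization the abstract refers to follows the RepVGG-style idea cited in the reference list ([21], [22]): multiple parallel convolution branches used during training are algebraically fused into a single convolution kernel for inference, reducing compute without changing the output. A minimal single-channel NumPy sketch of the principle (the shapes, branch choices, and function names here are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

def conv2d(x, w, pad=1):
    """Naive single-channel 2D cross-correlation, stride 1, zero padding."""
    k = w.shape[0]
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))

# Training-time branches: 3x3 conv + 1x1 conv + identity shortcut.
w3 = rng.standard_normal((3, 3))
w1 = rng.standard_normal((1, 1))
y_branches = conv2d(x, w3) + conv2d(x, np.pad(w1, 1)) + x

# Inference-time reparameterization: pad the 1x1 kernel to 3x3,
# express the identity as a 3x3 kernel with 1 at the center, and sum.
ident = np.zeros((3, 3))
ident[1, 1] = 1.0
w_fused = w3 + np.pad(w1, 1) + ident
y_fused = conv2d(x, w_fused)

# Because convolution is linear, the fused kernel reproduces the
# multi-branch output exactly.
assert np.allclose(y_branches, y_fused)
```

Since convolution is linear in its kernel, the sum of branch outputs equals a single convolution with the summed (aligned) kernels; this is why the fused network is cheaper at inference yet numerically equivalent.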

Cite this article as:
R. Li, C. Wang, and G. Tan, “Monocular 3D Object Detection Based on Reparametrized Cross-Dimension Focusing,” J. Adv. Comput. Intell. Intell. Inform., Vol.29 No.3, pp. 519-531, 2025.
References
  [1] Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Q. Weinberger, “Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 8445-8453, 2019. https://doi.org/10.1109/CVPR.2019.00864
  [2] Y. You, Y. Wang, W.-L. Chao, D. Garg, G. Pleiss, B. Hariharan, M. Campbell, and K. Q. Weinberger, “Pseudo-LiDAR++: Accurate depth for 3D object detection in autonomous driving,” arXiv preprint, arXiv:1906.06310, 2019. https://doi.org/10.48550/arXiv.1906.06310
  [3] P. Li, X. Chen, and S. Shen, “Stereo R-CNN based 3D object detection for autonomous driving,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 7644-7652, 2019. https://doi.org/10.1109/CVPR.2019.00783
  [4] X. Chen, K. Kundu, Y. Zhu, H. Ma, S. Fidler, and R. Urtasun, “3D object proposals using stereo imagery for accurate object class detection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.40, Issue 5, pp. 1259-1272, 2017. https://doi.org/10.1109/TPAMI.2017.2706685
  [5] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, “Deep ordinal regression network for monocular depth estimation,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 2002-2011, 2018. https://doi.org/10.1109/CVPR.2018.00214
  [6] R. Furukawa, R. Sagawa, and H. Kawasaki, “Depth estimation using structured light flow—Analysis of projected pattern flow on an object’s surface,” Proc. of the IEEE Int. Conf. on Computer Vision, pp. 4640-4648, 2017. https://doi.org/10.1109/ICCV.2017.497
  [7] Y. Chen, L. Tai, K. Sun, and M. Li, “MonoPair: Monocular 3D object detection using pairwise spatial relationships,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 12093-12102, 2020. https://doi.org/10.1109/CVPR42600.2020.01211
  [8] T. Wang, X. Zhu, J. Pang, and D. Lin, “FCOS3D: Fully convolutional one-stage monocular 3D object detection,” Proc. of the IEEE/CVF Int. Conf. on Computer Vision Workshops, pp. 913-922, 2021. https://doi.org/10.1109/ICCVW54120.2021.00107
  [9] R. Qian, D. Garg, Y. Wang, Y. You, S. Belongie, B. Hariharan, M. Campbell, K. Q. Weinberger, and W.-L. Chao, “End-to-end pseudo-LiDAR for image-based 3D object detection,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 5881-5890, 2020. https://doi.org/10.1109/CVPR42600.2020.00592
  [10] A. Mousavian, D. Anguelov, J. Flynn, and J. Košecká, “3D bounding box estimation using deep learning and geometry,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 7074-7082, 2017. https://doi.org/10.1109/CVPR.2017.597
  [11] X. Chen, K. Kundu, Z. Zhang, H. Ma, S. Fidler, and R. Urtasun, “Monocular 3D object detection for autonomous driving,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2147-2156, 2016. https://doi.org/10.1109/CVPR.2016.236
  [12] Z. Qin, J. Wang, and Y. Lu, “MonoGRNet: A geometric reasoning network for monocular 3D object localization,” Proc. of the AAAI Conf. on Artificial Intelligence, Vol.33, No.01, pp. 8851-8858, 2019. https://doi.org/10.1609/aaai.v33i01.33018851
  [13] B. Li, W. Ouyang, L. Sheng, X. Zeng, and X. Wang, “GS3D: An efficient 3D object detection framework for autonomous driving,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 1019-1028, 2019. https://doi.org/10.1109/CVPR.2019.00111
  [14] J. Ku, A. D. Pon, and S. L. Waslander, “Monocular 3D object detection leveraging accurate proposals and shape reconstruction,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 11867-11876, 2019. https://doi.org/10.1109/CVPR.2019.01214
  [15] A. Simonelli, S. R. Bulò, L. Porzi, M. López-Antequera, and P. Kontschieder, “Disentangling monocular 3D object detection,” Proc. of the IEEE/CVF Int. Conf. on Computer Vision, pp. 1991-1999, 2019. https://doi.org/10.1109/ICCV.2019.00208
  [16] Z. Liu, Z. Wu, and R. Tóth, “SMOKE: Single-stage monocular 3D object detection via keypoint estimation,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops, pp. 996-997, 2020. https://doi.org/10.1109/CVPRW50498.2020.00506
  [17] G. Brazil and X. Liu, “M3D-RPN: Monocular 3D region proposal network for object detection,” Proc. of the IEEE/CVF Int. Conf. on Computer Vision, pp. 9287-9296, 2019. https://doi.org/10.1109/ICCV.2019.00938
  [18] P. Li, H. Zhao, P. Liu, and F. Cao, “RTM3D: Real-time monocular 3D detection from object keypoints for autonomous driving,” Proc. of the European Conf. on Computer Vision (ECCV), pp. 644-660, 2020. https://doi.org/10.1007/978-3-030-58580-8_38
  [19] M. Ding, Y. Huo, H. Yi, Z. Wang, J. Shi, Z. Lu, and P. Luo, “Learning depth-guided convolutions for monocular 3D object detection,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 1000-1001, 2020. https://doi.org/10.1109/CVPR42600.2020.01169
  [20] S. Luo, H. Dai, L. Shao, and Y. Ding, “M3DSSD: Monocular 3D single stage object detector,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 6145-6154, 2021. https://doi.org/10.1109/CVPR46437.2021.00608
  [21] X. Ding, Y. Guo, G. Ding, and J. Han, “ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks,” Proc. of the IEEE/CVF Int. Conf. on Computer Vision, pp. 1911-1920, 2019. https://doi.org/10.1109/ICCV.2019.00200
  [22] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “RepVGG: Making VGG-style convnets great again,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 13733-13742, 2021. https://doi.org/10.1109/CVPR46437.2021.01352
  [23] P. K. A. Vasu, J. Gabriel, J. Zhu, O. Tuzel, and A. Ranjan, “FastViT: A fast hybrid vision transformer using structural reparameterization,” Proc. of the IEEE/CVF Int. Conf. on Computer Vision, pp. 5785-5795, 2023. https://doi.org/10.1109/ICCV51070.2023.00532
  [24] Y. Li, P. Zhao, G. Yuan, X. Lin, Y. Wang, and X. Chen, “Pruning-as-search: Efficient neural architecture search via channel pruning and structural reparameterization,” arXiv preprint, arXiv:2206.01198, 2022. https://doi.org/10.48550/arXiv.2206.01198
  [25] S. Liu, D. Huang et al., “Receptive field block net for accurate and fast object detection,” Proc. of the European Conf. on Computer Vision (ECCV), pp. 385-400, 2018. https://doi.org/10.1007/978-3-030-01252-6_24
  [26] X. Ding, X. Zhang, J. Han, and G. Ding, “Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 11963-11975, 2022. https://doi.org/10.1109/CVPR52688.2022.01166
  [27] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint, arXiv:2010.11929, 2020. https://doi.org/10.48550/arXiv.2010.11929
  [28] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4700-4708, 2017. https://doi.org/10.1109/CVPR.2017.243
  [29] Z. Chen, Z. He, and Z.-M. Lu, “DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention,” IEEE Trans. on Image Processing, Vol.33, pp. 1002-1015, 2024. https://doi.org/10.1109/TIP.2024.3354108
  [30] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” 2012 IEEE Conf. on Computer Vision and Pattern Recognition, pp. 3354-3361, 2012. https://doi.org/10.1109/CVPR.2012.6248074

