JACIII Vol.27 No.4 pp. 673-682
doi: 10.20965/jaciii.2023.p0673
(2023)

Research Paper:

Lightweight Bilateral Network for Real-Time Semantic Segmentation

Pengtao Wang, Lihong Li, Feiyang Pan, and Lin Wang

School of Information and Electrical Engineering, Hebei University of Engineering
No.19 Taiji Road, Handan, Hebei 056038, China

Received: February 11, 2023
Accepted: April 4, 2023
Published: July 20, 2023
Keywords: real-time semantic segmentation, depth separable convolution, attention mechanism
Abstract

This paper proposes a dual-branch semantic segmentation model, based on depthwise separable convolution and an attention mechanism, that targets both the real-time and the accuracy requirements of semantic segmentation. The approach addresses the poor segmentation quality caused by repeated downsampling and the oversimplified feature fusion found in existing networks. The network is divided into a spatial detail path and a semantic information path. The spatial detail path uses a smaller downsampling factor to preserve resolution and extract spatial information efficiently. The semantic information path is built from non-bottleneck residual units with dilated convolution and extracts semantic features. To aggregate features, a feature-guided fusion module assigns different weights to the outputs of the two paths and fuses them to produce the final output. On the Cityscapes dataset, the proposed algorithm achieves a segmentation accuracy of 69.6% at a speed of 70 fps with only 0.76 M parameters, which compares favorably with recent real-time semantic segmentation algorithms. The results indicate that combining depthwise separable convolution with an attention mechanism extracts features effectively and compensates for the accuracy lost to downsampling. The experiments also show that the proposed fusion module outperforms other methods in fusing different features.
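
As a rough illustration of why depthwise separable convolution keeps the parameter count low, and of how dilated convolution enlarges the receptive field without adding weights, the following Python sketch applies the standard textbook formulas. The layer sizes (64 input channels, 128 output channels, 3x3 kernels) and the function names are hypothetical choices for illustration; they are not taken from the paper's architecture.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)        # 73728
sep = depthwise_separable_params(c_in, c_out, k)  # 8768
ratio = sep / std                                 # equals 1/c_out + 1/k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")

# A dilated 3 x 3 kernel covers a wider area with the same nine weights.
for d in (1, 2, 4):
    print(f"dilation {d}: effective kernel size {effective_kernel_size(k, d)}")
```

For these assumed sizes the separable layer needs about 12% of the weights of the standard layer, which is the kind of saving that makes sub-million-parameter models such as the 0.76 M network described above plausible.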

Cite this article as:
P. Wang, L. Li, F. Pan, and L. Wang, “Lightweight Bilateral Network for Real-Time Semantic Segmentation,” J. Adv. Comput. Intell. Intell. Inform., Vol.27 No.4, pp. 673-682, 2023.


Last updated on Apr. 22, 2024