Lightweight Encoder with Attention Mechanism for Pipe Recognition Network

Yang Tian; Xinyu Li; Shugen Ma

doi:10.20965/jrm.2024.p0343

single-rb.php

« previous

JRM Vol.36 No.2 pp. 343-352

doi: 10.20965/jrm.2024.p0343

(2024)

Paper:

Views over last 60 days: 430

Lightweight Encoder with Attention Mechanism for Pipe Recognition Network

Yang Tian , Xinyu Li, and Shugen Ma

Department of Robotics, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

Received:

September 27, 2023

Accepted:

January 12, 2024

Published:

April 20, 2024

Keywords:

object recognition, deep learning, BIM, pipe

Abstract

Utilizing building information modeling (BIM) for the analysis of existing pipelines necessitates the development of a swift and precise recognition method. Deep learning-based object recognition through imagery has emerged as a potent solution for tackling various recognition tasks. However, the direct application of these models is unfeasible due to their substantial computational requirements. In this research, we introduce a lightweight encoder explicitly for pipe recognition. By optimizing the network architecture using attention mechanisms, it ensures high-precision recognition while maintaining computational efficiency. The experimental results showcased in this study underscore the efficacy of the proposed lightweight encoder and its associated networks.

Recognition system in piping environment

Cite this article as:

Y. Tian, X. Li, and S. Ma, “Lightweight Encoder with Attention Mechanism for Pipe Recognition Network,” J. Robot. Mechatron., Vol.36 No.2, pp. 343-352, 2024.

Data files:

References

[1] R. Sacks, C. Eastman, G. Lee, and P. Teicholz, “BIM handbook: A guide to building information modeling for owners, designers, engineers, contractors, and facility managers,” John Wiley & Sons, 2018. https://doi.org/10.1002/9781119287568
[2] S. Alizadehsalehi, A. Hadavi, and J. C. Huang, “From bim to extended reality in aec industry,” Automation in Construction, Vol.116, Article No.103254, 2020. https://doi.org/10.1016/j.autcon.2020.103254
[3] A. Watson, “Digital buildings–challenges and opportunities,” Advanced Engineering Informatics, Vol.25, No.4, pp. 573-581, 2011. https://doi.org/10.1016/j.aei.2011.07.003
[4] D. W. Green and M. Z. Southard, “Perry’s chemical engineers’ handbook,” McGraw-Hill Education, 2019.
[5] D. Huber, B. Akinci, A. A. Oliver, E. Anil, B. E. Okorn, and X. Xiong, “Methods for automatically modeling and representing as-built building information models,” Proc. of the NSF CMMI Research Innovation Conf., Article No.856558, 2011.
[6] Y.-J. Liu, J.-B. Zhang, J.-C. Hou, J.-C. Ren, and W.-Q. Tang, “Cylinder detection in large-scale point cloud of pipeline plant,” IEEE Trans. on Visualization and Computer Graphics, Vol.19, No.10, pp. 1700-1707, 2013. https://doi.org/10.1109/TVCG.2013.74
[7] A. K. Patil, P. Holi, S. K. Lee, and Y. H. Chai, “An adaptive approach for the reconstruction and modeling of as-built 3d pipelines from point clouds,” Automation in Construction, Vol.75, pp. 65-78, 2017. https://doi.org/10.1016/j.autcon.2016.12.002
[8] Y. Lin, Y. Tian, and X. Li, “Development of a pipe recognition network based on attention mechanism,” Proc. of JSME Annual Conf. on Robotics and Mechatronics (Robomech) 2022, 2P1-R08, 2022. https://doi.org/10.1299/jsmermd.2022.2P1-R08
[9] F. Remondino and S. El-Hakim, “Image-based 3d modelling: A review,” The Photogrammetric Record, Vol.21, No.115, pp. 269-291, 2006. https://doi.org/10.1111/j.1477-9730.2006.00383.x
[10] J. D. Markley, J. R. Stutzman, and E. N. Harris, “Hybridization of photogrammetry and laser scanning technology for as-built 3d cad models,” Proc. of 2008 IEEE Aerospace Conf., 2008. https://doi.org/10.1109/AERO.2008.4526650
[11] R. B. Rusu, “Semantic 3d object maps for everyday manipulation in human living environments,” KI – Künstliche Intelligenz, Vol.24, No.4, pp. 345-348, 2010. https://doi.org/10.1007/s13218-010-0059-6
[12] D. D. Patil and S. G. Deore, “Medical image segmentation: A review,” Int. J. of Computer Science and Mobile Computing, Vol.2, No.1, pp. 22-27, 2013.
[13] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, “Deep learning for 3d point clouds: A survey,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.43, No.12, pp. 4338-4364, 2020. https://doi.org/10.1109/TPAMI.2020.3005434
[14] S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” Proc. of 2017 Int. Conf. on Engineering and Technology (ICET), 2017. https://doi.org/10.1109/ICEngTechnol.2017.8308186
[15] R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 652-660, 2017. https://doi.org/10.1109/CVPR.2017.16
[16] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in Neural Information Processing Systems, Vol.30, 2017.
[17] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “Imagenet-trained cnns are biased towards texture: Increasing shape bias improves accuracy and robustness,” arXiv preprint, arXiv:1811.12231, 2018. https://doi.org/10.48550/arXiv.1811.12231
[18] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1912-1920, 2015. https://doi.org/10.1109/CVPR.2015.7298801
[19] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert, “Pcn: Point completion network,” Proc. of 2018 Int. Conf. on 3D Vision (3DV), pp. 728-737, 2018. https://doi.org/10.1109/3DV.2018.00088
[20] M. Aharchi and M. A. Kbir, “A review on 3d reconstruction techniques from 2d images,” Innovations in Smart Cities Applications Edition 3: Proc. of the 4th Int. Conf. on Smart City Applications, Vol.4, pp. 510-522, 2020. https://doi.org/10.1007/978-3-030-37629-1_37
[21] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint, arXiv:1409.1556, 2014. https://doi.org/10.48550/arXiv.1409.1556
[22] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 770-778, 2016. https://doi.org/10.1109/CVPR.2016.90
[23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2015. https://doi.org/10.1109/CVPR.2015.7298594
[24] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint, arXiv:1704.04861, 2017. https://doi.org/10.48550/arXiv.1704.04861
[25] X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 6848-6856, 2018. https://doi.org/10.1109/CVPR.2018.00716
[26] M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” Proc. of Int. Conf. on Machine Learning, pp. 6105-6114, 2019.
[27] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.40, No.4, pp. 834-848, 2017. https://doi.org/10.1109/TPAMI.2017.2699184
[28] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” Proc. of the European Conf. on Computer Vision (ECCV), pp. 801-818, 2018. https://doi.org/10.1007/978-3-030-01234-2_49
[29] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Proc. of 18th Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, October 5-9, 2015, Part III, pp. 234-241, 2015. https://doi.org/10.1007/978-3-319-24574-4_28
[30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 770-778, 2016. https://doi.org/10.1109/CVPR.2016.90
[31] X. Ding, X. Zhang, J. Han, and G. Ding, “Scaling up your kernels to 31x31: Revisiting large kernel design in cnns,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 11963-11975, 2022. https://doi.org/10.1109/CVPR52688.2022.01166
[32] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018. https://doi.org/10.1109/CVPR.2018.00474
[33] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 7132-7141, 2018. https://doi.org/10.1109/CVPR.2018.00745
[34] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for activation functions,” arXiv preprint, arXiv:1710.05941, 2017. https://doi.org/10.48550/arXiv.1710.05941
[35] Y. Tian, C. Ding, Y. F. Lin, S. Ma, and L. Li, “Automatic feature type selection in digital photogrammetry of piping,” Computer-Aided Civil and Infrastructure Engineering, Vol.37, No.10, pp. 1335-1348, 2022. https://doi.org/10.1111/mice.12840
[36] Y. Liu, L. Chu, G. Chen, Z. Wu, Z. Chen, B. Lai, and Y. Hao, “Paddleseg: A high-efficient development toolkit for image segmentation,” arXiv preprint, arXiv:2101.06175, 2021. https://doi.org/10.48550/arXiv.2101.06175
[37] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,” Proc. of the IEEE/CVF Int. Conf. on Computer Vision, pp. 1314-1324, 2019. https://doi.org/10.1109/ICCV.2019.00140
[38] Y. Tang, K. Han, J. Guo, C. Xu, C. Xu, and Y. Wang, “Ghostnetv2: Enhance cheap operation with long-range attention,” Advances in Neural Information Processing Systems, Vol.35, pp. 9969-9982, 2022.
[39] M. Fan, S. Lai, J. Huang, X. Wei, Z. Chai, J. Luo, and X. Wei, “Rethinking bisenet for real-time semantic segmentation,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 9716-9725, 2021. https://doi.org/10.1109/CVPR46437.2021.00959
[40] S. Tang, T. Sun, J. Peng, G. Chen, Y. Hao, M. Lin, Z. Xiao, J. You, and Y. Liu, “Pp-mobileseg: Explore the fast and accurate semantic segmentation model on mobile devices,” arXiv preprint, arXiv:2304.05152, 2023. https://doi.org/10.48550/arXiv.2304.05152
[41] J. Peng, Y. Liu, S. Tang, Y. Hao, L. Chu, G. Chen, Z. Wu, Z. Chen, Z. Yu, Y. Du, Q. Dang, B. Lai, Q. Liu, X. Hu, D. Yu, and Y. Ma, “Pp-liteseg: A superior real-time semantic segmentation model,” arXiv preprint, arXiv:2204.02681, 2022. https://doi.org/10.48550/arXiv.2204.02681
[42] C. Yu, C. Gao, J. Wang, G. Yu, C. Shen, and N. Sang, “Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation,” Int. J. of Computer Vision, Vol.129, pp. 3051-3068, 2021. https://doi.org/10.1007/s11263-021-01515-2
[43] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” Advances in Neural Information Processing Systems, Vol.34, pp. 12077-12090, 2021.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.

[1] [1] R. Sacks, C. Eastman, G. Lee, and P. Teicholz, “BIM handbook: A guide to building information modeling for owners, designers, engineers, contractors, and facility managers,” John Wiley & Sons, 2018. https://doi.org/10.1002/9781119287568

[2] [2] S. Alizadehsalehi, A. Hadavi, and J. C. Huang, “From bim to extended reality in aec industry,” Automation in Construction, Vol.116, Article No.103254, 2020. https://doi.org/10.1016/j.autcon.2020.103254

[3] [3] A. Watson, “Digital buildings–challenges and opportunities,” Advanced Engineering Informatics, Vol.25, No.4, pp. 573-581, 2011. https://doi.org/10.1016/j.aei.2011.07.003

[4] [4] D. W. Green and M. Z. Southard, “Perry’s chemical engineers’ handbook,” McGraw-Hill Education, 2019.

[5] [5] D. Huber, B. Akinci, A. A. Oliver, E. Anil, B. E. Okorn, and X. Xiong, “Methods for automatically modeling and representing as-built building information models,” Proc. of the NSF CMMI Research Innovation Conf., Article No.856558, 2011.

[6] [6] Y.-J. Liu, J.-B. Zhang, J.-C. Hou, J.-C. Ren, and W.-Q. Tang, “Cylinder detection in large-scale point cloud of pipeline plant,” IEEE Trans. on Visualization and Computer Graphics, Vol.19, No.10, pp. 1700-1707, 2013. https://doi.org/10.1109/TVCG.2013.74

[7] [7] A. K. Patil, P. Holi, S. K. Lee, and Y. H. Chai, “An adaptive approach for the reconstruction and modeling of as-built 3d pipelines from point clouds,” Automation in Construction, Vol.75, pp. 65-78, 2017. https://doi.org/10.1016/j.autcon.2016.12.002

[8] [8] Y. Lin, Y. Tian, and X. Li, “Development of a pipe recognition network based on attention mechanism,” Proc. of JSME Annual Conf. on Robotics and Mechatronics (Robomech) 2022, 2P1-R08, 2022. https://doi.org/10.1299/jsmermd.2022.2P1-R08

[9] [9] F. Remondino and S. El-Hakim, “Image-based 3d modelling: A review,” The Photogrammetric Record, Vol.21, No.115, pp. 269-291, 2006. https://doi.org/10.1111/j.1477-9730.2006.00383.x

[10] [10] J. D. Markley, J. R. Stutzman, and E. N. Harris, “Hybridization of photogrammetry and laser scanning technology for as-built 3d cad models,” Proc. of 2008 IEEE Aerospace Conf., 2008. https://doi.org/10.1109/AERO.2008.4526650

[11] [11] R. B. Rusu, “Semantic 3d object maps for everyday manipulation in human living environments,” KI – Künstliche Intelligenz, Vol.24, No.4, pp. 345-348, 2010. https://doi.org/10.1007/s13218-010-0059-6

[12] [12] D. D. Patil and S. G. Deore, “Medical image segmentation: A review,” Int. J. of Computer Science and Mobile Computing, Vol.2, No.1, pp. 22-27, 2013.

[13] [13] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, “Deep learning for 3d point clouds: A survey,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.43, No.12, pp. 4338-4364, 2020. https://doi.org/10.1109/TPAMI.2020.3005434

[14] [14] S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” Proc. of 2017 Int. Conf. on Engineering and Technology (ICET), 2017. https://doi.org/10.1109/ICEngTechnol.2017.8308186

[15] [15] R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 652-660, 2017. https://doi.org/10.1109/CVPR.2017.16

[16] [16] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in Neural Information Processing Systems, Vol.30, 2017.

[17] [17] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “Imagenet-trained cnns are biased towards texture: Increasing shape bias improves accuracy and robustness,” arXiv preprint, arXiv:1811.12231, 2018. https://doi.org/10.48550/arXiv.1811.12231

[18] [18] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1912-1920, 2015. https://doi.org/10.1109/CVPR.2015.7298801

[19] [19] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert, “Pcn: Point completion network,” Proc. of 2018 Int. Conf. on 3D Vision (3DV), pp. 728-737, 2018. https://doi.org/10.1109/3DV.2018.00088

[20] [20] M. Aharchi and M. A. Kbir, “A review on 3d reconstruction techniques from 2d images,” Innovations in Smart Cities Applications Edition 3: Proc. of the 4th Int. Conf. on Smart City Applications, Vol.4, pp. 510-522, 2020. https://doi.org/10.1007/978-3-030-37629-1_37

[21] [21] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint, arXiv:1409.1556, 2014. https://doi.org/10.48550/arXiv.1409.1556

[22] [22] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 770-778, 2016. https://doi.org/10.1109/CVPR.2016.90

[23] [23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2015. https://doi.org/10.1109/CVPR.2015.7298594

[24] [24] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint, arXiv:1704.04861, 2017. https://doi.org/10.48550/arXiv.1704.04861

[25] [25] X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 6848-6856, 2018. https://doi.org/10.1109/CVPR.2018.00716

[26] [26] M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” Proc. of Int. Conf. on Machine Learning, pp. 6105-6114, 2019.

[27] [27] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.40, No.4, pp. 834-848, 2017. https://doi.org/10.1109/TPAMI.2017.2699184

[28] [28] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” Proc. of the European Conf. on Computer Vision (ECCV), pp. 801-818, 2018. https://doi.org/10.1007/978-3-030-01234-2_49

[29] [29] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Proc. of 18th Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, October 5-9, 2015, Part III, pp. 234-241, 2015. https://doi.org/10.1007/978-3-319-24574-4_28

[30] [30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 770-778, 2016. https://doi.org/10.1109/CVPR.2016.90

[31] [31] X. Ding, X. Zhang, J. Han, and G. Ding, “Scaling up your kernels to 31x31: Revisiting large kernel design in cnns,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 11963-11975, 2022. https://doi.org/10.1109/CVPR52688.2022.01166

[32] [32] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018. https://doi.org/10.1109/CVPR.2018.00474

[33] [33] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 7132-7141, 2018. https://doi.org/10.1109/CVPR.2018.00745

[34] [34] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for activation functions,” arXiv preprint, arXiv:1710.05941, 2017. https://doi.org/10.48550/arXiv.1710.05941

[35] [35] Y. Tian, C. Ding, Y. F. Lin, S. Ma, and L. Li, “Automatic feature type selection in digital photogrammetry of piping,” Computer-Aided Civil and Infrastructure Engineering, Vol.37, No.10, pp. 1335-1348, 2022. https://doi.org/10.1111/mice.12840

[36] [36] Y. Liu, L. Chu, G. Chen, Z. Wu, Z. Chen, B. Lai, and Y. Hao, “Paddleseg: A high-efficient development toolkit for image segmentation,” arXiv preprint, arXiv:2101.06175, 2021. https://doi.org/10.48550/arXiv.2101.06175

[37] [37] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,” Proc. of the IEEE/CVF Int. Conf. on Computer Vision, pp. 1314-1324, 2019. https://doi.org/10.1109/ICCV.2019.00140

[38] [38] Y. Tang, K. Han, J. Guo, C. Xu, C. Xu, and Y. Wang, “Ghostnetv2: Enhance cheap operation with long-range attention,” Advances in Neural Information Processing Systems, Vol.35, pp. 9969-9982, 2022.

[39] [39] M. Fan, S. Lai, J. Huang, X. Wei, Z. Chai, J. Luo, and X. Wei, “Rethinking bisenet for real-time semantic segmentation,” Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 9716-9725, 2021. https://doi.org/10.1109/CVPR46437.2021.00959

[40] [40] S. Tang, T. Sun, J. Peng, G. Chen, Y. Hao, M. Lin, Z. Xiao, J. You, and Y. Liu, “Pp-mobileseg: Explore the fast and accurate semantic segmentation model on mobile devices,” arXiv preprint, arXiv:2304.05152, 2023. https://doi.org/10.48550/arXiv.2304.05152

[41] [41] J. Peng, Y. Liu, S. Tang, Y. Hao, L. Chu, G. Chen, Z. Wu, Z. Chen, Z. Yu, Y. Du, Q. Dang, B. Lai, Q. Liu, X. Hu, D. Yu, and Y. Ma, “Pp-liteseg: A superior real-time semantic segmentation model,” arXiv preprint, arXiv:2204.02681, 2022. https://doi.org/10.48550/arXiv.2204.02681

[42] [42] C. Yu, C. Gao, J. Wang, G. Yu, C. Shen, and N. Sang, “Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation,” Int. J. of Computer Vision, Vol.129, pp. 3051-3068, 2021. https://doi.org/10.1007/s11263-021-01515-2

[43] [43] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” Advances in Neural Information Processing Systems, Vol.34, pp. 12077-12090, 2021.