
JACIII Vol.29 No.6 pp. 1319-1328 (2025)
doi: 10.20965/jaciii.2025.p1319

Research Paper:

Linear Transformer Based U-Shaped Lightweight Segmentation Network

Hongli He*, Changhao Sun**, Zhaoyuan Wang**, and Yongping Dan**,†

*Rail Transit Institute, Henan College of Transportation
No.259 Tonghui Road, Zhengzhou 450061, China

**School of Integrated Circuits, Zhongyuan University of Technology
No.41 Zhongyuan Road, Zhengzhou 450007, China

†Corresponding author

Received: November 29, 2024
Accepted: June 22, 2025
Published: November 20, 2025
Keywords: transformer, U-Net, medical segmentation
Abstract

The widespread development and application of embedded medical devices necessitate corresponding research into lightweight, energy-efficient models. Although transformer-based segmentation models have shown promise in various visual tasks, inherent challenges, including the lack of inductive bias and an overreliance on extensive training data, emerge when striving for optimal model efficiency; relying solely on transformers therefore does not meet the practical demands of lightweight, efficient models. By contrast, convolutional neural networks (CNNs), with their intrinsic inductive biases and parameter-sharing mechanisms, reduce the number of parameters and focus on capturing local features, thereby lowering computational costs. Hence, integrating CNNs with transformers presents a promising research direction for constructing efficient, lightweight networks: the hybrid approach combines the strength of CNNs in feature extraction with the ability of transformers to model global dependencies, balancing model performance against efficiency. In this paper, we propose MobileViTv2s, a novel lightweight segmentation network that integrates CNNs with a linear transformer. The proposed network extracts local features efficiently via CNNs, while the transformer models complex feature relationships, facilitating precise segmentation in intricate contexts such as medical imaging. The model demonstrates significant potential for the advancement of lightweight deep learning models. Experimental results show that the proposed model achieves up to a 14.34-fold improvement in efficiency and a 9.91-fold reduction in the number of parameters, with comparable or superior segmentation accuracy and a markedly lower Hausdorff distance.
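The efficiency gain of such a linear transformer comes from replacing the quadratic token-to-token attention map with a single learned context vector, in the spirit of the separable self-attention used in MobileViTv2-style blocks. Below is a minimal PyTorch sketch of that idea; the class name, projection layout, and activation choices are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of separable (linear-complexity) self-attention,
# assuming a MobileViTv2-style design; shapes and names are illustrative.
import torch
import torch.nn as nn


class SeparableSelfAttention(nn.Module):
    """O(N) attention: one global context vector replaces the
    N x N attention map of standard multi-head self-attention."""

    def __init__(self, dim: int):
        super().__init__()
        # Single projection produces context scores (1), keys (dim), values (dim).
        self.qkv = nn.Linear(dim, 1 + 2 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        d = x.size(-1)
        scores, k, v = torch.split(self.qkv(x), [1, d, d], dim=-1)
        ctx_weights = scores.softmax(dim=1)                   # (B, N, 1), sums to 1 over tokens
        context = (ctx_weights * k).sum(dim=1, keepdim=True)  # (B, 1, dim) global summary
        return self.out(torch.relu(v) * context)              # broadcast context to every token


if __name__ == "__main__":
    x = torch.randn(2, 196, 64)          # e.g., a 14x14 feature map flattened into tokens
    y = SeparableSelfAttention(64)(x)
    print(y.shape)                       # torch.Size([2, 196, 64])
```

Because the softmax is taken over a single score per token, compute and memory grow linearly with the number of tokens rather than quadratically as in standard self-attention, which is what makes such a block attractive for embedded medical devices.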

Figure: Illustration of the network architecture.

Cite this article as:
H. He, C. Sun, Z. Wang, and Y. Dan, “Linear Transformer Based U-Shaped Lightweight Segmentation Network,” J. Adv. Comput. Intell. Intell. Inform., Vol.29 No.6, pp. 1319-1328, 2025.
