
IJAT Vol.19 No.4, pp. 575-586 (2025)
doi: 10.20965/ijat.2025.p0575

Research Paper:

Motion Control of Mobile Robots in Snowy Environments Using Semantic Segmentation —Temporally Consistent GAN-Based Image-to-Image Translation from Winter to Summer—

Yugo Takagi, Fangzheng Li, Reo Miura, and Yonghoon Ji

Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology
1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan


Received: November 30, 2024
Accepted: February 3, 2025
Published: July 5, 2025

Keywords: mobile robotics, field robotics, semantic segmentation, generative adversarial network, snowy environment
Abstract

In recent years, autonomous mobile robots have been deployed in outdoor environments, including challenging conditions such as snow. In snowy environments, stable motion control is difficult because detecting pavement edges from camera images becomes unreliable due to snow coverage. To address this limitation, we propose a novel framework for autonomous motion control in snowy environments, utilizing semantic segmentation and generative adversarial networks (GANs). In our approach, winter images captured by a camera are transformed into summer-like images using a GAN, enabling automatic detection of snow-covered pavement through semantic segmentation. However, conventional GAN-based image translation has limited accuracy because it does not account for the temporal consistency of time-series images. To overcome this issue, we improve the temporal consistency of GAN-based image translation by incorporating the sequential characteristics of images captured by a monocular camera mounted on a mobile robot. The improved GAN demonstrates high temporal consistency in real-world datasets. Furthermore, we achieve stable motion control in snow-covered environments using a novel scheme that generates optimal subgoals based on pavement coplanarity.
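The abstract describes a pipeline in which a winter camera frame is first translated into a summer-like image by a GAN, the translated image is then semantically segmented to recover the snow-covered pavement region, and a subgoal for motion control is derived from that region. The sketch below illustrates how such a pipeline could be wired together; it is a minimal illustration only, and the model wrappers (`winter2summer_gan.pt`, `segmentation_net.pt`), the pavement class index, and the centroid-based subgoal choice are hypothetical placeholders rather than the authors' temporally consistent GAN or their coplanarity-based subgoal scheme.

```python
import numpy as np
import torch

# Hypothetical trained networks exported as TorchScript (assumed file names).
gan = torch.jit.load("winter2summer_gan.pt")       # winter -> summer image translator
segmenter = torch.jit.load("segmentation_net.pt")  # per-pixel semantic classifier
PAVEMENT_CLASS = 1                                  # assumed label index for pavement


def pavement_mask_from_winter_frame(frame: np.ndarray) -> np.ndarray:
    """Translate a winter frame to a summer-like image, segment it, and
    return a boolean mask of pixels labeled as pavement."""
    # HWC uint8 image -> NCHW float tensor normalized to [-1, 1]
    x = torch.from_numpy(frame).float().permute(2, 0, 1).unsqueeze(0) / 127.5 - 1.0
    with torch.no_grad():
        summer = gan(x)             # fake summer-like image, same shape as input
        logits = segmenter(summer)  # per-pixel class scores, shape (1, C, H, W)
    labels = logits.argmax(dim=1).squeeze(0).cpu().numpy()
    return labels == PAVEMENT_CLASS


def subgoal_from_mask(mask: np.ndarray) -> tuple[int, int] | None:
    """Rough subgoal choice for illustration: the centroid of the pavement
    pixels in the lower half of the image (closest to the robot), in pixels."""
    half = mask.shape[0] // 2
    rows, cols = np.nonzero(mask[half:, :])
    if rows.size == 0:
        return None  # no pavement detected; caller should stop or replan
    return int(rows.mean()) + half, int(cols.mean())
```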

Cite this article as:
Y. Takagi, F. Li, R. Miura, and Y. Ji, “Motion Control of Mobile Robots in Snowy Environments Using Semantic Segmentation —Temporally Consistent GAN-Based Image-to-Image Translation from Winter to Summer—,” Int. J. Automation Technol., Vol.19 No.4, pp. 575-586, 2025.
