Research Paper:
Motion Control of Mobile Robots in Snowy Environments Using Semantic Segmentation —Temporally Consistent GAN-Based Image-to-Image Translation from Winter to Summer—
Yugo Takagi, Fangzheng Li, Reo Miura, and Yonghoon Ji

Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology
1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
In recent years, autonomous mobile robots have been deployed in outdoor environments, including challenging conditions such as snow. In snowy environments, stable motion control is difficult because detecting pavement edges from camera images becomes unreliable due to snow coverage. To address this limitation, we propose a novel framework for autonomous motion control in snowy environments, utilizing semantic segmentation and generative adversarial networks (GANs). In our approach, winter images captured by a camera are transformed into summer-like images using a GAN, enabling automatic detection of snow-covered pavement through semantic segmentation. However, conventional GAN-based image translation has limited accuracy because it does not account for the temporal consistency of time-series images. To overcome this issue, we improve the temporal consistency of GAN-based image translation by incorporating the sequential characteristics of images captured by a monocular camera mounted on a mobile robot. The improved GAN demonstrates high temporal consistency in real-world datasets. Furthermore, we achieve stable motion control in snow-covered environments using a novel scheme that generates optimal subgoals based on pavement coplanarity.
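To make the pipeline described in the abstract concrete, the sketch below outlines one control cycle: translate a winter camera frame into a summer-like image with a GAN, run semantic segmentation on the translated image, and derive a subgoal from the detected pavement region. This is an illustrative sketch only, not the authors' implementation: the model wrappers (`winter_to_summer_gan`, `segment`), the class index `PAVEMENT_CLASS`, and the simplified subgoal rule (centroid of pavement pixels in the lower half of the image, standing in for the paper's coplanarity-based subgoal generation) are all assumptions.

```python
import numpy as np

PAVEMENT_CLASS = 0  # hypothetical label index for the "road/pavement" class


def compute_subgoal(seg_mask: np.ndarray):
    """Pick a subgoal pixel as the centroid of pavement pixels in the lower
    half of the segmented (summer-like) image. Simplified stand-in for the
    paper's coplanarity-based subgoal generation."""
    h, _ = seg_mask.shape
    ys, xs = np.nonzero(seg_mask[h // 2:] == PAVEMENT_CLASS)
    if xs.size == 0:
        return None  # no drivable region detected in this frame
    return float(xs.mean()), float(ys.mean()) + h / 2


def control_step(winter_frame, winter_to_summer_gan, segment):
    """One control cycle: GAN winter-to-summer translation, semantic
    segmentation of the translated frame, then subgoal extraction."""
    summer_like = winter_to_summer_gan(winter_frame)  # image-to-image translation
    seg_mask = segment(summer_like)                   # per-pixel class labels
    return compute_subgoal(seg_mask)


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end without trained models.
    rng = np.random.default_rng(0)
    dummy_gan = lambda img: img                              # identity "translation"
    dummy_seg = lambda img: rng.integers(0, 3, img.shape[:2])  # random labels
    frame = rng.random((480, 640, 3)).astype(np.float32)
    print(control_step(frame, dummy_gan, dummy_seg))
```

In practice, the subgoal returned by such a step would be fed to a tracking controller for the mobile robot; the temporal-consistency improvement discussed in the abstract concerns the translation step, so consecutive summer-like frames (and hence segmentation masks and subgoals) do not flicker between cycles.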
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.