
JACIII Vol.25 No.1 pp. 3-12
doi: 10.20965/jaciii.2021.p0003
(2021)

Paper:

Complementary Convolution Residual Networks for Semantic Segmentation in Street Scenes with Deep Gaussian CRF

Yongbo Li*,**, Yuanyuan Ma*,**, Wendi Cai*,**, Zhongzhao Xie*,**, and Tao Zhao*,**

*School of Automation, China University of Geosciences
No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China

**Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems
No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China

Received: June 5, 2019
Accepted: July 22, 2020
Published: January 20, 2021
Keywords: image semantic segmentation, complementary convolution residual networks, Gaussian conditional random fields
Abstract

Accurate semantic segmentation of images is vital for understanding surrounding scenes in autonomous driving tasks such as navigation and route planning. Currently, convolutional neural networks (CNNs) are widely employed in semantic segmentation to perform precise, dense pixel-level prediction. A recent trend in network design is the stacking of small convolution kernels. In this work, small convolution kernels (3 × 3) are decomposed into complementary convolution kernels (1 × 3 + 3 × 1, 3 × 1 + 1 × 3); the complementary small convolution kernels perform better in the classification and localization tasks of semantic segmentation. Subsequently, a complementary convolution residual network (CCRN) is proposed to improve the speed and accuracy of semantic segmentation. To further locate object edges precisely, a coupled Gaussian conditional random field (G-CRF) is utilized for CCRN post-processing. The proposed approach achieves 81.8% and 73.1% mean Intersection-over-Union (mIoU) on the PASCAL VOC-2012 test set and the Cityscapes test set, respectively.
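
The kernel decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' CCRN implementation: the abstract does not describe how the (1 × 3 + 3 × 1) and (3 × 1 + 1 × 3) pairs are arranged inside the residual blocks, so the code only demonstrates the general idea of factorized convolution. The helper names (`conv2d_valid`, `outer`) and the kernel values are hypothetical, chosen for illustration. Without a nonlinearity between them, a 1 × 3 kernel followed by a 3 × 1 kernel acts like a single rank-1 3 × 3 kernel while using 6 weights instead of 9.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def outer(col, row):
    """Outer product of a column vector and a row vector as a 2-D kernel."""
    return [[c * r for r in row] for c in col]


# Hypothetical kernel weights, chosen only for illustration.
row = [1, 2, 1]    # weights of the 1 x 3 kernel
col = [1, 0, -1]   # weights of the 3 x 1 kernel
k_1x3 = [row]
k_3x1 = [[c] for c in col]

# Small deterministic integer test image (5 rows, 6 columns).
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

# The two orderings named in the abstract: 1x3 then 3x1, and 3x1 then 1x3.
out_a = conv2d_valid(conv2d_valid(image, k_1x3), k_3x1)
out_b = conv2d_valid(conv2d_valid(image, k_3x1), k_1x3)

# Equivalent single 3 x 3 kernel (rank 1) applied in one pass.
out_full = conv2d_valid(image, outer(col, row))

# Weight counts per input/output channel pair.
weights_full = 3 * 3           # one 3 x 3 kernel
weights_factored = 3 + 3       # one 1 x 3 kernel plus one 3 x 1 kernel
```

In this linear setting both orderings give the same output as the single 3 × 3 pass. In a real network the pairs are typically separated by normalization or activation layers, so the two orderings are no longer equivalent, which is one plausible reason to use both; this is an assumption, not something the abstract states.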

Cite this article as:
Yongbo Li, Yuanyuan Ma, Wendi Cai, Zhongzhao Xie, and Tao Zhao, “Complementary Convolution Residual Networks for Semantic Segmentation in Street Scenes with Deep Gaussian CRF,” J. Adv. Comput. Intell. Intell. Inform., Vol.25, No.1, pp. 3-12, 2021.


Last updated on Mar. 05, 2021