JACIII Vol.20 No.3 pp. 455-461
doi: 10.20965/jaciii.2016.p0455


Utilizing Google Images for Training Classifiers in CRF-Based Semantic Segmentation

Rizki Perdana Rangkuti*, Vektor Dewanto*,**, Aprinaldi*, and Wisnu Jatmiko*

*Faculty of Computer Science, Universitas Indonesia
16424, Depok, Jawa Barat, Indonesia
**Department of Computer Science, Bogor Agricultural University
Bogor, Jawa Barat, Indonesia

Received: June 18, 2015
Accepted: March 1, 2016
Online released: May 19, 2016
Published: May 19, 2016
Keywords: conditional random fields, Google images, saliency filter, classifier learning, semantic segmentation

One promising approach to pixel-wise semantic segmentation is based on conditional random fields (CRFs). CRF-based semantic segmentation requires ground-truth annotations for supervised training of the classifier that generates the unary potentials. However, the amount of publicly available annotated training data is small. We observe that the Internet can provide relevant images for any given keyword. Our idea is to convert keyword-related images into pixel-wise annotated images and then use them as training data. In particular, we rely on saliency filters to identify the salient object (foreground) of a retrieved image, which in most cases corresponds to the given keyword. We then use this saliency information in a foreground/background CRF-based segmentation to obtain pixel-wise ground-truth annotations. Experimental results show that training data from Google Images improves both the learning performance and the accuracy of semantic segmentation. This suggests that the proposed method is promising for harvesting substantial training data from the Internet for training the classifier in CRF-based semantic segmentation.
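The conversion from saliency to pixel-wise labels can be illustrated with a minimal sketch. It is not the method of the paper: it replaces the foreground/background CRF step with plain two-threshold labeling, and the function name, the thresholds `lo` and `hi`, the `BACKGROUND` and `VOID` label ids, and the toy saliency map are all assumptions made for illustration.

```python
from typing import List

BACKGROUND = 0   # hypothetical label id for background pixels
VOID = 255       # hypothetical "ignore" label for uncertain pixels


def saliency_to_labels(saliency: List[List[float]], class_id: int,
                       lo: float = 0.3, hi: float = 0.7) -> List[List[int]]:
    """Turn a saliency map with values in [0, 1] into a pixel-wise label map.

    Pixels with saliency >= hi get the keyword's class id, pixels with
    saliency <= lo get BACKGROUND, and everything in between is marked VOID
    so that a classifier could skip it during training.
    """
    if not 0.0 <= lo < hi <= 1.0:
        raise ValueError("expected 0 <= lo < hi <= 1")
    labels = []
    for row in saliency:
        out = []
        for s in row:
            if s >= hi:
                out.append(class_id)      # confident foreground
            elif s <= lo:
                out.append(BACKGROUND)    # confident background
            else:
                out.append(VOID)          # ambiguous, left unlabeled
        labels.append(out)
    return labels


# Toy 3x3 saliency map with a salient object in the lower-right corner.
toy = [[0.05, 0.10, 0.20],
       [0.10, 0.50, 0.90],
       [0.20, 0.85, 0.95]]
labels = saliency_to_labels(toy, class_id=7)
```

In the approach described above, the ambiguous band would instead be resolved by the foreground/background CRF, which also takes neighboring pixels into account; the thresholds here only stand in for that step.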


