Bag-of-Bounding-Boxes: An Unsupervised Approach for Object-Level View Image Retrieval

Kanji Tanaka; Masatoshi Ando; Yousuke Inagaki

doi:10.20965/jaciii.2014.p0784

JACIII Vol.18 No.5 pp. 784-791

(2014)

Paper:

University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan

Received:

July 10, 2013

Accepted:

May 7, 2014

Published:

September 20, 2014

Keywords:

mobile robot, view image retrieval, bag-of-words, common pattern discovery

Abstract

We propose a novel bag-of-words (BoW) framework for building and retrieving a compact database of view images for use in robotic localization, mapping, and SLAM applications. Unlike most previous methods, which describe an image by its many small local features (e.g., bag-of-SIFT-features), the proposed bag-of-bounding-boxes (BoBB) approach describes an image by fewer, larger object patterns, which yields a semantic and compact image descriptor. To make the view retrieval system more practical and autonomous, object patterns are discovered without supervision through common pattern discovery (CPD) between the input image and known reference images, so no pre-trained object detector is required. Moreover, our CPD subtask does not rely on accurate image segmentation and handles scale variations by exploiting a recently developed CPD technique, spatial random partition. Following traditional bounding-box-based object annotation and knowledge transfer, we compactly describe each image in BoBB form. Using a slightly modified inverted file system, we efficiently index and search the BoBB descriptors. Experiments on the publicly available “Robot-Car” dataset show that the proposed method achieves accurate object-level view image retrieval with highly compact image descriptors, e.g., 20 words per image.
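
To make the indexing and search step more concrete, the following is a minimal sketch of a plain inverted file with vote counting. It assumes that each view image has already been reduced to a short list of integer word IDs (one per discovered bounding box, on the order of the 20 words per image quoted above). The class name `InvertedFile`, its methods, and the one-vote-per-shared-word scoring are illustrative assumptions; the paper's "slightly modified" inverted file is not reproduced here.

```python
from collections import Counter, defaultdict


class InvertedFile:
    """Hypothetical inverted index mapping a BoBB word ID to the images containing it."""

    def __init__(self):
        # word ID -> list of database image IDs whose descriptor contains that word
        self.postings = defaultdict(list)

    def add(self, image_id, words):
        # Index each distinct word once per image (set semantics, no term frequency).
        for w in set(words):
            self.postings[w].append(image_id)

    def query(self, words, top_k=5):
        # Each word shared with the query casts one vote for every database image
        # in its posting list; only the query's own posting lists are visited.
        votes = Counter()
        for w in set(words):
            for image_id in self.postings.get(w, ()):
                votes[image_id] += 1
        return votes.most_common(top_k)


if __name__ == "__main__":
    index = InvertedFile()
    index.add("img_a", [3, 17, 42])
    index.add("img_b", [17, 42, 99])
    index.add("img_c", [5, 6])
    print(index.query([17, 42, 3]))  # [('img_a', 3), ('img_b', 2)]
```

Because a query visits only the posting lists of its own words, lookup cost grows roughly with the number of words per descriptor, which is one reason short descriptors pair well with this kind of index.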

Cite this article as:

K. Tanaka, M. Ando, and Y. Inagaki, “Bag-of-Bounding-Boxes: An Unsupervised Approach for Object-Level View Image Retrieval,” J. Adv. Comput. Intell. Intell. Inform., Vol.18 No.5, pp. 784-791, 2014.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.