Mining Visual Phrases for Visual Robot Localization

Kanji Tanaka; Yuuto Chokushi; Masatoshi Ando

doi:10.20965/jaciii.2016.p0057

JACIII Vol.20 No.1 pp. 57-65

(2016)

Paper:

University of Fukui
2-7-1 Bunkyo, Fukui 910-8507, Japan

Received: March 27, 2015
Accepted: November 13, 2015
Online released: January 19, 2016
Published: January 20, 2016

Keywords: long-term visual SLAM, common pattern discovery, mining visual phrases

Abstract

We propose a discriminative and compact scene descriptor for single-view place recognition that facilitates long-term visual SLAM in familiar, semi-dynamic, and partially changing environments. In contrast to popular bag-of-words scene descriptors, which rely on a library of vector-quantized visual features, our scene descriptor is based on a library of raw image data (such as available visual experience, images shared by colleague robots, and publicly available image data on the Web), which we mine directly to find visual phrases (VPs) that discriminatively and compactly explain an input query or database image. Our mining approach is motivated by recent successes in the field of common pattern discovery, specifically the mining of common visual patterns among scenes, and requires only a single library of raw images that can be acquired at different times or on different days. Experimental results show that, although our scene descriptor is significantly more compact than conventional descriptors, its recognition performance remains relatively high.

Cite this article as:

K. Tanaka, Y. Chokushi, and M. Ando, “Mining Visual Phrases for Visual Robot Localization,” J. Adv. Comput. Intell. Intell. Inform., Vol.20 No.1, pp. 57-65, 2016.


This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
