Paper:

# Improvement of PCA-Based Approximate Nearest Neighbor Search Using Distance Statistics

## Toshiro Ogita^{*}, Hidetomo Ichihashi^{**}, Akira Notsu^{*},

and Katsuhiro Honda^{*}

^{*}Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Nakaku, Sakai, Osaka 599-8531, Japan

^{**}Department of Economics, Osaka University of Economics and Law, 6-10 Gakuonji, Yao, Osaka 581-8511, Japan

In many computer vision applications, nearest neighbor searching in high-dimensional spaces is often the most time consuming component and we have few algorithms for solving these high-dimensional nearest neighbor search problems that are faster than linear search. Approximately nearest neighbor search algorithms can play an important role in achieving significantly faster running times with relatively small errors. This paper considers the improvement of the PCA-tree nearest neighbor search algorithm [1] by employing nearest neighbor distance statistics. During the preprocessing phase of the PCA-tree nearest neighbor search algorithm, a data set is partitioned into clusters by successive use of Principal Component Analysis (PCA). The search performance is significantly improved if the data points are sorted by leaf node, and the threshold value is updated each time a smaller distance is found. The threshold is updated by the ε-approximate nearest neighbor approach together with the fixed-threshold approach. Performance can be further improved by the annulus bound approach. Moreover, nearest neighbor distance statistics is employed for further improving the efficiency of the search algorithm and the several experimental results are shown for demonstrating how its efficiency is improved.

*J. Adv. Comput. Intell. Intell. Inform.*, Vol.18, No.4, pp. 658-664, 2014.

- [1] R. F. Sproull, “Refinements to nearest-neighbor searching in
*k*-dimensional trees,” Algorithmica, Vol.6, pp. 579-589, 1991. - [2] A. Andoni and P. Indyk, “Near-optimal hashing for approximate nearest neighbor in high dimensions,” Communications of the ACM, Vol.51, No.1, pp. 117-122, 2008.
- [3] P. Indyk and R. Motwani, “Approximate nearest neighbor: Towards removing the curse of dimensionality,” Proc, 30th Symp. on Theory of Computing, pp. 604-613, 1998.
- [4] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Localitysensitive hashing scheme based on p-stable distributions,” Proc. 20th annual Symp. on Computational geometry, pp. 253-262, 2004.
- [5] Y.Weiss, A. Torralba, and R. Fergus, “Spectral Hashing,” Advances in Neural Information Processing Systems, pp. 1753-1760, 2008.
- [6] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, “An optimal algorithm for approximate nearest neighbor searching in fixed dimensions,” J. of the ACM, Vol.45, No.6, pp. 891-923, Nov. 1998.
- [7] S. M. Omohundro, “Five balltree construction algorithms,” Technical Report, pp. 89-063, 1989.
- [8] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Communications of the ACM, Vol.18, No.9, pp. 509-517, 1975.
- [9] P. N. Yianilos, “Data structures and algorithms for nearest neighbor search in general metric spaces,” Symp. on Discrete algorithms, pp. 311-321, 1993.
- [10] H. Ichihashi, A. Notsu, K. Honda, T. Katada, and M. Fujiyoshi, “Improvement in the performance of camera based vehicle detector for parking lot,” Proc. of 2010 IEEE Int. Conf. on Fuzzy System, Barcelona, Spain, Juli. 18-23, 2010.
- [11] H. Ichihashi, K. Honda, and A. Notsu, “Comparison of Scaling Behavior Between Fuzzy c-Means Based Classifier with Many Parameters and LibSVM,” Proc. of the IEEE Int. Conf. on Fuzzy Systems, pp. 386-393, 2011.
- [12] H. Ichihashi, K. Honda, and A. Notsu, “Fuzzy c-means based classifier – Comparison with LibSVM in terms of training time for parameter optimization,” Proc. of 12th Int. Symp. on Advanced Intelligent Systems, pp. 390-393, 2011.
- [13] H. Ichihashi, T. Ogita, K. Honda, and A. Notsu, “Improvement by sorting and thresholding in PCA based nearest neighbor search,” Proc. of 2012 IEEE Int. Conf. on Fuzzy Systems, pp. 53-58, 2012.
- [14] H. Ichihashi, T. Ogita, A. Notsu, K. Honda, “PCA-Tree NNS with Two Approximation Methods and Annulus Bound Method,” Proc. 6th Int. Conf. on Soft Computing and Intelligent Systems and 13th Int. Symp. on Advanced Intelligent Systems, pp. 1999-2003, 2012.
- [15] K. Zatloukal, M. H. Johnson, and R. E. Ladner, “Nearest neighbor search for data compression,” DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Volume 59, Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges,” M. H. Goldwasser, D. S. Johnson, and C. C. McGeoch, (Eds.), American Mathematical Society, pp. 69-86, 2002.
- [16] Y. S. Ahmed, “Multiple random projection for fast, approximate nearest neighbor search in high dimensions,” Master Thesis at University of Toronto, 2005.