Feature Extraction with Space Folding Model and its Application to Machine Learning
Minh Tuan Pham*, Tomohiro Yoshikawa*, Takeshi Furuhashi*, and Kanta Tachibana**
*Department of Computational Science and Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
**Department of Information Design, Faculty of Informatics, Kogakuin University, 1-24-2 Nishi-Shinjuku, Tokyo 163-8677, Japan
Feature extraction is an essential element of most machine learning methods, including supervised learning with neural networks. Linearly inseparable data are often transformed nonlinearly so that they become more linearly separable in the feature space. In this paper, we propose a method of feature extraction with a space folding model. In the proposed method, each basis vector of the m-dimensional data space is divided into its positive and negative directions, and the transformation is optimized with 2m m-dimensional vectors as variables. These 2m variable vectors are estimated by minimizing the cross entropy between class labels and distances, so that instances of the same class are gathered closer together and instances of different classes are pushed farther apart. Because a separate linear transformation is applied to each quadrant, the proposed method collectively realizes a nonlinear transformation, and it is therefore expected to achieve higher discrimination accuracy than conventional feature extraction methods based on a single linear transformation. We confirm the effectiveness of the proposed method on a UCI benchmark problem.
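The abstract does not give the exact mapping or objective, so the following Python sketch rests on stated assumptions. It reads "dividing each basis vector into positive and negative directions" as sending +e_i and -e_i to separate variable vectors, held in the hypothetical arrays `w_pos[i]` and `w_neg[i]`, and it assumes a pairwise cross entropy in which each pair of instances has closeness exp(-d²) in the feature space. The function names `fold_transform` and `pair_cross_entropy` are illustrative only. The sketch evaluates the objective; the optimizer that estimates the 2m vectors is not shown.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def fold_transform(x: Sequence[float], w_pos: Matrix, w_neg: Matrix) -> List[float]:
    """Map x with 2m variable vectors (an assumed reading of the model).

    The basis direction +e_i is sent to w_pos[i] and -e_i to w_neg[i]:
        y = sum_i max(x_i, 0) * w_pos[i] + max(-x_i, 0) * w_neg[i]
    Inside one quadrant (orthant) this is a single linear map; across
    quadrants the maps differ, so the overall transform is nonlinear
    (piecewise linear and continuous, since both terms vanish at x_i = 0).
    """
    y = [0.0] * len(w_pos[0])
    for i, xi in enumerate(x):
        direction = w_pos[i] if xi >= 0 else w_neg[i]
        for k, v in enumerate(direction):
            y[k] += abs(xi) * v
    return y


def pair_cross_entropy(points: Sequence[Sequence[float]], labels: Sequence[int],
                       w_pos: Matrix, w_neg: Matrix, eps: float = 1e-12) -> float:
    """Hypothetical cross entropy between class labels and feature distances.

    Each pair gets a closeness p = exp(-d^2) and a target t = 1 (same class)
    or t = 0 (different class); the loss is -[t log p + (1 - t) log(1 - p)].
    Small values mean same-class pairs are close and other pairs are far.
    """
    feats = [fold_transform(x, w_pos, w_neg) for x in points]
    total = 0.0
    for j in range(len(feats)):
        for k in range(j + 1, len(feats)):
            d2 = sum((a - b) ** 2 for a, b in zip(feats[j], feats[k]))
            if labels[j] == labels[k]:
                total += d2  # -log(exp(-d2)) == d2
            else:
                # 1 - exp(-d2) computed as -expm1(-d2); eps guards log(0) at d2 = 0
                total -= math.log(max(-math.expm1(-d2), eps))
    return total


if __name__ == "__main__":
    # Toy 1-D data: the two classes are interleaved along the line, so no
    # single threshold on x separates them.
    points = [[-2.0], [-1.0], [1.0], [2.0]]
    labels = [0, 1, 1, 0]

    identity = ([[1.0]], [[-1.0]])  # +e_1 -> +e_1, -e_1 -> -e_1 (no folding)
    folded = ([[1.0]], [[1.0]])     # -e_1 -> +e_1: fold negative half onto positive

    print("identity:", pair_cross_entropy(points, labels, *identity))
    print("folded:  ", pair_cross_entropy(points, labels, *folded))
```

The script compares the identity map with a fold that sends -e_1 to +e_1, i.e. y = |x|. Folding places class 0 at 2 and class 1 at 1, which makes the classes linearly separable, and the objective drops accordingly (about 1.83 versus 20.92). A single linear map cannot achieve this, which is the motivation the abstract gives for applying a different linear transformation in each quadrant.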