Paper:
Relative Relaxation and Weighted Information Loss to Simplify and Stabilize Feature Detection
Ryotaro Kamimura
IT Education Center, Tokai University, Kanagawa, Japan
In this paper, we propose new information-theoretic methods to stabilize feature detection. We have previously introduced information-theoretic methods to realize competitive learning, in which mutual information maximization corresponds to a process of competition among neurons; mutual information is therefore effective in describing competitive processes. Building on this, we introduced information loss to interpret internal representations: when competitive units are relaxed with respect to some components, such as units or connection weights, the network's information decreases, and if the resulting information loss is sufficiently large, those components play important roles. However, information loss has suffered from problems such as the instability of final representations, meaning that final outputs depend heavily on the chosen parameters. To stabilize final representations, we introduce two computational methods, namely, relative relaxation and weighted information loss. Relative relaxation is introduced because mutual information depends on the Gaussian width; thus, we relax competitive units, or softly delete some components, only relative to a predetermined base state. In addition, weighted information loss takes into account information on related components. We applied the methods to the well-known Iris problem and to a problem concerning the extinction of animals and plants. In the Iris problem, experimental results confirmed that final representations were significantly more stable when the parameter for the base state was chosen appropriately. In the extinction problem, weighted information loss showed the best performance, with final outputs significantly more stable than those obtained by the other methods.
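The methods build on information-theoretic competitive learning, in which competitive unit outputs are normalized Gaussian activations and the mutual information between inputs and units measures the degree of competition. The following Python sketch is an illustration only, not the paper's exact formulation: it estimates an information loss for a single input component by comparing mutual information before and after that component is removed, whereas the relative and weighted variants, the base state, and the Gaussian-width schedule are defined in the paper body.

```python
import numpy as np

def competitive_activations(X, W, sigma):
    """p(j|s): normalized Gaussian outputs of competitive units
    for S inputs X (S x L) and M weight vectors W (M x L)."""
    # squared Euclidean distances between every input and every unit
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(-d2 / (2.0 * sigma ** 2))
    return g / g.sum(axis=1, keepdims=True)

def mutual_information(P):
    """I = (1/S) * sum_s sum_j p(j|s) log(p(j|s) / p(j)), with p(s) = 1/S."""
    S = P.shape[0]
    pj = P.mean(axis=0)                          # p(j)
    ratio = np.clip(P / pj, 1e-12, None)         # guard against log(0)
    return float((P * np.log(ratio)).sum() / S)

def component_information_loss(X, W, k, sigma):
    """Rough information loss for input component k: mutual information of the
    full network minus that of a network with component k deleted.
    (Hypothetical simplification; the paper relaxes components softly and
    measures the loss relative to a base state.)"""
    full = mutual_information(competitive_activations(X, W, sigma))
    X_del = np.delete(X, k, axis=1)
    W_del = np.delete(W, k, axis=1)
    reduced = mutual_information(competitive_activations(X_del, W_del, sigma))
    return full - reduced
```

A component whose deletion yields a large drop in mutual information is, in this sense, an important feature; the paper's relative and weighted variants are intended to make this ranking less sensitive to the chosen Gaussian width.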
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.