A Backward Feature Selection by Creating Compact Neural Network Using Coherence Learning and Pruning

Md. Monirul Kabir; Md. Shahjahan; Kazuyuki Murase

doi:10.20965/jaciii.2007.p0570

JACIII Vol.11 No.6 pp. 570-581 (2007)

Paper:

Md. Monirul Kabir^*, Md. Shahjahan^**, and Kazuyuki Murase^*,***

^*Department of Human and Artificial Intelligence Systems, Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan

^**Department of Electrical and Electronic Engineering, Khulna University of Engineering and Technology, Building no-13E, KUET Campus, Khulna-9203, Bangladesh

^***Research and Education Program for Life Science, University of Fukui

Received: January 15, 2007

Accepted: March 19, 2007

Published: July 20, 2007

Keywords: feature selection, artificial neural network, classification, pruning

Abstract

In this paper we propose a new backward feature selection method that generates a compact classifier in the form of a three-layered feed-forward artificial neural network (ANN). The algorithm, which is based on the wrapper model, integrates two techniques, coherence learning and pruning, to find relevant features using a network with a minimal number of hidden units and connections. First, coherence learning and a pruning technique are applied during training to remove unnecessary hidden units from the network. Attribute distances are then measured by a straightforward, computationally inexpensive calculation, and an attribute is removed based on an error-based criterion. The network is retrained after each attribute removal. This elimination of unnecessary attributes continues until a stopping criterion is satisfied. We applied the method to several standard benchmark classification problems, such as the breast cancer, diabetes, glass identification, and thyroid problems. Experimental results confirmed that the proposed method generates compact network structures that select relevant features while achieving good classification accuracy.
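
The remove-retrain-check loop described in the abstract can be illustrated with a minimal sketch of generic wrapper-style backward elimination. This is not the authors' algorithm: a nearest-centroid classifier stands in for the pruned ANN, and the coherence learning, hidden-unit pruning, and attribute-distance steps are not reproduced. The names `centroid_error`, `backward_select`, `make_data`, and the `tolerance` parameter are hypothetical, as is the synthetic data set (one informative attribute, two noise attributes).

```python
import random


def centroid_error(train, valid, features):
    """Validation error rate of a nearest-centroid classifier (a stand-in
    for the paper's ANN) that only looks at the attributes in `features`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2
                              for f, c in zip(features, centroids[y])),
        )

    wrong = sum(predict(x) != y for x, y in valid)
    return wrong / len(valid)


def backward_select(train, valid, n_features, tolerance=0.0):
    """Generic backward elimination: drop one attribute at a time, refit,
    and stop once every possible removal raises the validation error
    above the best error seen so far plus `tolerance`."""
    selected = list(range(n_features))
    best_err = centroid_error(train, valid, selected)
    while len(selected) > 1:
        # "Retrain" without each candidate attribute and record the error.
        trials = [
            (centroid_error(train, valid, [f for f in selected if f != g]), g)
            for g in selected
        ]
        err, candidate = min(trials)
        if err > best_err + tolerance:  # error-based stopping criterion
            break
        selected.remove(candidate)
        best_err = min(best_err, err)
    return selected, best_err


def make_data(n, rng):
    """Hypothetical two-class data: attribute 0 separates the classes,
    attributes 1 and 2 are pure noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [y * 6.0 + rng.gauss(0, 0.5), rng.gauss(0, 1.0), rng.gauss(0, 1.0)]
        data.append((x, y))
    return data


rng = random.Random(0)
train, valid = make_data(200, rng), make_data(100, rng)
selected, err = backward_select(train, valid, n_features=3)
print("kept attributes:", selected, "validation error:", err)
```

In this sketch the stopping criterion is a simple comparison against the best validation error so far; the abstract does not specify the paper's actual criterion, so `tolerance` is only one plausible way to express it.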

Cite this article as:

M. Kabir, M. Shahjahan, and K. Murase, “A Backward Feature Selection by Creating Compact Neural Network Using Coherence Learning and Pruning,” J. Adv. Comput. Intell. Intell. Inform., Vol.11 No.6, pp. 570-581, 2007.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
