# Knowledge Extraction from a Mixed Transfer Function Artificial Neural Network

## M. Imad Khan, Yakov Frayman, and Saeid Nahavandi

Intelligent Systems Research Group, School of Engineering and Information Technology, Deakin University, Waurn Ponds, Geelong, VIC 3217, Australia

One of the main problems with Artificial Neural Networks (ANNs) is that their results are not intuitively clear. For example, the commonly used hidden neurons with a sigmoid activation function can approximate any continuous function, including linear functions, but the coefficients (weights) of this approximation are rather meaningless. To address this problem, this paper presents a novel kind of neural network that uses transfer functions of various complexities, in contrast to the single (mono) transfer function used in sigmoid and hyperbolic tangent networks. The presence of transfer functions of various complexities in a Mixed Transfer Function Artificial Neural Network (MTFANN) allows easy conversion of the full model into a user-friendly equation format (similar to that of linear regression) without any pruning or simplification of the model. At the same time, MTFANN maintains a generalization ability similar to that of mono-transfer function networks in a global optimization context. The performance and knowledge extraction of MTFANN were evaluated on a realistic simulation of the Puma 560 robot arm and compared with those of sigmoid, hyperbolic tangent, linear, and sinusoidal networks.
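The abstract does not list the exact transfer functions or describe how the equation is assembled. The sketch below assumes a single hidden layer in which each hidden unit has its own transfer function and the output unit is linear. The menu of linear, sine, sigmoid and tanh functions is a hypothetical choice. The sketch shows why such a model can be written out term by term as a closed-form equation without pruning. Each hidden unit contributes one output-weighted term, and a linear unit reduces to ordinary regression-style coefficients on the inputs. The names `TRANSFER`, `HiddenUnit`, `MixedNet` and `to_equation` are illustrative and do not come from the paper. Training is omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical menu of transfer functions of different complexity (an assumption;
# the paper's exact set is not given in the abstract). Each entry pairs the numeric
# function with a template used to print it inside a closed-form equation.
TRANSFER: Dict[str, Tuple[Callable[[float], float], str]] = {
    "linear": (lambda z: z, "({z})"),
    "sin": (math.sin, "sin({z})"),
    "sigmoid": (lambda z: 1.0 / (1.0 + math.exp(-z)), "1/(1+exp(-({z})))"),
    "tanh": (math.tanh, "tanh({z})"),
}


@dataclass
class HiddenUnit:
    kind: str             # key into TRANSFER
    weights: List[float]  # input-to-hidden weights
    bias: float


@dataclass
class MixedNet:
    """Single hidden layer with per-unit transfer functions and a linear output."""
    hidden: List[HiddenUnit]
    out_weights: List[float]
    out_bias: float

    def predict(self, x: Sequence[float]) -> float:
        total = self.out_bias
        for unit, v in zip(self.hidden, self.out_weights):
            z = unit.bias + sum(w * xi for w, xi in zip(unit.weights, x))
            f, _ = TRANSFER[unit.kind]
            total += v * f(z)
        return total

    def to_equation(self, names: Sequence[str]) -> str:
        # Every hidden unit becomes one term; nothing is pruned or simplified.
        # Coefficients are printed with the "g" format (6 significant digits).
        terms = [f"{self.out_bias:g}"]
        for unit, v in zip(self.hidden, self.out_weights):
            parts = [f"{unit.bias:g}"] + [f"{w:g}*{n}" for w, n in zip(unit.weights, names)]
            _, template = TRANSFER[unit.kind]
            terms.append(f"{v:g}*" + template.format(z=" + ".join(parts)))
        return "y = " + " + ".join(terms)


if __name__ == "__main__":
    net = MixedNet(
        hidden=[HiddenUnit("linear", [2.0, -1.0], 0.5), HiddenUnit("sin", [1.0, 0.0], 0.0)],
        out_weights=[1.5, 0.3],
        out_bias=0.1,
    )
    print(net.to_equation(["x1", "x2"]))
    print(net.predict([1.0, 2.0]))
```

For the two-unit example above, `to_equation` produces `y = 0.1 + 1.5*(0.5 + 2*x1 + -1*x2) + 0.3*sin(0 + 1*x1 + 0*x2)`. The linear unit's contribution reads directly as regression coefficients on `x1` and `x2`. A sigmoid-only network can approximate the same linear part only through saturating terms whose weights carry no such direct meaning.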

*J. Adv. Comput. Intell. Intell. Inform.*, Vol.10, No.3, pp. 295-301, 2006.
