Paper:
Combination of Two Evolutionary Methods for Mining Association Rules in Large and Dense Databases
Eloy Gonzales*, Karla Taboada*, Shingo Mabu*,
Kaoru Shimada**, and Kotaro Hirasawa*
*Graduate School of Information, Production and Systems, Waseda University
2-7 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0135, Japan
**Information, Production and Systems Research Center, Waseda University
2-7 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0135, Japan
Our strategy consists in the division of a large and dense database into many small databases. These small databases are considered as individuals and form a population. Then the conventional GNP based mining method is applied to extract association rules for each of these individuals. Finally, the population is evolved through several generations using GA with special genetic operators considering the acquired information. Two complementary processing levels are defined: Global Level and Local Level, each with its own independent tasks and processes. In the Global Level mainly GA process is carried out, whereas in the Local Level, conventional GNP based mining method is carried out in parallel and they generate their own local pools of association rules. Several special genetic operations for GA in the Global Level are proposed and the performance of each of them and their combination is shown and compared.
In our simulations, the conventional GNP based mining method and our proposed method are compared using a real world large and dense database with a huge amount of attributes. The results show that extending the conventional GNP based mining method using GA allows to extract association rules from large and dense databases directly and more efficiently than the conventional GNP method.
- [1] K. Shimada, R. Wang, K. Hirasawa, and T. Furuzuki, “Medical Association Rule Mining Using Genetic Network Programming,” IEEJ Trans. EIS, Vol.126, No.7, pp. 849-856, 2006.
- [2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” in Proc. of the 20th VLDB Conf., pp. 487-499, 1994.
- [3] J. S. Park, M. S. Chen, and P. S. Yu, “An Effective Hash-Based Algorithm for Mining Association Rules,” in Proc. of the 1995 ACM SIGMOD Conf., pp. 175-186, 1995.
- [4] S. Brin, R. Motwani, and C. Silverstein, “Beyond market baskets: generalizing association rules to correlations,” in Proc. of ACM SIGMOD, pp. 265-276, 1997.
- [5] C. Z. Janikow, “A knowledge-intensive genetic algorithm for supervised learning,” Machine Learning 13, pp. 189-228, 1993.
- [6] M. Pei, E. D. Goodman, and W. F. Punch, “Pattern discovery from data using genetic algorithms,” in Proc. of the 1st Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD-97), World Scientific, 1997.
- [7] J. Eggermont, A. E. Eiben, and J. I. vanHemert, “A comparison of genetic programming variants for data classification,” in Proc. of the Third Int. Symposium on Intelligent Data Analysis (IDA-99), Amsterdam, The Netherlands, Springer-Verlag, Berlin, 1999.
- [8] C. C. Bojarczuk, H. S. Lopes, and A. A. Freitas, “Genetic programming for knowledge discovery in chest pain diagnosis,” IEEE Trans. on Engineering in Medicine and Biology Magazine, Vol.19, No.4, pp. 38-44, 2000.
- [9] G. Gamberoni, E. Lamma, F. Riguzzi, and C. Scapoli, “Combining Apriori and Bootstrap Techniques for Marker Analysis,” in Proc. of the 2nd Workshop in Data mining in Functional Genomics and Proteomics, pp. 1-10, 2007.
- [10] R. J. Kuo, S. Y. Lin, and C. W. Shih, “Mining association rules through integration of clustering analysis and ant colony system for health insurance database in Taiwan,” Expert Systems with Applications, Vol.33, Issue 3, pp. 794-808, 2007.
- [11] A. Niimi and E. Tazaki, “Extended Genetic Programming Using Apriori Algorithm for Rule Discovery,” Lecture Notes in Computer Science, Springer, Vol.2253, pp. 525-532, 2001.
- [12] C. Zhang and S. Zhang, “Association Rule Mining: models and algorithms,” Springer, 2002.
- [13] S. Mabu, K. Hirasawa, and J. Hu, “A Graph-Based Evolutionary Algorithm: Genetic Network Programming (GNP) and Its Extension Using Reinforcement Learning,” Evolutionary Computation, MIT Press, Vol.15, No.3, pp. 369-398, 2007.
- [14] T. Eguchi, K. Hirasawa, J. Hu, and N. Ota, “A study of Evolutionary Multiagent Models Based on Symbiosis,” IEEE Trans. on System, Man and Cybernetics, B, Vol.36, No.1, pp. 179-193, 2006.
- [15] K. Hirasawa, T. Eguchi, J. Zhou, L. Yu, J. Hu, and S. Markon, “A Double-deck Elevator Group Supervisory Control System using Genetic Network Programming,” IEEE Trans. on System, Man and Cybernetics, Part C, Vol.38, No.4, pp. 535-550, 2008.
- [16] K. Shimada, K. Hirasawa, and T. Furuzuki, “Genetic Network Programming with Acquisition Mechanisms of Association Rules,” Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.10, No.1, pp. 102-111, 2006.
- [17] A. A. Freitas, “Data Mining and Knowledge discovery with Evolutionary Algorithms,” IEEE Transactions on Evolutionary Computation, Vol.7, No.6, pp. 517-518, 2003.
- [18] Michael Gilleland
http://www.merriampark.com/ld.htm. - [19] http://www.affymetrix.com/support/technical/sample_data/500k_hapmap_genotype_data.affx.
- [20] http://www.hapmap.org/citinghapmap.html
(SNP public database). - [21] A. Luntala, W. Price, M. Diaby, and M. Gravel, “Ensuring population diversity in genetic algorithms: A technical note with application to the cell formation problem,” European journal of operational research, Vol.178, No.2, pp. 634-638, 2007.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.