A Combination of Shuffled Frog-Leaping Algorithm and Genetic Algorithm for Gene Selection
Cheng-San Yang*, Li-Yeh Chuang**, Chao-Hsuan Ke***,
and Cheng-Hong Yang***
*Institute of biomedical engineering, National Cheng Kung University, Tainan, Taiwan 70101
**Department of Chemical Engineering, I-Shou University, Kaohsiung, Taiwan 84001
***Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan 80778
Microarray data referencing to gene expression profiles provides valuable answers to a variety of problems, and contributes to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. The classification of microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed gene potentially relevant for distinguishing sample classes and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that effectively achieves higher accuracy for classification problems. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA), and chooses genes (features) related to classification. The K-nearest neighbor (KNN) with leave-one-out cross validation (LOOCV) is used to evaluate classification accuracy. We apply a novel hybrid approach based on SFLA-GA and KNN classification and compare 11 classification problems from the literature. Experimental results show that classification accuracy obtained using selected features was higher than the accuracy of datasets without feature selection.
-  D. S. V. Wong, F. K. Wong, and G. R. Wood, “A multi-stage approach to clustering and imputation of gene expression profiles,” Bioinformatics, Vol.23, No.8, pp. 998-1005, 2007.
-  E. Hartuv, A. Schmitt, J. Lange, et al., “An algorithm for clustering cDNA fingerprints,” Genomics, Vol.66, pp. 249-256, 2000.
-  A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, “A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis,” Bioinformatics, Vol.21, No.5, pp. 631-643, 2005.
-  J. E. Staunton, D. K. Slonim, H. A. Coller et al., “Chemosensitivity prediction by transcriptional profiling,” PNAS. U.S.A. 98 (19), pp. 10787-10792, 2001.
-  M. L. Raymer, W. F. Punch, E. D Goodman, L. A. Kuhn, and A. K. Jain, “Dimensionality Reduction Using Genetic Algorithms,” IEEE Trans. Evolutionary Computation, Vol.4, No.2, pp. 164-171, 2000.
-  P. M. Narendra and K. Fukunaga, “A Branch and Bound Algorithm for Feature Subset Selection,” IEEE Trans. Computers, Vol.6, No.9, pp. 917-922, 1997.
-  P. Pudil, J. Novovicova, and J. Kittler, “Floating Search Methods in Feature Selection,” Pattern Recognition Letters, Vol.15, pp. 1119-1125, 1994.
-  B. Roberto, “Using mutual information for selecting features in supervised neural net learning,” IEEE Transactions on Neural Networks, Vol.5, No.4, pp. 537-550, 1994.
-  H. Zhang and G. Sun, “Feature selection using tabu search method,” Pattern Recognition, Vol.35, pp. 701-711, 2002.
-  X. Liu, A. Krishnan, and A. Mondry, “An Entropy-based Gene Selection Method for Cancer Classification using Microarray Data,” BMC Bioinformatics, Vol.6:76, 2005.
-  N. Ancona, R. Maglietta, D. D’Addabbo, S. Liuni, and G. Pesole, “Regularized Least Squares Cancer Classifiers from DNA microarray data,” Bioinformatics, Vol.6 (Suppl 4):S2, 2005.
-  R. Díaz-Uriarte and S. A. de Andrés, “Gene selection and classification of microarray data using random forest,” BMC Bioinformatics, Vol.7:3, 2006.
-  D. Berrar, I. Bradbury, and W. Dubitzky, “Instance-based concept learning from multiclass DNA microarray data,” Bioinformatics, Vol.7:73, 2006.
-  E. K. Tang, P. Suganthan, and X. Yao, “Gene selection algorithms for microarray data based on least squares support vector machine,” Bioinformatics, Vol.7:95, 2006.
-  E. Elbeltagi, T. Hegazy, and D. Grierson, “Comparison among five evolutionary-based optimization algorithms,” Advanced Engineering Informatics, Vol.19, No.1, pp. 43-53, 2005.
-  J. Kennedy and R. Eberhart, “Particle swarm optimization,” Proc. of the IEEE Int. Conf. on neural networks (Perth, Australia), Piscataway, NJ: IEEE Service Center, pp. 1942-1948, 1995.
-  S. Y. Liong and M. Atiquzzaman, “Optimal design of water distribution network using shuffled complex evolution,” Journal of The Institution of Engineers, Vol.44, No.1, Singapore, pp. 93-107, 2004.
-  M. M. Eusuff and K. E. Lansey, “Optimization of water distribution network design using the shuffled frog leaping algorithm,” Journal of Water Resources Plan Management, Vol.129, No.3, pp. 210-225, 2003.
-  E. S. Hou, N. Ansari, and H. Ren, “A Genetic Algorithm for multiprocessor Scheduling,” IEEE Transactions on Parallel and Distributed Systems, Vol.5, No.2, pp. 113-120, 1994.
-  H. Vafaie and K. De Jong, “Genetic Algorithms as a Tool for Feature Selection in Machine Learning,” Proc. of the 4th Int. Conf. on Tools with Artifical Intelligence, Arlington, VA, pp. 200-204, 1992.
-  K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II,” IEEE Trans. Evol. Comput., Vol.6, pp. 182-197, 2002.
-  I. S. Oh, J. S. Lee, and B. R. Moon, “Hybrid Genetic Algorithms for Feature Selection,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.26, No.11, 2004.
-  S. Kim and B. T. Zhang, “Evolutionary learning of web-document structure for information retrieval,” Proc. of the 2001 Congress on Evoluationary Computation,Vol.2, pp. 1253-1260, 2001.
-  W. Pullan, “Adapting the Genetic Algorithm to the Traveling Salesman Problem,” IEEE Congress on Evolutionary Computation, 2003.
-  J. H. Holland, “Adaptation in Nature and Artificial Systems,” MIT Press, 1992.
-  T. M. Cover and P. E. Hart, “Nearest Neighbor Pattern Classification,” In Proc. of the IEEE Transactions Information Theory, pp. 21-27, 1967.
-  E. Fix and J. L. Hodges, “Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties,” Report No.4, USAF School of Aviation Medicine, Randolph Field, Texas, pp. 261-279, 1951.
-  M. Eusuff, K. Lansey, and F. Pasha, “Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization,” Engineering Optimization, Vol.38, No.2, pp. 129-154, 2006.
-  H. Laanaya, A. Martin, A. Khenchaf, and D. Aboutajdine, “Feature selection using genetic algorithm for sonar images classification with support vector machines,” European Conf. on Propagation and Systems, Brest, France, 2005.
-  C. L. Huang and C. J. Wang, “A GA-based feature selection and parameters optimization for support vector machines,” Expert System with applications, Vol.31, No.2, pp. 231-240, 2006.
-  T. M. Mitchell, “Machine Learning,” McGraw-Hill, New York, USA, 1997.
-  B. V. Dasarathy, “NN concepts and techniques, Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques,” IEEE Computer Society Press, pp. 1-30, 1991.
-  D. F. Specht, “Probabilistic neural network,” Neural Networks, 3, pp. 109-118, 1990.
-  U. Kreßel, “Pairwise classification and support vector machines,” In Advances in Kernel Methods: Support Vector Learning, Cambridge, MA: MIT Press, pp. 255-268, 1999.
-  J. C. Platt, N. Cristianini, and J. Shawe-Taylor, “Large margin DAGS for multiclass classification,” In Advances in Neural Information Processing Systems, Vol.12, MIT Press, pp. 547-553, 2000.
-  J. Weston and C. Watkins, “Support vector machines for multi-class pattern recognition,” In Proc. of the Seventh European Symposium on Artificial Neural Networks (ESANN 99), Bruges, 1999.
-  K. Crammer and Y. Singer, “On the learnability and design of output codes for multiclass problems,” Proc. of the Thirteen Annual Conf. on Computational Learning Theory (COLT 2000), Stanford University, Palo Alto, CA, 2000.
-  C. H. Yeang, S. Ramaswamy, P. Tamayo, et al., “Molecular classification of multiple tumor types,” Bioinformatics, Vol.17: (Suppl.) S316 -S322, 2001.
-  T. Li, C. Zhang, and M. Ogihara, “A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression,” Bioinformatics Vol.20, No.15, pp. 2429-2437, 2004.
-  E. Elbeltagi, T. Hegazy, and D. Grierson, “A modified shuffled frogleaping optimization algorithm: applications to project management,” Structure & Infrastructure Engineering: Maintenance, Management, Life-Cycl, Vol.3, No.1, pp. 53-60, 2007.
-  A. K. Ghosh, “On optimum choice of k in nearest neighbor classification,” Computational Statistics & Data Analysis Vol.50, No.11, pp. 3113-3123, 2006.
-  A. I. Su, J. B. Welsh, L. M. Sapinoso, et al., “Molecular classification of human carcinomas by use of gene expression signatures,” Cancer Res, Vol.61, pp. 7388-7393, 2001.
-  S. Ramaswamy, P. Tamayo, R. Rifkin, et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” PNAS 98 (26), pp. 15149-15154, 2001.
-  S. L. Pomeroy, P. Tamayo, M. Gaasenbeek et al., “Prediction of central nervous system embryonal tumor outcome based on gene expression” Nature 415 6870, pp. 436-442, 2002.
-  C. L. Nutt, D. R. Mani, R. A. Betensky, P. Tamayo, et al., “Gene expression-based classification of malignant gliomas correlates better with survival than histological classification,” Cancer Res. Vol.63, No.7, pp. 1602-1607, 2003.
-  T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, and M. Gaasenbeek, et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science 286, pp. 531-537, 1999.
-  S. A. Armstrong, J. E. Staunton, L. B. Silverman, R. Pieters, M. L. den Boer, M. D. Minden, S. E. Sallan, E. S. Lander, T. R. Golub, and S. J. Korsmeyer, “MLL translocations specify a distinct gene expression profile that distinguishes a unique leukaemia,” Nature Genetics, Vol.30, No.1, pp. 41-47, 2002.
-  A. Bhattacharjee, W. Richards, J. Staunton, C. Li, S. Monti, and P. Vasa et al., “Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses,” PNAS, Vol.98, No.24, pp. 13790-13795, 2001.
-  J. Khan, J. S. Wei, M. Ringner, L. H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. R. Antonescus, C. Peterson, and P. S. Meltzer, “Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks,” Nature Med. Vol.7, pp. 658-659, 2001.
-  D. Singh, P. Febbo, K. Ross, et al., “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, Vol.1, pp. 203-209, 2002.
-  M. A. Shipp, K. N. Ross, P. Tamayo, et al., “Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning,” Nature Med, Vol.8 No.1, pp. 68-74, 2002.
-  C. Lee and G. G. Lee, “Information gain and divergence-based feature selection for machine learning-based text categorization,” Information Proc. and management, Vol.42, Issue 1, pp. 155-165, 2006.
-  C. H. Ooi and P. Tan, “Genetic algorithms applied to multi-class prediction for the analysis of gene expression data,” Bioinformatics, Vol.19, pp. 37-44, 2003.
-  L. Davis, “Hybrid genetic algorithms for machine learning,” In IEE Colloquium on Machine Learning, London, Vol.Digest, No.117, 9/1–9/3, 1990.
-  A. H. F. Dias and J. A. de Vasconcelos, “Multiobjective Genetic Algorithms Applied to Solve Optimization Problems,” IEEE Transactions on magnetic, Vol.38, No.2, pp. 1133-1136, 2002.
-  J. H. Holland, “Adaptation in natural and artificial systems,” Ann Arbor, The University of Michigan Press, 1975.
-  K. E. Lee, N. J. Sha, E. R. Dougherty, M. Vannucci, and B. K. Mallick, “Gene selection: a Bayesian variable selection approach,” Bioinformatics, Vol.19, pp. 90-97, 2003.
-  L. Li, C. R. Weinberg, T. A. Darden, and L. G. Pedersen, “Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method,” Bioinformatics, Vol.17, pp. 1131-1142, 2001.
-  M. Sebbana and R. Nockb, “A hybrid filter/wrapper approach of feature selection using information theory,” Pattern Recognition, Vol.35, pp. 835-846, 2002.
-  O. W. Kwon and J. H. Lee, “Text categorization based on k-nearest neighbor approach for web site classification,” Information Processing and Management, Vol.39, No.1, pp. 25-44, 2003.
-  P. Larrañaga, B. Calvo, R. Santana, C. Bielza, J. Galdiano, I. Inza, J. A. Lozano, R. Armañanzas, G. Santafé, A. Pérez, and V. Robles, “Machine learning in Bioinformatics,” Briefings in Bioinformatics, Vol.7, pp. 86-112, 2006.
-  S. K. Fan, Y. C. Liang, and E. Zahara, “A genetic algorithm and a particle swarm optimizer hybridized with Nelder-Mead simplex search,” Computers and Industrial Engineering, Vol.50, No.4, pp. 401-425, 2006.
-  S. Y. Ho, L. S. Shu, and J. H. Chen, “Intelligent evolutionary algorithms for large parameter optimization problems,” IEEE Transaction on Evolutionary Computation, Vol.8, No.6, pp. 522-541, 2004.
-  L. Talavera, “An evaluation of filter and wrapper methods for feature selection in categorical clustering,” Proc 6th Int Symp on Intelligent Data Analysis (IDA05) Madrid, Vol.3646, pp. 440-451, 2005.
-  C. F. Tsai, C. W. Tsai, C. P. Chen, and F. C. Lin, “A multiplesearching approach to genetic algorithms for solving traveling salesman problem,” Joint Conf. on Information Sciences, pp. 362-366, 2002.
-  W. Shang, H. Huang, H. Zhu, Y. Lin, Y. Qu, and Z. Wang, “A novel feature selection algorithm for text categorization,” Expert Systems with Applications, Vol.33, No.1, pp. 1-5, 2007.