Exemplar Generalization in Reinforcement Learning: Improving Performance with Fewer Exemplars

Hiroyasu Matsushima; Kiyohiko Hattori; Keiki Takadama

doi:10.20965/jaciii.2009.p0683

JACIII Vol.13 No.6 pp. 683-690 (2009)

Paper:

Hiroyasu Matsushima*, Kiyohiko Hattori*, and Keiki Takadama*,**

*The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo 182-8585, Japan

**PRESTO, Japan Science and Technology Agency (JST), 4-1-8 Honcho Kawaguchi, Saitama 332-0012, Japan

Received: April 25, 2009

Accepted: June 19, 2009

Published: November 20, 2009

Keywords: reinforcement learning, generalization, exemplar, direct policy search, real value

Abstract

This paper focuses on the generalization of exemplars (i.e., good rules) in the reinforcement learning framework and proposes Exemplar Generalization in Reinforcement Learning (EGRL), which extracts useful exemplars from the many exemplars provided as prior knowledge and generalizes them by deleting as many unnecessary exemplars as possible (some exemplars overlap). Intensive simulations of a simple cargo layout problem, conducted to validate the effectiveness of EGRL, revealed the following implications: (1) EGRL achieves good performance with fewer exemplars than either using an efficient number of exemplars or using randomly selected exemplars, and (2) the integration of the covering, deletion, and subsumption mechanisms in EGRL is critical for improving its performance and generalization.
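
The abstract names subsumption as one of the mechanisms that removes overlapping exemplars. The following is a minimal sketch of that general idea only, in the style of interval-based classifier systems; it is not the authors' EGRL algorithm. The `Exemplar` class, its interval representation, and the functions `subsumes` and `remove_subsumed` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exemplar:
    # Hypothetical representation (not from the paper): one (lower, upper)
    # interval per real-valued state dimension, plus a discrete action.
    intervals: Tuple[Tuple[float, float], ...]
    action: int


def subsumes(general: Exemplar, specific: Exemplar) -> bool:
    """True if `general` has the same action and covers every interval of `specific`."""
    if general.action != specific.action:
        return False
    if len(general.intervals) != len(specific.intervals):
        return False
    return all(
        g_lo <= s_lo and s_hi <= g_hi
        for (g_lo, g_hi), (s_lo, s_hi) in zip(general.intervals, specific.intervals)
    )


def remove_subsumed(exemplars: List[Exemplar]) -> List[Exemplar]:
    """Drop every exemplar covered by another one; keep one copy of exact duplicates."""
    kept: List[Exemplar] = []
    for i, ex in enumerate(exemplars):
        redundant = False
        for j, other in enumerate(exemplars):
            if i == j or not subsumes(other, ex):
                continue
            # Mutual subsumption means identical coverage: keep the earliest copy.
            if subsumes(ex, other) and i < j:
                continue
            redundant = True
            break
        if not redundant:
            kept.append(ex)
    return kept


if __name__ == "__main__":
    wide = Exemplar(((0.0, 1.0), (0.0, 1.0)), action=1)
    narrow = Exemplar(((0.2, 0.5), (0.1, 0.9)), action=1)
    print(len(remove_subsumed([wide, narrow])))
```

In this sketch an exemplar is deleted only when another exemplar with the same action covers its whole condition, so the set of states that have a matching exemplar is unchanged while the number of exemplars shrinks. How EGRL itself decides which exemplars to delete or subsume is described in the paper, not here.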

Cite this article as:

H. Matsushima, K. Hattori, and K. Takadama, “Exemplar Generalization in Reinforcement Learning: Improving Performance with Fewer Exemplars,” J. Adv. Comput. Intell. Intell. Inform., Vol.13 No.6, pp. 683-690, 2009.

References

[1] K. Ikeda, “Exemplar-Based Direct Policy Search with Evolutionary Optimization,” The 2005 IEEE Congress on Evolutionary Computation, Vol.3, pp. 2357-2364, 2005.
[2] S. W. Wilson, “Get Real! XCS with Continuous-Valued Inputs,” Learning Classifier Systems From Foundations to Applications, Lecture Notes in Computer Science, Vol.1996, pp. 158-176, 2000.
[3] E. Bernado and J. M. Garrell, “Accuracy-Based Learning Classifier Systems: Models, Analysis and Applications to Classification Tasks,” Evolutionary Computation, Vol.11, No.3, pp. 209-238, 2003.
[4] M. V. Butz and S. W. Wilson, “An Algorithmic Description of XCS,” Soft Computing, Vol.6, No.3-4, pp. 144-153, 2002.
[5] J. H. Holland and J. Reitman, “Cognitive Systems Based on Adaptive Algorithms,” in D. A. Waterman and F. Hayes-Roth (Eds.), Pattern Directed Inference Systems, pp. 313-329, Academic Press, 1978.
[6] J. H. Holland, “Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-based System,” Machine Learning, Vol.2, pp. 593-623, 1986.
[7] S. W. Wilson, “ZCS: A Zeroth Level Classifier System,” Evolutionary Computation, Vol.2, No.1, pp. 1-18, 1994.
[8] S. W. Wilson, “Classifier Fitness Based on Accuracy,” Evolutionary Computation, Vol.3, No.2, pp. 149-175, 1995.
[9] S. W. Wilson, “Generalization in the XCS Classifier Systems,” The Third Annual Conf. on Genetic Programming, pp. 665-674, Morgan Kaufmann, 1998.
[10] D. E. Goldberg, “Genetic Algorithms in Search, Optimization, and Machine Learning,” Addison-Wesley, 1989.
[11] A. Miyamae, J. Sakuma, I. Ono, and S. Kobayashi, “Instance-based Policy Learning by Real-coded Genetic Algorithms and Its Application to Control of Nonholonomic Systems,” The J. of The Japanese Society for Artificial Intelligence, Vol.24, No.1, pp. 104-115, 2009 (In Japanese).
[12] T. Kovacs, “Evolving Optimal Populations with XCS Classifier Systems,” Technical Report CSRP-96-17, School of Computer Science, University of Birmingham, 1996.
[13] C. Stone and L. Bull, “For Real! XCS with Continuous-Valued Inputs,” Evolutionary Computation, Vol.11, No.3, pp. 299-336, 2003.
[14] R. S. Sutton, “Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding,” Advances in Neural Information Processing Systems, Vol.8, pp. 1038-1044, The MIT Press, 1996.
[15] R. S. Sutton and A. Barto, “An Introduction to Reinforcement Learning,” The MIT Press, 1998.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
