Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents

Yasuyo Hatcho; Kiyohiko Hattori; Keiki Takadama

doi:10.20965/jaciii.2009.p0667


JACIII Vol.13 No.6, pp. 667-674 (2009)


Yasuyo Hatcho*, Kiyohiko Hattori*, and Keiki Takadama*,**

*The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo 182-8585, Japan

**PRESTO, Japan Science and Technology Agency (JST), 4-1-8 Honcho Kawaguchi, Saitama 332-0012, Japan

Received:

April 24, 2009

Accepted:

June 19, 2009

Published:

November 20, 2009

Keywords:

generalization, time horizon, sequential interaction, reinforcement learning

Abstract

This paper focuses on generalization in reinforcement learning from the time horizon viewpoint and explores a method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. To this end, we propose time horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge timing method. These methods enable agents (1) to select which of many Q-tables can be generalized and (2) to determine when the selected Q-tables should be generalized. Intensive simulations of the bargaining game, a sequential interaction game, have revealed the following: (1) both the Q-table selection method and the merge timing method help replicate the results of subject experiments without ad hoc parameter settings; and (2) agents using the proposed methods achieve this replication with fewer Q-tables.
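The abstract describes the two components only at a high level. As a rough illustration of the general idea of generalizing Q-tables along the time horizon, and not as the authors' method, the sketch below keeps one tabular Q-learning table per bargaining step and merges neighbouring tables. Only the one-step Q-learning update follows the standard rule of Watkins and Dayan; the per-step table layout, the selection test (neighbouring tables must have the same greedy action in every state), the pairwise averaging of merged values, and the fixed-interval timing rule `should_merge` with its `merge_every` parameter are all hypothetical choices made for this sketch.

```python
from typing import Dict, List, Tuple

# A Q-table maps (state, action) -> estimated value.
QTable = Dict[Tuple[int, int], float]


def q_update(q: QTable, s: int, a: int, r: float, next_q: QTable, s_next: int,
             actions: List[int], alpha: float = 0.1, gamma: float = 0.9,
             terminal: bool = False) -> None:
    """Standard one-step Q-learning update. Assumption of this sketch: there is
    one table per bargaining step, so the bootstrap term reads the table of the
    next step, passed in as `next_q`."""
    target = r
    if not terminal:
        target += gamma * max(next_q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)


def greedy_actions(q: QTable, states: List[int], actions: List[int]) -> Dict[int, int]:
    """Greedy action per state; ties go to the earliest action in `actions`."""
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}


def should_merge(episode: int, merge_every: int = 100) -> bool:
    """Hypothetical merge-timing rule: attempt a merge every `merge_every` episodes."""
    return episode > 0 and episode % merge_every == 0


def merge_adjacent(tables: List[QTable], states: List[int],
                   actions: List[int]) -> Tuple[List[QTable], List[int]]:
    """Hypothetical selection rule: a step table is merged into the current merged
    table when both have the same greedy action in every state; the merged values
    are the pairwise average of the current merged table and the new one.
    Returns (merged_tables, step_to_table), where step_to_table[t] is the index
    of the merged table used at bargaining step t."""
    merged: List[QTable] = [dict(tables[0])]
    step_to_table = [0]
    for q in tables[1:]:
        last = merged[-1]
        if greedy_actions(last, states, actions) == greedy_actions(q, states, actions):
            keys = set(last) | set(q)
            merged[-1] = {k: (last.get(k, 0.0) + q.get(k, 0.0)) / 2 for k in keys}
        else:
            merged.append(dict(q))
        step_to_table.append(len(merged) - 1)
    return merged, step_to_table


if __name__ == "__main__":
    # Three step tables over one state and two actions: steps 0 and 1 prefer
    # action 0, step 2 prefers action 1, so only steps 0 and 1 are merged.
    tables = [
        {(0, 0): 1.0, (0, 1): 0.0},
        {(0, 0): 0.5, (0, 1): 0.2},
        {(0, 0): 0.0, (0, 1): 1.0},
    ]
    merged, step_map = merge_adjacent(tables, states=[0], actions=[0, 1])
    print(len(merged))  # 2
    print(step_map)     # [0, 0, 1]
```

In this sketch, agents that end up with fewer merged tables simply share one table across several bargaining steps, which is one concrete reading of "fewer Q-tables" in the abstract; the actual selection and timing criteria used in the paper are not given here.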

Cite this article as:

Y. Hatcho, K. Hattori, and K. Takadama, “Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents,” J. Adv. Comput. Intell. Intell. Inform., Vol.13 No.6, pp. 667-674, 2009.


[12] K. Takadama, T. Kawai, and T. Koyama, “Micro- and Macro-Level Validation in Agent-Based Simulation: Reproduction of Human-Like Behaviors and Thinking in a Sequential Bargaining Game,” J. of Artificial Societies and Social Simulation (JASSS), Vol.11, No.2, 2008.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
