JACIII Vol.13 No.6 pp. 667-674
doi: 10.20965/jaciii.2009.p0667


Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents

Yasuyo Hatcho*, Kiyohiko Hattori*, and Keiki Takadama*,**

*The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo 182-8585, Japan

**PRESTO, Japan Science and Technology Agency (JST), 4-1-8 Honcho Kawaguchi, Saitama 332-0012, Japan

Received: April 24, 2009
Accepted: June 19, 2009
Published: November 20, 2009
Keywords: generalization, time horizon, sequential interaction, reinforcement learning

This paper focuses on generalization in reinforcement learning from the time horizon viewpoint, exploring a method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulations on the bargaining game as a sequential interaction game have revealed the following implications: (1) both the Q-table selection and merge timing methods help replicate the subject experimental results without ad hoc parameter settings; and (2) such replication succeeds with agents using the proposed methods with smaller numbers of Q-tables.

Cite this article as:
Yasuyo Hatcho, Kiyohiko Hattori, and Keiki Takadama, “Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents,” J. Adv. Comput. Intell. Intell. Inform., Vol.13, No.6, pp. 667-674, 2009.
References:
  [1] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” The MIT Press, 1998.
  [2] S. W. Wilson, “Classifier Fitness Based on Accuracy,” Evolutionary Computation, Vol.3, No.2, pp. 149-175, 1995.
  [3] J. A. Boyan and A. W. Moore, “Generalization in Reinforcement Learning: Safely Approximating the Value Function,” in Proc. of Neural Information Processing Systems 7, 1995.
  [4] A. Muthoo, “Bargaining Theory with Applications,” Cambridge University Press, 1999.
  [5] T. Kawai, Y. Koyama, and K. Takadama, “Modeling Sequential Bargaining Game Agents Towards Human-like Behaviors: Comparing Experimental and Simulation Results,” The First World Congress of the Int. Federation for Systems Research (IFSR'05), pp. 164-166, 2005.
  [6] C. J. C. H. Watkins and P. Dayan, “Technical Note: Q-learning,” Machine Learning, Vol.8, pp. 55-68, 1992.
  [7] J. H. Holland and J. Reitman, “Cognitive Systems Based on Adaptive Algorithms,” in D. A. Waterman and F. Hayes-Roth (Eds.), Pattern Directed Inference Systems, Academic Press, pp. 313-329, 1978.
  [8] J. H. Holland, “Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-Based Systems,” Machine Learning: An Artificial Intelligence Approach, Vol.2, pp. 593-623, 1986.
  [9] R. Goto and K. Matsuo, “State Generalization Method with Support Vector Machines in Reinforcement Learning,” Trans. of the Institute of Electronics, Information and Communication Engineers, D-I, pp. 897-905, 2003 (in Japanese).
  [10] A. Rubinstein, “Perfect Equilibrium in a Bargaining Model,” Econometrica, Vol.50, No.1, pp. 97-109, 1982.
  [11] M. J. Osborne and A. Rubinstein, “A Course in Game Theory,” MIT Press, 1994.
  [12] K. Takadama, T. Kawai, and Y. Koyama, “Micro- and Macro-Level Validation in Agent-Based Simulation: Reproduction of Human-Like Behaviors and Thinking in a Sequential Bargaining Game,” J. of Artificial Societies and Social Simulation (JASSS), Vol.11, No.2, 2008.
