
JACIII Vol.13 No.6 pp. 667-674 (2009)
doi: 10.20965/jaciii.2009.p0667

Paper:

Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents

Yasuyo Hatcho*, Kiyohiko Hattori*, and Keiki Takadama*,**

*The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo 182-8585, Japan

**PRESTO, Japan Science and Technology Agency (JST), 4-1-8 Honcho Kawaguchi, Saitama 332-0012, Japan

Received: April 24, 2009
Accepted: June 19, 2009
Published: November 20, 2009
Keywords: generalization, time horizon, sequential interaction, reinforcement learning
Abstract
This paper focuses on generalization in reinforcement learning from the time horizon viewpoint, exploring a method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulations of the bargaining game, a sequential interaction game, have revealed the following implications: (1) both the Q-table selection and merge timing methods help replicate the subject experimental results without ad hoc parameter settings; and (2) agents using the proposed methods achieve this replication with a smaller number of Q-tables.
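As an illustration of the kind of mechanism the abstract describes, the sketch below shows a Q-learning agent that keeps several Q-tables (e.g., one per time horizon), selects pairs that have become similar, and merges them by averaging. This is a minimal sketch under assumed details, not the paper's exact algorithm; the names and the similarity threshold (q_update, tables_are_similar, merge_tables, SIMILARITY_THRESHOLD) are hypothetical.

    # Minimal sketch only: selection by a hypothetical similarity test and merging by
    # averaging; the paper's actual selection and merge-timing rules may differ.
    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.9          # typical Q-learning step size and discount factor
    SIMILARITY_THRESHOLD = 0.1       # hypothetical closeness criterion for merging

    def q_update(q_table, state, action, reward, next_state, actions):
        """Standard one-step Q-learning update."""
        best_next = max(q_table[(next_state, a)] for a in actions)
        q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])

    def tables_are_similar(q1, q2, keys):
        """Hypothetical selection rule: mean absolute difference of Q-values."""
        diff = sum(abs(q1[k] - q2[k]) for k in keys) / max(len(keys), 1)
        return diff < SIMILARITY_THRESHOLD

    def merge_tables(q1, q2, keys):
        """Merge two Q-tables by averaging entries (one simple generalization scheme)."""
        merged = defaultdict(float)
        for k in keys:
            merged[k] = 0.5 * (q1[k] + q2[k])
        return merged

    # Example: one Q-table per negotiation step; merge tables once they look alike.
    horizons = [defaultdict(float) for _ in range(3)]
    keys = [("offer_low", "accept"), ("offer_low", "reject")]
    q_update(horizons[0], "offer_low", "accept", 1.0, "offer_low", ["accept", "reject"])
    if tables_are_similar(horizons[0], horizons[1], keys):
        # After merging, both horizons share the generalized table.
        horizons[0] = horizons[1] = merge_tables(horizons[0], horizons[1], keys)

Averaging is only one possible merge operator; the point of the sketch is the overall loop of selecting candidate Q-tables and deciding when to generalize them.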
Cite this article as:
Y. Hatcho, K. Hattori, and K. Takadama, “Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents,” J. Adv. Comput. Intell. Intell. Inform., Vol.13 No.6, pp. 667-674, 2009.
References
  [1] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” The MIT Press, 1998.
  [2] S. W. Wilson, “Classifier Fitness Based on Accuracy,” Evolutionary Computation, Vol.3, No.2, pp. 149-175, 1995.
  [3] J. A. Boyan and A. W. Moore, “Generalization in Reinforcement Learning: Safely Approximating the Value Function,” in Proc. of Neural Information Processing Systems 7, 1995.
  [4] A. Muthoo, “Bargaining Theory with Applications,” Cambridge University Press, 1999.
  [5] T. Kawai, Y. Koyama, and K. Takadama, “Modeling Sequential Bargaining Game Agents Towards Human-like Behaviors: Comparing Experimental and Simulation Results,” The First World Congress of the Int. Federation for Systems Research (IFSR'05), pp. 164-166, 2005.
  [6] C. J. C. H. Watkins and P. Dayan, “Technical Note: Q-learning,” Machine Learning, Vol.8, pp. 55-68, 1992.
  [7] J. H. Holland and J. Reitman, “Cognitive Systems Based on Adaptive Algorithms,” in D. A. Waterman and F. Hayes-Roth (Eds.), Pattern Directed Inference Systems, Academic Press, pp. 313-329, 1978.
  [8] J. H. Holland, “Escaping Brittleness: The Possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems,” in Machine Learning: An Artificial Intelligence Approach, Vol.2, pp. 593-623, 1986.
  [9] R. Goto and K. Matsuo, “State Generalization Method with Support Vector Machines in Reinforcement Learning,” Trans. of the Institute of Electronics, Information and Communication Engineers, D-I, pp. 897-905, 2003 (in Japanese).
  [10] A. Rubinstein, “Perfect Equilibrium in a Bargaining Model,” Econometrica, Vol.50, No.1, pp. 97-109, 1982.
  [11] M. J. Osborne and A. Rubinstein, “A Course in Game Theory,” MIT Press, 1994.
  [12] K. Takadama, T. Kawai, and Y. Koyama, “Micro- and Macro-Level Validation in Agent-Based Simulation: Reproduction of Human-Like Behaviors and Thinking in a Sequential Bargaining Game,” J. of Artificial Societies and Social Simulation (JASSS), Vol.11, No.2, 2008.
