Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents
Yasuyo Hatcho*, Kiyohiko Hattori*, and Keiki Takadama*,**
*The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo 182-8585, Japan
**PRESTO, Japan Science and Technology Agency (JST), 4-1-8 Honcho Kawaguchi, Saitama 332-0012, Japan
This paper focuses on generalization in reinforcement learning from the viewpoint of the time horizon, exploring a method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge timing method. These enable agents to (1) select which of their many Q-tables can be generalized and (2) determine when the selected Q-tables should be generalized. Intensive simulations of the bargaining game, a sequential interaction game, have revealed the following implications: (1) both the Q-table selection method and the merge timing method help replicate the results of human-subject experiments without ad hoc parameter settings; and (2) agents using the proposed methods achieve this replication with fewer Q-tables.
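The abstract does not state how Q-tables are selected or when they are merged. As a purely hypothetical illustration of the general idea, and not the authors' method, the sketch below assumes three things. Two Q-tables are candidates for generalization when every pair of corresponding entries differs by less than `SIMILARITY_THRESHOLD`. Merging is allowed only once recent Q-value changes have fallen below `STABILITY_THRESHOLD`. A merge replaces the two tables with their entry-wise mean. `q_update` is the standard one-step Q-learning rule. All other names and thresholds are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical thresholds, chosen only for illustration (not from the paper).
SIMILARITY_THRESHOLD = 0.1  # max |Q1(s,a) - Q2(s,a)| for two tables to count as similar
STABILITY_THRESHOLD = 1e-3  # max recent |delta Q| for learning to count as settled


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on q[state][action]; returns |delta Q|."""
    target = r + gamma * max(q[s_next])
    delta = alpha * (target - q[s][a])
    q[s][a] += delta
    return abs(delta)


def max_difference(q1, q2):
    """Largest absolute gap between corresponding entries of two Q-tables."""
    return max(abs(x - y) for row1, row2 in zip(q1, q2) for x, y in zip(row1, row2))


def select_similar_pair(tables, threshold=SIMILARITY_THRESHOLD):
    """Hypothetical selection rule: first pair of tables agreeing within threshold, else None."""
    for i, j in combinations(range(len(tables)), 2):
        if max_difference(tables[i], tables[j]) < threshold:
            return i, j
    return None


def ready_to_merge(recent_changes, threshold=STABILITY_THRESHOLD):
    """Hypothetical timing rule: merge only after recent updates have become small."""
    return bool(recent_changes) and max(recent_changes) < threshold


def merge_tables(tables, threshold=SIMILARITY_THRESHOLD):
    """Repeatedly replace a similar pair by its entry-wise mean until no pair is similar."""
    tables = [[row[:] for row in t] for t in tables]  # work on copies
    pair = select_similar_pair(tables, threshold)
    while pair is not None:  # each merge removes one table, so the loop terminates
        i, j = pair
        mean = [[(x + y) / 2 for x, y in zip(r1, r2)]
                for r1, r2 in zip(tables[i], tables[j])]
        tables = [t for k, t in enumerate(tables) if k not in pair] + [mean]
        pair = select_similar_pair(tables, threshold)
    return tables


# Usage: three 2-state x 2-action tables; the first two are nearly identical.
zeros = [[0.0, 0.0], [0.0, 0.0]]
near = [[0.05, 0.05], [0.05, 0.05]]
ones = [[1.0, 1.0], [1.0, 1.0]]
tables = [zeros, near, ones]
recent_changes = [2e-4, 5e-4]  # hypothetical |delta Q| values from recent episodes
if ready_to_merge(recent_changes):
    tables = merge_tables(tables)
print(len(tables))  # 2
```

Greedy pairwise averaging is only one possible generalization rule. Each merge reduces the number of tables by one, which matches the abstract's point that fewer Q-tables can suffice, but the paper's own selection and timing criteria may differ.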