JACIII Vol.13 No.6 pp. 658-666
doi: 10.20965/jaciii.2009.p0658


Multiple-Timescale PIA for Model-Based Reinforcement Learning

Tomohiro Yamaguchi* and Eri Imatani**

*Nara National College of Technology, 22 Yata-cho, Yamatokoriyama, Nara 639-1080, Japan

**Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan

Received: April 14, 2009
Accepted: July 31, 2009
Published: November 20, 2009
Keywords: multiagent reinforcement learning, multiple timescales, PIA, stochastic game, matrix game solver

This paper discusses dynamic-programming-based multiagent reinforcement learning in the MDP model. A major difficulty in learning cooperative actions among agents is the problem of simultaneous learning. To solve this problem, each agent should learn at a different time. We propose multiple-timescale reinforcement learning, in which agents improve their learning results exclusively. We conducted comparative experiments between multiple-timescale and exclusive policy improvement, reducing the search cost of the optimal common-payoff Nash solution.
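
The simultaneous-learning problem mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes a stateless 2x2 common-payoff matrix game (rather than the MDP setting), a hypothetical payoff table `PAYOFF`, and greedy best-response updates. It only shows why two agents that revise their choices at the same time can keep missing each other, while agents that take turns settle on a joint action that neither wants to leave.

```python
# Illustrative toy example only; the payoff table and all function names
# are assumptions made for this sketch, not taken from the paper.
PAYOFF = [[1, 0],
          [0, 1]]  # PAYOFF[a1][a2]: reward shared by both agents


def best_response_1(a2):
    """Agent 1's greedy action given agent 2's current action."""
    return max(range(2), key=lambda a1: PAYOFF[a1][a2])


def best_response_2(a1):
    """Agent 2's greedy action given agent 1's current action."""
    return max(range(2), key=lambda a2: PAYOFF[a1][a2])


def simultaneous(joint, steps):
    """Both agents update at once, each reacting to the other's old action."""
    history = [joint]
    for _ in range(steps):
        a1, a2 = joint
        joint = (best_response_1(a2), best_response_2(a1))
        history.append(joint)
    return history


def alternating(joint, steps):
    """Only one agent updates per step; the other holds its action fixed."""
    history = [joint]
    for t in range(steps):
        a1, a2 = joint
        if t % 2 == 0:
            joint = (best_response_1(a2), a2)
        else:
            joint = (a1, best_response_2(a1))
        history.append(joint)
    return history


if __name__ == "__main__":
    start = (0, 1)  # a miscoordinated joint action with payoff 0
    print("simultaneous:", simultaneous(start, 4))
    print("alternating: ", alternating(start, 4))
```

Starting from the miscoordinated joint action `(0, 1)`, the simultaneous updates swap both actions on every step and never reach a payoff of 1, whereas the alternating updates reach `(1, 1)` after the first step and stay there.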

Cite this article as:
Tomohiro Yamaguchi and Eri Imatani, “Multiple-Timescale PIA for Model-Based Reinforcement Learning,” J. Adv. Comput. Intell. Intell. Inform., Vol.13, No.6, pp. 658-666, 2009.

