Multiple-Timescale PIA for Model-Based Reinforcement Learning
Tomohiro Yamaguchi* and Eri Imatani**
*Nara National College of Technology, 22 Yata-cho, Yamatokoriyama, Nara 639-1080, Japan
**Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
This paper discusses dynamic-programming-based multiagent reinforcement learning in the MDP model. A major difficulty in learning cooperative actions among agents is the problem of simultaneous learning. To solve this problem, each agent should learn at a different time. We propose multiple-timescale reinforcement learning, in which the agents improve their learning results exclusively. We conducted experiments comparing multiple-timescale and exclusive policy improvement, and reduced the search cost of the optimal common-payoff Nash solution.