Multiple-Timescale PIA for Model-Based Reinforcement Learning

Tomohiro Yamaguchi; Eri Imatani

doi:10.20965/jaciii.2009.p0658


JACIII Vol.13 No.6 pp. 658-666 (2009)

Paper:

Multiple-Timescale PIA for Model-Based Reinforcement Learning

Tomohiro Yamaguchi^* and Eri Imatani^**

^*Nara National College of Technology, 22 Yata-cho, Yamatokoriyama, Nara 639-1080, Japan

^**Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan

Received: April 14, 2009

Accepted: July 31, 2009

Published: November 20, 2009

Keywords: multiagent reinforcement learning, multiple timescales, PIA, stochastic game, matrix game solver

Abstract

This paper discusses dynamic-programming-based multiagent reinforcement learning in the MDP model. A major difficulty in learning cooperative actions among agents is the problem of simultaneous learning. To solve this problem, each agent should learn at a different time. We propose multiple-timescale reinforcement learning, in which agents improve their learning results exclusively. We conducted comparative experiments between multiple-timescale and exclusive policy improvement, reducing the search cost of the optimal common-payoff Nash solution.
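
The simultaneous-learning problem named in the abstract can be illustrated with a minimal sketch. This is not the authors' PIA, and it uses a one-shot common-payoff matrix game rather than an MDP. It assumes that "exclusive" improvement means only one agent changes its action per step; the payoff matrix, the function names, and the starting joint action are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Payoff = List[List[float]]   # payoff[a0][a1]: reward shared by both agents
Joint = Tuple[int, int]      # (action of agent 0, action of agent 1)


def best_response(payoff: Payoff, agent: int, other_action: int) -> int:
    """Action that maximizes the shared payoff given the other agent's action."""
    if agent == 0:
        values = [row[other_action] for row in payoff]
    else:
        values = list(payoff[other_action])
    # Ties are broken in favor of the lowest action index.
    return max(range(len(values)), key=values.__getitem__)


def simultaneous_improvement(payoff: Payoff, joint: Joint, steps: int) -> List[Joint]:
    """Both agents switch to a best response at the same step."""
    history = [joint]
    for _ in range(steps):
        a0, a1 = joint
        joint = (best_response(payoff, 0, a1), best_response(payoff, 1, a0))
        history.append(joint)
    return history


def exclusive_improvement(payoff: Payoff, joint: Joint, steps: int) -> List[Joint]:
    """Agents take turns: only one agent changes its action per step."""
    history = [joint]
    for t in range(steps):
        a0, a1 = joint
        if t % 2 == 0:
            joint = (best_response(payoff, 0, a1), a1)
        else:
            joint = (a0, best_response(payoff, 1, a0))
        history.append(joint)
    return history


# Hypothetical coordination game: both agents are rewarded only when they match.
COORDINATION: Payoff = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    print(simultaneous_improvement(COORDINATION, (0, 1), 4))
    print(exclusive_improvement(COORDINATION, (0, 1), 4))
```

In this toy game, updating both agents at once from a mismatched start makes them swap actions forever, while letting them take turns settles on a matching joint action, since each single-agent update cannot lower the shared payoff.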

Cite this article as:

T. Yamaguchi and E. Imatani, “Multiple-Timescale PIA for Model-Based Reinforcement Learning,” J. Adv. Comput. Intell. Intell. Inform., Vol.13 No.6, pp. 658-666, 2009.

References

[1] M. Bowling and M. M. Veloso, “An analysis of stochastic game theory for multiagent reinforcement learning,” Technical report CMU-CS-00-165, Carnegie Mellon University, 2000.
[2] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998.
[3] A. G. Barto, S. J. Bradtke, and S. P. Singh, “Learning to act using real-time dynamic programming,” Artificial Intelligence, Vol.72, pp. 81-138, Elsevier, 1995.
[4] M. L. Puterman, “Markov Decision Processes: Discrete Stochastic Dynamic Programming,” John Wiley & Sons, Inc., pp. 385-388, 1994.
[5] Y. Shoham, R. Powers, and T. Grenager, “Multi-agent reinforcement learning: a critical survey,” Technical report, Stanford University, 2003.
[6] J. Hu and M. P. Wellman, “Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm,” Proc. 15th Int. Conf. on Machine Learning, pp. 242-250, 1998.
[7] M. Littman, “Markov games as a framework for multi-agent reinforcement learning,” Proc. of 11th Inter. Conf. on Machine Learning, pp. 157-163, 1994.
[8] D. Leslie, “Multiple timescales for multiagent learning,” NIPS2002, workshop on Multi-Agent Learning: Theory and Practice, 2002.
[9] M. Bowling and M. M. Veloso, “Multiagent learning using a variable learning rate,” Artificial Intelligence, Vol.136, pp. 215-250, 2002.
[10] F. Kaplan, P-Y. Oudeyer, E. Kubinyi, and A. Miklosi, “Robotic clicker training,” Robotics and Autonomous Systems, Vol.38, 3-4, pp. 197-206, 2002.
[11] K. Satoh and T. Yamaguchi, “Preparing various policies for interactive reinforcement learning,” Proc. of the SICE-ICASE Int. Joint Conf. 2006.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
