Designing Internal Reward of Reinforcement Learning Agents in Multi-Step Dilemma Problem

Yoshihiro Ichikawa; Keiki Takadama

doi:10.20965/jaciii.2013.p0926

JACIII Vol.17 No.6 pp. 926-931

(2013)

Paper:

Yoshihiro Ichikawa and Keiki Takadama

The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan

Received: May 21, 2013

Accepted: September 26, 2013

Published: November 20, 2013

Keywords:

multi-agent, reinforcement learning, conflict avoidance, multi-step dilemma problem

Abstract

This paper proposes a reinforcement learning agent that estimates internal rewards from external rewards in order to avoid conflict in a multi-step dilemma problem. Intensive simulations show that the agent avoids local convergence and acquires a behavior policy that reaches a higher reward by updating the Q-value with the external reward minus the average reward.
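The update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `InternalRewardQAgent`, the tabular Q-learning form, the parameters `alpha` and `gamma`, and the choice of a running mean over all external rewards seen so far (including the current one) as the "average reward" are assumptions made here for illustration only.

```python
from collections import defaultdict


class InternalRewardQAgent:
    """Hypothetical tabular Q-learner that learns from
    (external reward - average reward) instead of the raw external reward."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha              # learning rate (assumed value)
        self.gamma = gamma              # discount factor (assumed value)
        self.q = defaultdict(float)     # (state, action) -> Q-value, default 0.0
        self.avg_reward = 0.0           # running mean of external rewards
        self.n_rewards = 0              # number of external rewards seen

    def internal_reward(self, external_reward):
        # Assumption: the average is an incremental mean over every external
        # reward observed so far, updated before the subtraction.
        self.n_rewards += 1
        self.avg_reward += (external_reward - self.avg_reward) / self.n_rewards
        return external_reward - self.avg_reward

    def update(self, state, action, external_reward, next_state):
        # Standard Q-learning target, with the internal reward substituted
        # for the external reward.
        r_int = self.internal_reward(external_reward)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = r_int + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        return r_int


agent = InternalRewardQAgent(actions=["a", "b"], alpha=0.5, gamma=0.9)
first = agent.update("s", "a", 10.0, "t")   # average becomes 10, internal reward 0
second = agent.update("s", "a", 0.0, "t")   # average becomes 5, internal reward -5
```

Under these assumptions, a reward that merely matches the agent's average contributes nothing to the Q-value, while a below-average reward lowers it. This is one plausible way in which subtracting the average could discourage settling on a mediocre outcome.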

Cite this article as:

Y. Ichikawa and K. Takadama, “Designing Internal Reward of Reinforcement Learning Agents in Multi-Step Dilemma Problem,” J. Adv. Comput. Intell. Intell. Inform., Vol.17 No.6, pp. 926-931, 2013.

References

[1] C. J. C. H. Watkins and P. Dayan, “Technical note: Q-learning,” Machine Learning, Vol.8, pp. 55-58, 1992.
[2] R. S. Sutton and A. G. Barto, “Reinforcement Learning -An Introduction-,” The MIT Press, 1998.
[3] M. Tan, “Multiagent Reinforcement Learning: Independent vs. Cooperative Agent,” The 10th Int. Conf. on Machine Learning, pp. 330-337, 1993.
[4] G. Weiss, “Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence,” The MIT Press, 1999.
[5] P. Stone and M. Veloso, “Multiagent Systems: A Survey from a Machine Learning Perspective,” Autonomous Robots, Vol.8, pp. 345-383, 1997.
[6] E. Yang and D. Gu, “Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey,” Technical Report CSM-404, Department of Computer Science, University of Essex, 2004.
[7] Y.-M. D. Hauwere, P. Vrancx and A. Nowé, “Learning multiagent state space representations,” Proc. of the 9th Int. Conf. on Autonomous Agents and Multiagent Systems, Vol.1, pp. 715-722, 2010.
[8] Y. Ichikawa, K. Sato, K. Hattori, and K. Takadama, “Entropy-based Conflict Avoidance According to Learning Progress in Multi-Agent Q-learning,” Proc. of the IADIS Int. Conf. on Intelligent Systems and Agents 2012 (ISA2012), 2012.
[9] M. L. Littman, “Markov Games as a Framework for Multi-Agent Reinforcement Learning,” Proc. of the Eleventh Int. Conf. on Machine Learning, pp. 157-163, 1994.
[10] J. Hu and M. P. Wellman, “Nash Q-Learning for General-Sum Stochastic Games,” J. of Machine Learning Research, Vol.4, pp. 1039-1069, 2003.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
