JACIII Vol.17 No.6 pp. 926-931
doi: 10.20965/jaciii.2013.p0926


Designing Internal Reward of Reinforcement Learning Agents in Multi-Step Dilemma Problem

Yoshihiro Ichikawa and Keiki Takadama

The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan

Received: May 21, 2013
Accepted: September 26, 2013
Published: November 20, 2013
Keywords: multi-agent, reinforcement learning, conflict avoidance, multi-step dilemma problem
This paper proposes a reinforcement learning agent that estimates internal rewards from external rewards in order to avoid conflict in a multi-step dilemma problem. Intensive simulation results show that the agent avoids local convergence and obtains a behavior policy that reaches a higher reward by updating the Q-value with the external reward minus the average reward.
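The update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the average reward is computed or which Q-learning parameters are used, so the running mean over all external rewards seen so far, the tabular Q-learning form, the class name `InternalRewardQAgent`, and the values of `alpha` and `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict


class InternalRewardQAgent:
    """Tabular Q-learning that learns from (external reward - average reward).

    Hypothetical sketch: the averaging scheme and parameters are assumptions.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.avg_reward = 0.0   # running mean of external rewards seen so far
        self.n_rewards = 0

    def internal_reward(self, external_reward):
        # Subtract the average of previously observed external rewards,
        # then fold the new reward into the running mean.
        internal = external_reward - self.avg_reward
        self.n_rewards += 1
        self.avg_reward += (external_reward - self.avg_reward) / self.n_rewards
        return internal

    def update(self, state, action, external_reward, next_state, done=False):
        # Standard Q-learning update, with the internal reward in place of
        # the external one.
        r_int = self.internal_reward(external_reward)
        target = r_int if done else r_int + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        return r_int


agent = InternalRewardQAgent(n_actions=2)
first = agent.update("s0", 0, 10.0, "s1", done=True)   # average is 0, so internal = 10
second = agent.update("s0", 1, 4.0, "s1", done=True)   # average is 10, so internal = -6
```

Under these assumptions, a reward below the running average yields a negative internal reward, so an action that merely repeats a mediocre payoff loses value over time. This is one plausible reading of how the subtraction discourages convergence to a locally good but globally inferior policy.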
Cite this article as:
Y. Ichikawa and K. Takadama, “Designing Internal Reward of Reinforcement Learning Agents in Multi-Step Dilemma Problem,” J. Adv. Comput. Intell. Intell. Inform., Vol.17 No.6, pp. 926-931, 2013.

