Designing Internal Reward of Reinforcement Learning Agents in Multi-Step Dilemma Problem
Yoshihiro Ichikawa and Keiki Takadama
The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan
This paper proposes a reinforcement learning agent that estimates internal rewards from external rewards in order to avoid conflict in a multi-step dilemma problem. Intensive simulations show that the agent avoids local convergence and acquires a behavior policy that reaches a higher reward by updating the Q-value with the external reward minus the average reward.
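The update described above can be illustrated with a minimal tabular sketch. It is an illustration under stated assumptions, not the authors' implementation: the class name `InternalRewardQAgent`, the parameters `alpha` and `gamma`, and the choice of a running mean over all external rewards seen so far (including the current one) as the "average reward" are all hypothetical. The abstract states only that the Q-value is updated with the external reward minus the average reward.

```python
from collections import defaultdict


class InternalRewardQAgent:
    """Tabular Q-learning sketch that learns from an internal reward.

    Assumption: internal reward = external reward - running mean of all
    external rewards observed so far (current one included).
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.avg_reward = 0.0  # running mean of external rewards
        self.n_rewards = 0     # number of external rewards observed

    def internal_reward(self, external_reward):
        # Incrementally update the running mean, then subtract it.
        self.n_rewards += 1
        self.avg_reward += (external_reward - self.avg_reward) / self.n_rewards
        return external_reward - self.avg_reward

    def update(self, state, action, external_reward, next_state, done=False):
        # Standard Q-learning update, with the internal reward taking
        # the place of the external reward in the target.
        r_int = self.internal_reward(external_reward)
        best_next = 0.0 if done else max(self.q[next_state])
        target = r_int + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        return r_int


agent = InternalRewardQAgent(n_actions=2)
r1 = agent.update("s0", 0, 10.0, "s1", done=True)
r2 = agent.update("s0", 1, 20.0, "s1", done=True)
```

In this sketch, a reward that merely equals the running average contributes nothing to the Q-value, so only better-than-average outcomes reinforce an action. That is one plausible reading of how subtracting the average could discourage settling on a locally adequate policy.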