Paper:
A New Improved Penalty Avoiding Rational Policy Making Algorithm for Keepaway with Continuous State Spaces
Takuji Watanabe*, Kazuteru Miyazaki**, and Hiroaki Kobayashi***
*Platform Software Div., Software Unit, Fujitsu, Limited, Shinyokohama TECH Building, 3-9-18 Shinyokohama Kohoku-ku, Yokohama, Kanagawa 222-0033, Japan
**Department of Assessment and Research for degree Awarding, National Institution for Academic Degrees and University Evaluation, 1-29-1 Gakuennishimachi, Kodaira, Tokyo 187-8587, Japan
***Dept. of Mechanical Engineering Informatics, Meiji University, 1-1-1 Higashimita Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
The penalty avoiding rational policy making algorithm (PARP) [1], previously improved to save memory and cope with uncertainty as IPARP [2], requires that states be discretized, by function approximation or some other method, in real environments with continuous state spaces. For PARP in particular, a method that discretizes the state using basis functions is known [3]. Because this method creates each new basis function from the current input and the next observation, however, it may generate an unsuitable basis function in some asynchronous multiagent environments. We therefore propose a uniform basis function whose range is estimated before learning. We show the effectiveness of our proposal using a soccer game task called “Keepaway.”
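As a rough illustration of what discretizing a continuous state with uniform basis functions can look like, the following is a minimal sketch and not the authors' implementation. It assumes axis-aligned cells of equal width in each dimension, with the width (taken here to stand in for the "range") estimated from observations sampled before learning begins. The function names `estimate_ranges` and `discretize`, the parameter `cells_per_dim`, and the sample values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def estimate_ranges(
    samples: Sequence[Sequence[float]], cells_per_dim: int
) -> Tuple[List[float], List[float]]:
    """Estimate per-dimension lower bounds and uniform cell widths.

    Assumption: the range of each basis function is fixed before learning
    from the spread of pre-collected observations.
    """
    dims = len(samples[0])
    lows: List[float] = []
    widths: List[float] = []
    for d in range(dims):
        values = [s[d] for s in samples]
        lo, hi = min(values), max(values)
        width = (hi - lo) / cells_per_dim
        lows.append(lo)
        # Guard against a degenerate dimension with no spread.
        widths.append(width if width > 0 else 1.0)
    return lows, widths


def discretize(
    state: Sequence[float], lows: Sequence[float], widths: Sequence[float]
) -> Tuple[int, ...]:
    """Map a continuous state to the index of the uniform cell containing it."""
    return tuple(
        math.floor((x - lo) / w) for x, lo, w in zip(state, lows, widths)
    )


# Hypothetical pre-learning observations (two state variables).
samples = [(0.0, 10.0), (4.0, 20.0), (2.0, 30.0)]
lows, widths = estimate_ranges(samples, cells_per_dim=4)

cell_a = discretize((2.5, 17.0), lows, widths)
cell_b = discretize((2.9, 19.9), lows, widths)
cell_c = discretize((3.1, 19.9), lows, widths)
```

Because the cell boundaries are fixed in advance, the discrete state assigned to an observation does not depend on which observation happens to follow it, which is the property the abstract contrasts with creating basis functions from the current input and its next observation.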
- [1] K. Miyazaki and S. Kobayashi, “Reinforcement Learning for Penalty Avoiding Policy Making,” Proc. of the 2000 IEEE Int. Conf. on Systems, Man and Cybernetics, pp. 206-211, 2000.
- [2] K. Miyazaki, T. Namatame, T. Kojima, and H. Kobayashi, “Improvement of the Penalty Avoiding Rational Policy Making algorithm to Real World Robotics,” Proc. of the 13th Int. Conf. on Advanced Robotics, pp. 1183-1188, 2007.
- [3] K. Miyazaki and S. Kobayashi, “A Reinforcement Learning System for Penalty Avoiding in Continuous State Spaces,” J. of Advanced Computational Intelligence and Intelligent Informatics, Vol.11, No.6, pp. 668-676, 2007.
- [4] P. Stone, R. S. Sutton, and G. Kuhlmann, “Reinforcement Learning toward RoboCup Soccer Keepaway,” Adaptive Behavior, Vol.13, No.3, pp. 165-188, 2005.
- [5] S. Arai and N. Tanaka, “Experimental Analysis of Reward Design for Continuing Task in Multiagent Domains - RoboCup Soccer Keepaway -,” Trans. of the Japanese Society for Artificial Intelligence, Vol.21, No.6, pp. 537-546, 2006 (in Japanese).
- [6] T. Watanabe, K. Miyazaki, and H. Kobayashi, “Extension of Improved Penalty Avoiding Rational Policy Making Algorithm to Tile Coding Environment for Keepaway Tasks,” SICE Annual Conf. 2008, 2A17-3 (CD-ROM), 2008.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.