A New Improved Penalty Avoiding Rational Policy Making Algorithm for Keepaway with Continuous State Spaces
Takuji Watanabe*, Kazuteru Miyazaki**, and Hiroaki Kobayashi***
*Platform Software Div., Software Unit, Fujitsu, Limited, Shinyokohama TECH Building, 3-9-18 Shinyokohama Kohoku-ku, Yokohama, Kanagawa 222-0033, Japan
**Department of Assessment and Research for degree Awarding, National Institution for Academic Degrees and University Evaluation, 1-29-1 Gakuennishimachi, Kodaira, Tokyo 187-8587, Japan
***Dept. of Mechanical Engineering Informatics, Meiji University, 1-1-1 Higashimita Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
The penalty avoiding rational policy making algorithm (PARP), previously improved to save memory and to cope with uncertainty as the improved PARP (IPARP), requires that states be discretized when it is applied to real environments with continuous state spaces, using function approximation or some other method. In particular, a method that discretizes states using basis functions is known for PARP. Because this method creates a new basis function from the current input and its next observation, however, an unsuitable basis function may be generated in some asynchronous multiagent environments. We therefore propose a uniform basis function whose range is estimated before learning. We show the effectiveness of our proposal on a soccer game task called “Keepaway.”
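The idea of placing basis functions uniformly over a state range estimated before learning, rather than spawning them from incoming observations, can be sketched as follows. This is an illustrative sketch only: the Gaussian form, the grid resolution, and all function names are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def uniform_gaussian_bases(low, high, n_per_dim):
    """Place Gaussian basis-function centers on a uniform grid over a
    state range [low, high] estimated before learning (an assumption
    of this sketch, standing in for the paper's range estimation)."""
    low, high = np.asarray(low, float), np.asarray(high, float)
    axes = [np.linspace(l, h, n_per_dim) for l, h in zip(low, high)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), -1).reshape(-1, low.size)
    widths = (high - low) / (n_per_dim - 1)  # grid spacing sets each width
    return centers, widths

def activations(state, centers, widths):
    """Gaussian activation of every basis function for one state."""
    d = (np.asarray(state, float) - centers) / widths
    return np.exp(-0.5 * np.sum(d * d, axis=1))

# Example: a 2-D state space covered by a 5x5 uniform grid.
centers, widths = uniform_gaussian_bases([0.0, 0.0], [1.0, 1.0], 5)
phi = activations([0.5, 0.5], centers, widths)
```

Because the grid is fixed in advance, every agent sees the same set of basis functions regardless of the order in which observations arrive, avoiding the unsuitable functions that observation-driven placement can generate in asynchronous multiagent settings.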
-  K. Miyazaki and S. Kobayashi, “Reinforcement Learning for Penalty Avoiding Policy Making,” Proc. of the 2000 IEEE Int. Conf. on Systems, Man and Cybernetics, pp. 206-211, 2000.
-  K. Miyazaki, T. Namatame, T. Kojima, and H. Kobayashi, “Improvement of the Penalty Avoiding Rational Policy Making Algorithm to Real World Robotics,” Proc. of the 13th Int. Conf. on Advanced Robotics, pp. 1183-1188, 2007.
-  K. Miyazaki and S. Kobayashi, “A Reinforcement Learning System for Penalty Avoiding in Continuous State Spaces,” J. of Advanced Computational Intelligence and Intelligent Informatics, Vol.11, No.6, pp. 668-676, 2007.
-  P. Stone, R. S. Sutton, and G. Kuhlmann, “Reinforcement Learning for RoboCup Soccer Keepaway,” Adaptive Behavior, Vol.13, No.3, pp. 165-188, 2005.
-  S. Arai and N. Tanaka, “Experimental Analysis of Reward Design for Continuing Task in Multiagent Domains: RoboCup Soccer Keepaway,” Trans. of the Japanese Society for Artificial Intelligence, Vol.21, No.6, pp. 537-546, 2006 (in Japanese).
-  T. Watanabe, K. Miyazaki, and H. Kobayashi, “Extension of Improved Penalty Avoiding Rational Policy Making Algorithm to Tile Coding Environment for Keepaway Tasks,” SICE Annual Conf. 2008, 2A17-3 (CD-ROM), 2008.