
JACIII Vol.13 No.6 pp. 675-682
doi: 10.20965/jaciii.2009.p0675
(2009)

Paper:

A New Improved Penalty Avoiding Rational Policy Making Algorithm for Keepaway with Continuous State Spaces

Takuji Watanabe*, Kazuteru Miyazaki**, and Hiroaki Kobayashi***

*Platform Software Div., Software Unit, Fujitsu, Limited, Shinyokohama TECH Building, 3-9-18 Shinyokohama Kohoku-ku, Yokohama, Kanagawa 222-0033, Japan

**Department of Assessment and Research for Degree Awarding, National Institution for Academic Degrees and University Evaluation, 1-29-1 Gakuennishimachi, Kodaira, Tokyo 187-8587, Japan

***Dept. of Mechanical Engineering Informatics, Meiji University, 1-1-1 Higashimita Tama-ku, Kawasaki, Kanagawa 214-8571, Japan

Received:
April 15, 2009
Accepted:
August 18, 2009
Published:
November 20, 2009
Keywords:
reinforcement learning, profit sharing, continuous state space, improved PARP (IPARP), keepaway
Abstract

The penalty avoiding rational policy making algorithm (PARP) [1], previously improved to save memory and to cope with uncertainty as IPARP [2], requires that states be discretized, e.g., by function approximation, when it is applied to real environments with continuous state spaces. In particular, a method is known that discretizes states for PARP using basis functions [3]. Because that method creates a new basis function from the current input and its next observation, however, it may generate unsuitable basis functions in some asynchronous multiagent environments. We therefore propose a uniform basis function whose range is estimated before learning. We show the effectiveness of our proposal on a soccer game task called “Keepaway.”
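The contrast the abstract draws — data-driven bases created from successive observations versus uniform bases fixed before learning — can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes scalar states with known bounds and Gaussian bases, and all function names are hypothetical:

```python
import numpy as np

def make_uniform_bases(low, high, n_bases):
    """Place Gaussian basis centers on a fixed, uniform grid over [low, high].

    The range is estimated before learning (here: assumed known state
    bounds), so the resulting bases do not depend on the order in which
    observations arrive -- unlike bases created on-line from the current
    input and its next observation.
    """
    centers = np.linspace(low, high, n_bases)
    width = (high - low) / (n_bases - 1)  # uniform spacing fixes the width
    return centers, width

def activations(state, centers, width):
    """Gaussian activation of each basis for a scalar state."""
    return np.exp(-((state - centers) ** 2) / (2.0 * width ** 2))

def discretize(state, centers, width):
    """Map a continuous state to the index of its most active basis."""
    return int(np.argmax(activations(state, centers, width)))
```

For example, with `make_uniform_bases(0.0, 10.0, 11)` the centers sit at 0, 1, ..., 10, and `discretize(3.2, centers, width)` returns index 3, the nearest basis. Because the grid is fixed in advance, every agent in a multiagent setting sees the same discretization regardless of its observation timing.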

Cite this article as:
T. Watanabe, K. Miyazaki, and H. Kobayashi, “A New Improved Penalty Avoiding Rational Policy Making Algorithm for Keepaway with Continuous State Spaces,” J. Adv. Comput. Intell. Intell. Inform., Vol.13, No.6, pp. 675-682, 2009.
References
  1. [1] K. Miyazaki and S. Kobayashi, “Reinforcement Learning for Penalty Avoiding Policy Making,” Proc. of the 2000 IEEE Int. Conf. on Systems, Man and Cybernetics, pp. 206-211, 2000.
  2. [2] K. Miyazaki, T. Namatame, T. Kojima, and H. Kobayashi, “Improvement of the Penalty Avoiding Rational Policy Making Algorithm to Real World Robotics,” Proc. of the 13th Int. Conf. on Advanced Robotics, pp. 1183-1188, 2007.
  3. [3] K. Miyazaki and S. Kobayashi, “A Reinforcement Learning System for Penalty Avoiding in Continuous State Spaces,” J. of Advanced Computational Intelligence and Intelligent Informatics, Vol.11, No.6, pp. 668-676, 2007.
  4. [4] P. Stone, R. S. Sutton, and G. Kuhlmann, “Reinforcement Learning toward RoboCup Soccer Keepaway,” Adaptive Behavior, Vol.13, No.3, pp. 165-188, 2005.
  5. [5] S. Arai and N. Tanaka, “Experimental Analysis of Reward Design for Continuing Task in Multiagent Domains - RoboCup Soccer Keepaway -,” Trans. of the Japanese Society for Artificial Intelligence, Vol.21, No.6, pp. 537-546, 2006 (in Japanese).
  6. [6] T. Watanabe, K. Miyazaki, and H. Kobayashi, “Extension of Improved Penalty Avoiding Rational Policy Making Algorithm to Tile Coding Environment for Keepaway Tasks,” SICE Annual Conf. 2008, 2A17-3 (CD-ROM), 2008.
