JACIII Vol.13 No.6 pp. 675-682
doi: 10.20965/jaciii.2009.p0675


A New Improved Penalty Avoiding Rational Policy Making Algorithm for Keepaway with Continuous State Spaces

Takuji Watanabe*, Kazuteru Miyazaki**, and Hiroaki Kobayashi***

*Platform Software Div., Software Unit, Fujitsu, Limited, Shinyokohama TECH Building, 3-9-18 Shinyokohama Kohoku-ku, Yokohama, Kanagawa 222-0033, Japan

**Department of Assessment and Research for Degree Awarding, National Institution for Academic Degrees and University Evaluation, 1-29-1 Gakuennishimachi, Kodaira, Tokyo 187-8587, Japan

***Dept. of Mechanical Engineering Informatics, Meiji University, 1-1-1 Higashimita Tama-ku, Kawasaki, Kanagawa 214-8571, Japan

Received: April 15, 2009
Accepted: August 18, 2009
Published: November 20, 2009
Keywords: reinforcement learning, profit sharing, continuous state space, improved PARP (IPARP), keepaway

The improved penalty avoiding rational policy making algorithm (IPARP) [2], a version of PARP [1] that saves memory and copes with uncertainty, requires that states be discretized in real environments with continuous state spaces, using function approximation or some other method. For PARP in particular, a method that discretizes the state space using basis functions is known [3]. Because that method creates a new basis function from the current input and the next observation, however, it may generate an unsuitable basis function in some asynchronous multiagent environments. We therefore propose using uniform basis functions whose range is estimated before learning. We show the effectiveness of our proposal on a soccer game task called “Keepaway.”
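The idea of a uniform discretization whose range is fixed before learning can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the range of each state variable is estimated as the minimum and maximum over a set of pre-learning samples, and that each basis function is a box of equal width. The names `estimate_ranges`, `discretize`, and `bins_per_dim` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def estimate_ranges(samples: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Estimate (min, max) per state variable from samples gathered before learning.

    Assumption: the min/max of the samples is used as the range estimate.
    """
    dims = len(samples[0])
    return [
        (min(s[d] for s in samples), max(s[d] for s in samples))
        for d in range(dims)
    ]


def discretize(
    state: Sequence[float],
    ranges: Sequence[Tuple[float, float]],
    bins_per_dim: int,
) -> Tuple[int, ...]:
    """Map a continuous state to the index of the uniform box that contains it."""
    index = []
    for x, (lo, hi) in zip(state, ranges):
        width = (hi - lo) / bins_per_dim
        if width == 0:
            # Degenerate range: every value falls into the single box 0.
            index.append(0)
            continue
        i = int((x - lo) / width)
        # Clamp values outside the estimated range to the nearest box.
        index.append(min(max(i, 0), bins_per_dim - 1))
    return tuple(index)


# Hypothetical two-dimensional observations collected before learning.
samples = [(0.0, 10.0), (4.0, 20.0), (2.0, 15.0)]
ranges = estimate_ranges(samples)
cell = discretize((1.0, 19.9), ranges, bins_per_dim=4)
```

Because the boxes are fixed in advance, the discrete state assigned to an observation does not depend on which observation happens to follow it, which is the property the abstract contrasts with generating a basis function from the current input and its next observation.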

