
JRM Vol.17 No.6 pp. 636-644
doi: 10.20965/jrm.2005.p0636


Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller

Yutaka Nakamura, Takeshi Mori, Yoichi Tokita,
Tomohiro Shibata, and Shin Ishii

Theoretical Life Science Lab., Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan

Received: February 10, 2005
Accepted: June 21, 2005
Published: December 20, 2005
Keywords: reinforcement learning, off-policy learning, biped walking, central pattern generator (CPG)
Motor control schemes using a central pattern generator (CPG) controller have been studied with reference to the mechanism underlying animals’ rhythmic movements. We previously proposed a reinforcement learning (RL) framework, called the CPG-actor-critic model, for autonomous learning of a CPG controller. Here, we propose an off-policy natural policy gradient RL algorithm for the CPG-actor-critic model, which addresses the “exploration-exploitation” problem by meta-controlling the “behavior policy.” We apply this RL algorithm to an automatic control problem using a biped robot simulator. Computer simulations demonstrated that, with our new algorithm, the CPG controller enables the biped robot to walk stably and efficiently.
Cite this article as:
Y. Nakamura, T. Mori, Y. Tokita, T. Shibata, and S. Ishii, “Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller,” J. Robot. Mechatron., Vol.17 No.6, pp. 636-644, 2005.
