Acquisition of a Gymnast-Like Robotic Giant-Swing Motion by Q-Learning and Improvement of the Repeatability

Masayuki Hara; Naoto Kawabe; Jian Huang; Tetsuro Yabuta

doi:10.20965/jrm.2011.p0126

JRM Vol.23 No.1 pp. 126-136

(2011)

Paper:

Masayuki Hara^*1, Naoto Kawabe^*2, Jian Huang^*3, and Tetsuro Yabuta^*4

^*1Robotics Systems Laboratory (LSRO), Ecole Polytechnique Federale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

^*2Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

^*3Dept. of Intelligent Mechanical Engineering, School of Engineering, Kinki University, 1 Takaya Umenobe, Higashi-Hiroshima City, Hiroshima 739-2116, Japan

^*4Dept. of Mechanical Engineering, Yokohama National University, 79-5 Tokiwadai, Hodogaya-ku, Yokohama, Kanagawa 240-8501, Japan

Received: April 22, 2010
Accepted: October 1, 2010
Published: February 20, 2011

Keywords: Q-learning, reinforcement learning, motion learning, giant-swing motion

Abstract

This paper proposes an application of Q-learning to a compact humanoid robot, with the aim of acquiring a gymnastic swinging motion even though the Markov property may not be guaranteed in such dynamic motions. To cope with this issue, several studies have relied on robot models or on multiple controllers, but very few have attempted Q-learning of human-like swing motion without prior knowledge. We avoid the Markov property problem by embedding dynamic information in the robot's state space and by averaging action-value functions. In this study, Q-learning is executed in a dynamic simulator based on a real humanoid robot with 5 degrees of freedom (DOF), and we verify the effectiveness of the learning by applying the learned results to the real robot. A particularly significant point of our Q-learning is that prior information is eliminated as far as possible: only the reward and the current robot state are available. The key factor in robotic giant-swing motion is discussed by examining the effects of various rewards on the robot's performance. In addition, we present a method for improving repeatability and reducing the time required to reach the giant-swing motion. Finally, this paper demonstrates an attractive robotic giant-swing motion generated solely through interaction with the environment.
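The abstract names two devices for coping with the missing Markov property but does not spell them out. The Python sketch below is one hypothetical reading, not the authors' implementation. The discretized state pairs a posture (angle) bin with the sign of the angular velocity, so that the same posture swinging forward and swinging backward become distinct states. Q-tables learned in several independent runs are then averaged entry by entry. The functions `discretize`, `epsilon_greedy`, `q_update`, and `average_q_tables`, the three-action set, the 12 angle bins, and the values of alpha, gamma, and epsilon are all illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical state encoding (assumption): (angle bin, velocity-sign bin).
State = Tuple[int, int]
QTable = Dict[Tuple[State, int], float]

N_ACTIONS = 3  # assumption: e.g. flex / hold / extend a joint


def discretize(angle: float, velocity: float, n_angle_bins: int = 12) -> State:
    """Map a continuous (angle [rad], angular velocity) pair to a discrete state.

    Adding the velocity sign embeds some dynamic information, so the same
    posture swinging forward and backward maps to different states.
    """
    wrapped = angle % (2 * math.pi)  # wrap into [0, 2*pi)
    angle_bin = int(wrapped / (2 * math.pi) * n_angle_bins) % n_angle_bins
    vel_bin = 1 if velocity >= 0 else 0
    return (angle_bin, vel_bin)


def epsilon_greedy(q: QTable, s: State, epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda b: q.get((s, b), 0.0))


def q_update(q: QTable, s: State, a: int, r: float, s_next: State,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard one-step Q-learning update; unseen entries default to 0.0."""
    best_next = max(q.get((s_next, b), 0.0) for b in range(N_ACTIONS))
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def average_q_tables(tables: List[QTable]) -> QTable:
    """Element-wise mean of Q-tables from independent runs.

    Missing entries are counted as 0.0 (an assumption of this sketch).
    """
    keys = set().union(*tables)
    return {k: sum(t.get(k, 0.0) for t in tables) / len(tables) for k in keys}


if __name__ == "__main__":
    s = discretize(0.1, 1.0)       # posture bin 0, swinging forward
    s_next = discretize(0.7, 1.0)  # posture bin 1, swinging forward
    run_a: QTable = {}
    run_b: QTable = {}
    q_update(run_a, s, 0, 1.0, s_next, alpha=0.5)  # run that received a reward
    q_update(run_b, s, 0, 0.0, s_next, alpha=0.5)  # run that did not
    merged = average_q_tables([run_a, run_b])
    print(merged[(s, 0)])
    print(epsilon_greedy(merged, s, 0.0, random.Random(0)))
```

Running the module prints 0.25 and then 0. Averaging over independent runs is used here only to illustrate how action values can be smoothed across learning trials; the paper may average action-value functions in a different way.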

Cite this article as:

M. Hara, N. Kawabe, J. Huang, and T. Yabuta, “Acquisition of a Gymnast-Like Robotic Giant-Swing Motion by Q-Learning and Improvement of the Repeatability,” J. Robot. Mechatron., Vol.23 No.1, pp. 126-136, 2011.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
