Paper:
Acquisition of a Gymnast-Like Robotic Giant-Swing Motion by Q-Learning and Improvement of the Repeatability
Masayuki Hara*1, Naoto Kawabe*2, Jian Huang*3,
and Tetsuro Yabuta*4
*1Robotics Systems Laboratory (LSRO), Ecole Polytechnique Federale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
*2Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
*3Dept. of Intelligent Mechanical Engineering, School of Engineering, Kinki University, 1 Takaya Umenobe, Higashi-Hiroshima City, Hiroshima 739-2116, Japan
*4Dept. of Mechanical Engineering, Yokohama National University, 79-5 Tokiwadai, Hodogaya-ku, Yokohama, Kanagawa 240-8501, Japan
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Copyright © 2011 by Fuji Technology Press Ltd. and Japan Society of Mechanical Engineers. All rights reserved.