Paper:

# A Hybrid Learning Strategy for Real Hardware of Swing-Up Pendulum

## Shingo Nakamura, Ryo Saegusa, and Shuji Hashimoto

Dept. of Applied Physics, School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan

Bottom-up learning approaches, such as neural networks, generally require a huge number of trials to obtain the optimal controller for a target task on a mechanical system; these trials take a long time and put stress on the hardware. To avoid this, a simulator is often built and the learning method is run on it instead. This raises further problems, however: how to construct the simulator and how accurately it reproduces the hardware. In this paper, we consider constructing the simulator directly from the real hardware. The constructed simulator is then used to learn the target task, and the resulting optimal controller is applied to the real hardware. As an example, we take the pendulum swing-up task, a typical nonlinear control problem. The simulator is constructed by training a neural network with the back-propagation method, and the optimal controller is obtained by a reinforcement learning method. Both processes run without the real hardware once the data have been sampled, so the load on the hardware is substantially reduced and the desired controller is obtained faster than when learning on the hardware alone. We consider that the proposed method can serve as a basic learning strategy for obtaining optimal controllers of mechanical systems.
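The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the analytic function `hardware_accel` stands in for data sampled from the real pendulum, the constants, network size, and learning rates are arbitrary, and tabular Q-learning is used as one possible reinforcement learning method because the abstract does not name a specific one.

```python
import numpy as np

rng = np.random.default_rng(0)
DT, W_MAX, U_MAX = 0.02, 8.0, 2.0  # illustrative constants, not from the paper

def hardware_accel(theta, omega, u):
    """Hypothetical stand-in for the real pendulum (theta = 0 is hanging down)."""
    return -9.8 * np.sin(theta) - 0.1 * omega + u

# Stage 1: sample (state, action) -> angular acceleration once from the "hardware".
N = 2000
theta = rng.uniform(-np.pi, np.pi, N)
omega = rng.uniform(-W_MAX, W_MAX, N)
u = rng.uniform(-U_MAX, U_MAX, N)
X = np.stack([np.sin(theta), omega / W_MAX, u / U_MAX], axis=1)  # scaled inputs
Y = (hardware_accel(theta, omega, u) / 10.0)[:, None]            # scaled target

# Stage 2: fit a one-hidden-layer network to the samples by back-propagation.
H = 16
W1 = rng.normal(0, 0.5, (3, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

initial_mse = float(np.mean((forward(X)[1] - Y) ** 2))
for _ in range(3000):
    h, pred = forward(X)
    err = (pred - Y) / N                # gradient of 0.5 * mean squared error
    dh = (err @ W2.T) * (1.0 - h ** 2)  # back-propagate through tanh
    W2 -= 0.05 * (h.T @ err); b2 -= 0.05 * err.sum(0)
    W1 -= 0.05 * (X.T @ dh);  b1 -= 0.05 * dh.sum(0)
final_mse = float(np.mean((forward(X)[1] - Y) ** 2))

# Stage 3: reinforcement learning on the learned simulator only (no hardware).
def sim_step(theta, omega, u):
    """One step of the learned simulator (semi-implicit Euler integration)."""
    x = np.array([[np.sin(theta), omega / W_MAX, u / U_MAX]])
    accel = 10.0 * forward(x)[1][0, 0]  # undo the target scaling
    omega = float(np.clip(omega + DT * accel, -W_MAX, W_MAX))
    return theta + DT * omega, omega

TORQUES = np.array([-U_MAX, 0.0, U_MAX])
NB = 21
Q = np.zeros((NB, NB, len(TORQUES)))

def bins(theta, omega):
    """Map a continuous state to a cell of the Q-table."""
    wrapped = (theta + np.pi) % (2 * np.pi)
    i = min(int(wrapped / (2 * np.pi) * NB), NB - 1)
    j = min(int((omega + W_MAX) / (2 * W_MAX) * NB), NB - 1)
    return i, j

for episode in range(100):
    th, om = 0.0, 0.0                   # start hanging down, at rest
    for _ in range(200):
        i, j = bins(th, om)
        a = int(rng.integers(3)) if rng.random() < 0.1 else int(np.argmax(Q[i, j]))
        th, om = sim_step(th, om, TORQUES[a])
        reward = -np.cos(th)            # largest when the pendulum is upright
        ni, nj = bins(th, om)
        Q[i, j, a] += 0.1 * (reward + 0.98 * Q[ni, nj].max() - Q[i, j, a])

print(f"simulator MSE: {initial_mse:.4f} -> {final_mse:.4f}")
```

In this sketch the stand-in hardware is queried only during the sampling stage; the network fit and the Q-learning loop both run on the sampled data and the learned model. This mirrors the abstract's point that the load on the hardware is confined to data sampling. A run this short is not expected to produce a controller that actually swings the pendulum up.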

*J. Adv. Comput. Intell. Intell. Inform.*, Vol.11, No.8, pp. 972-978, 2007.
