Paper:

# Expression of Continuous State and Action Spaces for *Q*-Learning Using Neural Networks and CMAC

## Kazuaki Yamada

Department of Mechanical Engineering, Toyo University, 2100 Kujirai, Kawagoe-shi, Saitama 350-8585, Japan

This paper proposes a new reinforcement learning algorithm that uses neural networks and CMAC to learn a mapping function between the high-dimensional sensors and the motors of an autonomous robot. Conventional reinforcement learning algorithms require a lot of memory because they use lookup tables to describe high-dimensional mapping functions. Researchers have therefore tried to develop reinforcement learning algorithms that can learn high-dimensional mapping functions without such tables. We apply the proposed method to an autonomous robot navigation problem and a multi-link robot arm reaching problem, and we evaluate the effectiveness of the method.
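The memory argument above can be made concrete with a small sketch. It is not the algorithm proposed in this paper: it only compares the size of a dense lookup table with a generic hashed CMAC (tile coding) approximator whose weight store has a fixed size. The names `table_entries` and `HashedCMAC`, the hashing scheme, and all parameter values (6 sensor dimensions, 10 bins, 5 actions, 8 tilings, 4096 weights) are assumptions chosen for illustration.

```python
import numpy as np


def table_entries(bins_per_dim, n_dims, n_actions):
    """Number of entries in a dense Q lookup table over a uniform grid."""
    return (bins_per_dim ** n_dims) * n_actions


class HashedCMAC:
    """Generic CMAC (tile coding) approximator for inputs in [0, 1]^n_dims.

    Illustrative only: the offsets and the hash function are assumptions,
    not the configuration used in the paper.
    """

    def __init__(self, n_dims, n_tilings=8, bins=10, memory=4096):
        self.n_dims = n_dims
        self.n_tilings = n_tilings
        self.bins = bins
        self.memory = memory
        self.weights = np.zeros(memory)  # fixed-size weight store

    def active_cells(self, x):
        """Return one hashed weight index per tiling for input x."""
        x = np.asarray(x, dtype=float)
        cells = []
        for t in range(self.n_tilings):
            # Each tiling is shifted by a fraction of one tile width.
            offset = t / (self.n_tilings * self.bins)
            coords = np.floor((x + offset) * self.bins).astype(int)
            h = t
            for c in coords:
                h = (h * 1000003 + int(c)) % self.memory
            cells.append(h)
        return cells

    def predict(self, x):
        """Output is the sum of the weights of the active cells."""
        return float(self.weights[self.active_cells(x)].sum())

    def update(self, x, target, lr=0.5):
        """Move the prediction at x a fraction lr of the way to target."""
        cells = self.active_cells(x)
        error = target - self.weights[cells].sum()
        # The step is shared equally among the active tilings.
        self.weights[cells] += lr * error / self.n_tilings


# Dense table for 6 sensor dimensions, 10 bins each, 5 actions.
print(table_entries(10, 6, 5))

# The CMAC keeps 4096 weights regardless of the input dimension.
cmac = HashedCMAC(n_dims=6)
x = [0.2, 0.4, 0.6, 0.8, 0.1, 0.9]
for _ in range(20):
    cmac.update(x, target=1.0)
print(cmac.predict(x))
```

In this sketch the table grows exponentially with the number of sensor dimensions, while the CMAC trades exactness for a fixed memory budget and generalizes across neighboring inputs through its overlapping tilings.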

K. Yamada, “Expression of Continuous State and Action Spaces for *Q*-Learning Using Neural Networks and CMAC,” *J. Robot. Mechatron.*, Vol.24, No.2, pp. 330-339, 2012.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Copyright © 2012 by Fuji Technology Press Ltd. and Japan Society of Mechanical Engineers. All rights reserved.