JRM Vol.24 No.2 pp. 330-339
doi: 10.20965/jrm.2012.p0330
(2012)

Paper:

Expression of Continuous State and Action Spaces for Q-Learning Using Neural Networks and CMAC

Kazuaki Yamada

Department of Mechanical Engineering, Toyo University, 2100 Kujirai, Kawagoe-shi, Saitama 350-8585, Japan

Received: October 1, 2011
Accepted: January 18, 2012
Published: April 20, 2012
Keywords: reinforcement learning, neural networks, CMAC, griddy Gibbs sampler, autonomous robots
Abstract

This paper proposes a new reinforcement learning algorithm that uses neural networks and CMAC to learn a mapping function between the high-dimensional sensors and the motors of an autonomous robot. Conventional reinforcement learning algorithms require a large amount of memory because they use lookup tables to describe high-dimensional mapping functions. Researchers have therefore tried to develop reinforcement learning algorithms that can learn such high-dimensional mapping functions. We apply the proposed method to an autonomous robot navigation problem and a multi-link robot arm reaching problem, and we evaluate its effectiveness.
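
To illustrate the memory argument above, the following is a minimal sketch of a CMAC-style (tile-coding) approximator for Q(s, a) over continuous inputs. It is not the algorithm proposed in the paper, whose details the abstract does not give. The class name `TileCodingQ`, its parameters, and the assumption that state and action are concatenated and scaled to [0, 1) are all hypothetical choices made for illustration.

```python
import numpy as np


class TileCodingQ:
    """Hypothetical CMAC-style approximator of Q over inputs scaled to [0, 1)."""

    def __init__(self, n_dims, n_tilings=8, tiles_per_dim=4, alpha=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        # Each tiling is shifted by a random fraction of one tile width.
        self.offsets = rng.uniform(0.0, 1.0 / tiles_per_dim,
                                   size=(n_tilings, n_dims))
        # One weight per tile per tiling; the extra tile per dimension
        # absorbs the shift introduced by the offset.
        self.weights = np.zeros((n_tilings,) + (tiles_per_dim + 1,) * n_dims)
        # Split the step size across tilings so one update moves the
        # prediction by a fraction alpha of the error.
        self.alpha = alpha / n_tilings

    def _active_tiles(self, x):
        x = np.asarray(x, dtype=float)
        idx = np.floor((x + self.offsets) * self.tiles_per_dim).astype(int)
        return np.clip(idx, 0, self.tiles_per_dim)  # shape (n_tilings, n_dims)

    def value(self, x):
        idx = self._active_tiles(x)
        return sum(self.weights[(t,) + tuple(idx[t])]
                   for t in range(self.n_tilings))

    def update(self, x, target):
        # Move the prediction for x toward a target such as
        # r + gamma * max_a' Q(s', a') in Q-learning.
        error = target - self.value(x)
        idx = self._active_tiles(x)
        for t in range(self.n_tilings):
            self.weights[(t,) + tuple(idx[t])] += self.alpha * error


# Two state dimensions plus one action dimension (an assumed toy setup).
q = TileCodingQ(n_dims=3)
for _ in range(200):
    q.update([0.2, 0.5, 0.7], target=1.0)

table_cells = 100 ** 3  # a lookup table with 100 bins per dimension
print(q.weights.size, table_cells)
```

In this toy setup the approximator stores 1,000 weights, whereas a lookup table with 100 bins per dimension would need 1,000,000 cells. Because overlapping tilings share weights, an update at one point also changes the estimate at nearby points, which is the generalization a plain lookup table lacks.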

Cite this article as:
Kazuaki Yamada, “Expression of Continuous State and Action Spaces for Q-Learning Using Neural Networks and CMAC,” J. Robot. Mechatron., Vol.24, No.2, pp. 330-339, 2012.


Last updated on Aug. 02, 2021