Paper:

# Network Parameter Setting for Reinforcement Learning Approaches Using Neural Networks

## Kazuaki Yamada

Department of Mechanical Engineering, Faculty of Science and Engineering, Toyo University, 2100 Kujirai, Kawagoe-shi, Saitama 350-8585, Japan

Reinforcement learning is attracting attention as a technique for constructing, by trial and error, a mapping function between the sensors and motors of an autonomous mobile robot. Conventional reinforcement learning approaches use a look-up table to express the mapping function between a gridded state space and a gridded action space, and the grid size strongly affects learning performance. To avoid this problem, researchers have proposed reinforcement learning algorithms that use neural networks to express the mapping function between a continuous state space and actions. A designer, however, must appropriately set the number of middle neurons and the initial values of the weight parameters to improve the approximation accuracy of the neural networks. This paper proposes a new method that automatically sets the number of middle neurons and the initial values of the weight parameters based on the number of dimensions of the sensor space. The feasibility of the proposed method is demonstrated on an autonomous mobile robot navigation problem and evaluated by comparison with two types of Q-learning: Q-learning using RBF networks and Q-learning using neural networks whose parameters are set by a designer.
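As a rough illustration of the look-up-table baseline mentioned in the abstract, the sketch below discretizes a continuous sensor vector onto a grid and applies the standard tabular Q-learning update. It is not the paper's method or experimental setup: the helper names (`discretize`, `q_update`), the sensor range [0, 1], the grid resolution, the learning rate, and the discount factor are all assumptions chosen for illustration.

```python
import numpy as np

def discretize(x, low, high, bins):
    """Map a continuous sensor vector x to a tuple of grid indices (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    ratio = (x - low) / (high - low)
    idx = np.floor(ratio * bins).astype(int)
    # Clip so that x == high still falls into the last cell.
    return tuple(np.clip(idx, 0, bins - 1))

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One standard tabular Q-learning update; alpha and gamma are assumed values."""
    target = r + gamma * np.max(q[s_next])
    q[s + (a,)] += alpha * (target - q[s + (a,)])
    return q

# Assumed toy setup: 2 sensors in [0, 1], 10 cells per sensor, 3 discrete actions.
bins, n_sensors, n_actions = 10, 2, 3
low, high = np.zeros(n_sensors), np.ones(n_sensors)
q = np.zeros((bins,) * n_sensors + (n_actions,))

s = discretize([0.23, 0.91], low, high, bins)
s_next = discretize([0.31, 0.88], low, high, bins)
q = q_update(q, s, a=1, r=1.0, s_next=s_next)

# The table grows exponentially with the number of sensor dimensions.
table_sizes = {d: bins ** d * n_actions for d in (2, 4, 8)}
```

With these assumed settings, the number of table entries grows as `bins ** d * n_actions`, which is one reason function approximators such as neural networks are attractive for continuous sensor spaces.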

*J. Adv. Comput. Intell. Intell. Inform.*, Vol.15, No.7, pp. 822-930, 2011.
