JACIII Vol.21 No.5 pp. 840-848
doi: 10.20965/jaciii.2017.p0840


Experimental Study on Behavior Acquisition of Mobile Robot by Deep Q-Network

Hikaru Sasaki*, Tadashi Horiuchi**, and Satoru Kato***

*Graduate School of Information Science, Nara Institute of Science and Technology
8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan

**Department of Control Engineering, National Institute of Technology, Matsue College
14-4 Nishi-ikuma, Matsue, Shimane 690-8518, Japan

***Department of Information Engineering, National Institute of Technology, Matsue College
14-4 Nishi-ikuma, Matsue, Shimane 690-8518, Japan

Received: March 21, 2017
Accepted: July 21, 2017
Published: September 20, 2017
Keywords: deep reinforcement learning, deep Q-network, mobile robot, behavior acquisition

Deep Q-network (DQN) is one of the best-known methods of deep reinforcement learning. DQN approximates the action-value function with a convolutional neural network (CNN) and updates it by Q-learning. In this study, we applied DQN to robot behavior learning in a simulation environment, which we constructed for a two-wheeled mobile robot using the robot simulation software Webots. By learning from high-dimensional visual information supplied as input data, the mobile robot acquired good behavior such as avoiding walls and moving along a center line. We propose a method that reuses the best target network obtained so far when the learning performance suddenly falls. Moreover, we incorporate the Profit Sharing method into DQN in order to accelerate learning. Through the simulation experiment, we confirmed that our method is effective.
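
The two mechanisms named in the abstract, the Q-learning update target and the reuse of the best target network, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The helper name `BestTargetKeeper`, the `drop_ratio` threshold, and the rule "restore when the evaluation score falls below a fraction of the best score" are all assumptions made for illustration; the abstract states only that the best target network so far is reused when learning performance suddenly falls. The sketch also assumes non-negative evaluation scores and represents network parameters as a plain dictionary.

```python
import copy


def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """Standard Q-learning target: r + gamma * max_a' Q_target(s', a').

    next_q_values holds the target network's Q-values for the next state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


class BestTargetKeeper:
    """Hypothetical helper: remembers the best-scoring target parameters."""

    def __init__(self, drop_ratio=0.5):
        # Assumed criterion (not from the paper): a "sudden fall" means the
        # score drops below drop_ratio * best_score.
        self.drop_ratio = drop_ratio
        self.best_score = None
        self.best_params = None

    def update(self, target_params, score):
        """Return the parameters to use as the target network from now on."""
        if self.best_score is None or score >= self.best_score:
            # New best: store an independent copy of the parameters.
            self.best_score = score
            self.best_params = copy.deepcopy(target_params)
            return target_params
        if score < self.drop_ratio * self.best_score:
            # Performance fell sharply: fall back to the stored best copy.
            return copy.deepcopy(self.best_params)
        return target_params


if __name__ == "__main__":
    keeper = BestTargetKeeper(drop_ratio=0.5)
    params = keeper.update({"w": [1.0]}, score=10.0)  # becomes the best
    params = keeper.update({"w": [2.0]}, score=8.0)   # mild drop, kept
    params = keeper.update({"w": [3.0]}, score=2.0)   # sharp drop, restored
    print(params)
```

In a full DQN training loop, `update` would be called after each periodic evaluation, and the returned parameters would be loaded into the target network before further Q-learning updates.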


Last updated on Oct. 20, 2017