Acceleration of Reinforcement Learning by a Mobile Robot Using Generalized Inhibition Rules

Kousuke Inoue; Tamio Arai; Jun Ota

doi:10.20965/jrm.2010.p0122

JRM Vol.22 No.1 pp. 122-133 (2010)

Paper:

Acceleration of Reinforcement Learning by a Mobile Robot Using Generalized Inhibition Rules

Kousuke Inoue*, Tamio Arai**, and Jun Ota***

*Department of Intelligent Systems Engineering, Faculty of Engineering, Ibaraki University, 4-12-1 Nakanarusawa-cho, Hitachi, Ibaraki 316-8511, Japan

**Department of Precision Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

***Research into Artifacts, Center for Engineering (RACE), The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan

Received: March 26, 2009

Accepted: December 25, 2009

Published: February 20, 2010

Keywords:

optical tactile sensor, elastic body, cubic polynomial deformation, 3D force information

Abstract

A fundamental problem in behavioral learning by an agent is that acquiring optimal behavior takes a long time. To address this problem, we propose an approach that makes learning more efficient by using generalized knowledge. In this approach, the agent repeats learning processes for different tasks and uses a statistical method to extract rules describing behaviors that are commonly harmful to task execution. After sufficient experience has accumulated, the generalized rules are extracted from that experience and applied to subsequent learning processes, which are accelerated by inhibiting the commonly harmful behaviors. To make the rule expression general, the rules are described in terms of egocentric information, namely the raw observations and actions experienced by the agent. To avoid the perceptual aliasing problem, the rule expression includes information on sequential experience, and a mechanism is introduced to control the balance between the utility and the generality of the rules. As an example application, the proposed method is examined in navigation tasks by a mobile robot in grid environments. The results show that the proposed method accelerates the learning processes.
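The abstract does not specify the rule format or the statistical criterion, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that a rule is a pair of an egocentric context (here a tuple of recent observations) and an action, that a rule is extracted when the action was harmful in at least `min_harm_rate` of at least `min_support` pooled trials, and that inhibition is applied by masking actions during greedy selection. All function names, parameters, observation labels, and thresholds are hypothetical.

```python
from collections import defaultdict


def extract_inhibition_rules(experience, min_support=3, min_harm_rate=0.9):
    """Return the set of (context, action) pairs judged commonly harmful.

    experience: iterable of (context, action, harmful) triples pooled over
    several tasks. The context is any hashable egocentric description,
    e.g. a tuple of the most recent observations (an assumption here).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [harmful count, total count]
    for context, action, harmful in experience:
        entry = counts[(context, action)]
        entry[0] += int(harmful)
        entry[1] += 1
    return {
        key
        for key, (bad, total) in counts.items()
        if total >= min_support and bad / total >= min_harm_rate
    }


def allowed_actions(context, actions, rules):
    """Drop inhibited actions; fall back to all actions if none remain."""
    permitted = [a for a in actions if (context, a) not in rules]
    return permitted or list(actions)


def greedy_action(q_values, context, actions, rules):
    """Pick the highest-valued action among those not inhibited."""
    candidates = allowed_actions(context, actions, rules)
    return max(candidates, key=lambda a: q_values.get((context, a), 0.0))


# Toy experience: moving forward with a wall ahead was harmful every time.
experience = (
    [(("wall_ahead",), "forward", True)] * 4
    + [(("wall_ahead",), "left", False)] * 4
    + [(("free",), "forward", False)] * 4
)
rules = extract_inhibition_rules(experience)
choice = greedy_action({}, ("wall_ahead",), ["forward", "left", "right"], rules)
```

In this sketch, raising `min_harm_rate` or `min_support` yields fewer but more reliable rules, while lengthening the context tuple makes rules more specific; this is one simple way to picture the trade-off between utility and generality mentioned in the abstract.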

Cite this article as:

K. Inoue, T. Arai, and J. Ota, “Acceleration of Reinforcement Learning by a Mobile Robot Using Generalized Inhibition Rules,” J. Robot. Mechatron., Vol.22 No.1, pp. 122-133, 2010.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
