Improving the Robustness of Instance-Based Reinforcement Learning Robots by Metalearning

Toshiyuki Yasuda; Kousuke Araki; Kazuhiro Ohkura

doi:10.20965/jaciii.2011.p1065

JACIII Vol.15 No.8 pp. 1065-1072

(2011)

Paper:

Improving the Robustness of Instance-Based Reinforcement Learning Robots by Metalearning

Toshiyuki Yasuda, Kousuke Araki, and Kazuhiro Ohkura

Graduate School of Engineering, Hiroshima University, 1-4-1, Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan

Received: March 16, 2011

Accepted: July 15, 2011

Published: October 20, 2011

Keywords: multi-robot system, reinforcement learning, metalearning, robustness

Abstract

Learning autonomous robots have been widely discussed in recent years. Reinforcement learning (RL) is a popular method in this domain. However, its performance is quite sensitive to the segmentation of the state and action spaces. To overcome this problem, we developed a new technique, Bayesian-discrimination-function-based RL (BRL). BRL has proven to be more effective than other standard RL algorithms in dealing with multi-robot system (MRS) problems. However, as in most learning systems, overfitting occasionally occurs in BRL. This paper introduces an extended BRL for improving the robustness of MRSs. Metalearning based on the information entropy of fired rules is adopted to adaptively modify the learning parameters of BRL. Computer simulations are conducted to verify the effectiveness of the proposed method.
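The abstract does not give the entropy formula or the parameter-update rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "information entropy of fired rules" can be read as the Shannon entropy of how often each rule fired in a recent window. The function names, the `target` and `rate` values, the parameter bounds, and the direction of the update are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def fired_rule_entropy(fired_rule_ids):
    """Shannon entropy (bits) of the empirical distribution of fired rules."""
    counts = Counter(fired_rule_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(fired_rule_ids, num_rules):
    """Entropy scaled to [0, 1] by the maximum possible value log2(num_rules)."""
    if num_rules <= 1:
        return 0.0
    return fired_rule_entropy(fired_rule_ids) / math.log2(num_rules)


def adapt_parameter(value, h_norm, target=0.5, rate=0.1, lo=0.01, hi=1.0):
    """Hypothetical meta-update (an assumption, not taken from the paper).

    Raises the parameter when the normalized entropy is below the target
    (few rules dominate) and lowers it when the entropy is above the target,
    then clamps the result to [lo, hi].
    """
    new_value = value * (1.0 + rate * (target - h_norm))
    return min(hi, max(lo, new_value))


if __name__ == "__main__":
    window = [0, 0, 0, 1, 2, 0, 0, 1]  # hypothetical IDs of recently fired rules
    h = normalized_entropy(window, num_rules=4)
    print(adapt_parameter(0.5, h))
```

Normalizing by the maximum entropy keeps the signal comparable as the number of rules changes, which matters for an instance-based learner whose rule set grows and shrinks during learning.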

Cite this article as:

T. Yasuda, K. Araki, and K. Ohkura, “Improving the Robustness of Instance-Based Reinforcement Learning Robots by Metalearning,” J. Adv. Comput. Intell. Intell. Inform., Vol.15 No.8, pp. 1065-1072, 2011.

References

[1] R.S. Sutton and A.G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998.
[2] R.S. Sutton, “Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding,” Advances in Neural Information Processing Systems, Vol. 8, pp. 1038-1044, MIT Press, 1996.
[3] J. Morimoto and K. Doya, “Acquisition of Stand-Up Behavior by a Real Robot using Hierarchical Reinforcement Learning for Motion Learning: Learning, “Stand Up” Trajectories,” Proc. of Intl. Conf. on Machine Learning, pp. 623-630, 2000.
[4] L.J. Lin, “Scaling Up Reinforcement Learning for Robot Control,” Proc. of the 10th Intl Conf. on Machine Learning, pp. 182-189, 1993.
[5] M. Asada, S. Noda, and K. Hosoda, “Action-Based Sensor Space Categorization for Robot Learning,” Proc. of IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, pp. 1502-1509, 1996.
[6] Y. Takahashi, M. Asada, and K. Hosoda, “Reasonable Performance in Less Learning Time by Real Robot Based on Incremental State Space Segmentation,” Proc. of IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, pp. 1518-1524, 1996.
[7] M. Svinin, F. Kojima, Y. Katada, and K. Ueda, “Initial Experiments on Reinforcement Learning Control of Cooperative Manipulations,” Proc. of IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, pp. 416-422, 2000.
[8] T. Yasuda and K. Ohkura, “Autonomous Role Assignment in Homogeneous Multi-Robot Systems,” Journal of Robotics and Mechatronics, Vol. 17, No. 5, pp. 596-604, 2005.
[9] T. Yasuda and K. Ohkura, “Improving Robustness of Reinforcement Learning for a Multi-Robot System Environment,” Proc. of the Fourth IEEE Intl. Workshop on Soft Computing as Transdisciplinary Science and Technology, pp. 265-272, 2005.
[10] T. Yasuda and K. Ohkura, “Improving Search Efficiency in the Action Space of an Instance-Based Reinforcement Learning,” Advances in Artificial Life, the 9th European Conf. on Artificial Life, LNAI, Vol. 4648, pp. 325-334, 2007.
[11] K. Ohkura and R. Washizaki, “Robust Instance-Based Reinforcement Learning for Multi-Robot Systems,” Proc. of the 4th Intl. Conf. on Advanced Mechatronics, pp. 583-588, 2004.
[12] K. Doya, “Reinforcement Learning in Continuous Time and Space,” Neural Computation, Vol. 12, pp. 219-245, 2000.
[13] J. Peters and S. Schaal, “Natural actor critic,” Neurocomputing, Vol.71, 7-9, pp. 1180-1190, 2008.
[14] R.J. Williams, “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning,” Machine Learning, Vol. 8, pp. 229-256, 1992.
[15] K. Doya, “Metalearning and neuromodulation,” Neural Networks, Vol. 15, Issues 4-6, pp. 495-506, 2002.
[16] N. Schweighofer and K. Doya, “Meta-learning in Reinforcement Learning,” Neural Networks, Vol. 16, Issue 1, pp. 5-9, 2003.
[17] S. Elfwing, E. Uchibe, K. Doya, and H.I. Christensen, “Coevolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning,” Adaptive Behavior, Vol. 16, pp. 400-412, 2008.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
