Merging with Extraction Method for Transfer Learning in Actor-Critic

Toshiaki Takano; Haruhiko Takase; Hiroharu Kawanaka; Shinji Tsuruoka

doi:10.20965/jaciii.2011.p0814

JACIII Vol.15 No.7 pp. 814-821

(2011)

Paper:

Toshiaki Takano*, Haruhiko Takase*, Hiroharu Kawanaka*, and Shinji Tsuruoka**

*Graduate School of Engineering, Mie University, 1577 Kurima-Machiya, Tsu, Mie 514-8507, Japan

**Graduate School of Regional Innovation Studies, Mie University, 1577 Kurima-Machiya, Tsu, Mie 514-8507, Japan

Received: March 3, 2011

Accepted: May 9, 2011

Published: September 20, 2011

Keywords: transfer learning, reinforcement learning, actor-critic

Abstract

This paper aims to accelerate the learning process of the actor-critic method, one of the major reinforcement learning algorithms, by means of transfer learning. Transfer learning accelerates learning on a target task by reusing knowledge from source policies, each learned for a source task. In general, it consists of a selection phase and a training phase: agents select source policies that are similar to the target one without trial and error, and then train on the target task by referring to the selected policies. In this paper, we discuss the training phase; the rest of the training algorithm is based on our previous method. We propose an effective transfer method that consists of an extraction method and a merging method. Agents extract action preferences that are related to reliable states, and state values that lead to preferred states. The extracted parameters are merged into the current parameters by taking a weighted average. We apply the proposed algorithm to simple maze tasks and show its effectiveness: compared with learning without transfer, it reduces the number of episodes by 16% and the number of failures by 55%.
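
The merging step described in the abstract can be illustrated with a small sketch. The abstract does not give the extraction criterion or the weighting scheme, so everything below is an assumption made for illustration only: the function name merge_extracted, the dictionary representation of parameters, the fixed mixing coefficient weight, and the toy values are all hypothetical and are not taken from the paper. The sketch simply shows what "merging extracted parameters into the current parameters by a weighted average" could look like once some set of parameters has been extracted.

```python
def merge_extracted(current, source, extracted, weight):
    """Blend extracted source parameters into the current ones.

    current, source: dicts mapping a key (e.g. a state, or a
        (state, action) pair) to a float parameter such as a state
        value or an action preference.
    extracted: set of keys selected for transfer. How they are chosen
        is not specified in the abstract; here it is just an input.
    weight: hypothetical mixing coefficient in [0, 1] given to the
        source parameter (an assumption, not the paper's setting).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    merged = dict(current)  # parameters that were not extracted stay unchanged
    for key in extracted:
        if key in source:
            base = current.get(key, 0.0)  # treat a missing entry as 0.0
            merged[key] = (1.0 - weight) * base + weight * source[key]
    return merged


# Toy action preferences keyed by (state, action); values are made up.
current_pref = {("s0", "right"): 0.0, ("s0", "up"): 0.2, ("s1", "right"): 0.1}
source_pref = {("s0", "right"): 1.0, ("s0", "up"): -0.4, ("s1", "right"): 0.9}
extracted_keys = {("s0", "right"), ("s1", "right")}

merged_pref = merge_extracted(current_pref, source_pref, extracted_keys, weight=0.5)
```

In this toy run only the two extracted entries move toward the source values, while the preference for ("s0", "up") is left as it was. The same function could be applied to state values by keying the dictionaries on states alone.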

Cite this article as:

T. Takano, H. Takase, H. Kawanaka, and S. Tsuruoka, “Merging with Extraction Method for Transfer Learning in Actor-Critic,” J. Adv. Comput. Intell. Intell. Inform., Vol.15 No.7, pp. 814-821, 2011.

References

[1] R. S. Sutton and A. G. Barto, “Reinforcement Learning,” MIT Press, Cambridge, MA, 1998.
[2] M.-R. Kolahdouz and M. J. Mahjoob, “A Reinforcement Learning Approach to Dynamic Object Manipulation in Noisy Environment,” Int. J. of Innovative Computing, Information and Control, Vol.6, No.4, pp. 1615-1622, 2010.
[3] P. Darbyshire and D. Wang, “Effects of Communication in Cooperative Q-Learning,” Int. J. of Innovative Computing, Information and Control, Vol.6, No.5, pp. 2113-2126, 2010.
[4] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement Learning – A Survey,” J. of Artificial Intelligence Research, Vol.4, pp. 237-285, 1996.
[5] M. Wiering and J. Schmidhuber, “Fast Online Q(λ),” Machine Learning, Vol.33, pp. 105-115, 1998.
[6] A. P. S. Braga and A. F. R. Araújo, “Influence zones – A strategy to enhance reinforcement learning,” Neurocomputing, Vol.70, pp. 21-34, 2006.
[7] L. Matignon, G. J. Laurent, and N. L. Fort-Piat, “Reward Function and Initial Values – Better Choices for Accelerated Goal-Directed,” Lecture Notes in Computer Science, Vol.4131, pp. 840-849, 2006.
[8] S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” Technical Report, Dept. of Computer Science and Engineering, Hong Kong Univ. of Science and Technology, HKUST-CS08-08, 2008.
[9] M. E. Taylor and P. Stone, “Transfer Learning for Reinforcement Learning Domains: A Survey,” J. of Machine Learning Research, Vol.10, No.1, pp. 1633-1685, 2009.
[10] T. Takano, H. Takase, H. Kawanaka, H. Kita, T. Hayashi, and S. Tsuruoka, “Detection of the effective knowledge for knowledge reuse in Actor-Critic,” Proc. of the 19th Intelligent System Symposium and the 1st Int. Workshop on Aware Computing, pp. 624-627, 2009.
[11] B. Price and C. Boutilier, “Accelerating Reinforcement Learning through Implicit Imitation,” J. of Artificial Intelligence Research, Vol.19, pp. 569-629, 2003.
[12] F. Fernández and M. Veloso, “Probabilistic Policy Reuse in a Reinforcement Learning Agent,” Proc. of the fifth Int. joint Conf. on Autonomous agents and multiagent systems, pp. 720-727, 2006.
[13] I. H. Witten, “An Adaptive Optimal Controller for Discrete-Time Markov Environments,” Information and Control, Vol.34, pp. 286-295, 1977.
[14] A. G. Barto, R. S. Sutton, and C. W. Anderson, “Neuronlike elements that can solve difficult learning control problems,” IEEE Trans. on Systems, Man, and Cybernetics, Vol.13, pp. 835-846, 1983.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
