JACIII Vol.15 No.7 pp. 814-821
doi: 10.20965/jaciii.2011.p0814


Merging with Extraction Method for Transfer Learning in Actor-Critic

Toshiaki Takano*, Haruhiko Takase*, Hiroharu Kawanaka*,
and Shinji Tsuruoka**

*Graduate School of Engineering, Mie University, 1577 Kurima-Machiya, Tsu, Mie 514-8507, Japan

**Graduate School of Regional Innovation Studies, Mie University, 1577 Kurima-Machiya, Tsu, Mie 514-8507, Japan

March 3, 2011
May 9, 2011
September 20, 2011
Keywords: transfer learning, reinforcement learning, actor-critic

This paper aims to accelerate the learning process of the actor-critic method, one of the major reinforcement learning algorithms, by means of transfer learning. Transfer learning accelerates learning of a target task by reusing the knowledge in source policies, each learned for a source task. In general, it consists of a selection phase and a training phase: agents select source policies that are similar to the target policy without trial and error, and then train on the target task by referring to the selected policies. In this paper, we discuss the training phase; the rest of the algorithm (the selection phase) is based on our previous method. We propose an effective transfer method that consists of an extraction method and a merging method. Agents extract action preferences that are related to reliable states, and state values that lead to preferred states. The extracted parameters are merged into the current parameters by taking a weighted average. We apply the proposed method to simple maze tasks and show its effectiveness: compared with learning without transfer, it reduces the number of episodes by 16% and the number of failures by 55%.
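A minimal sketch of the parameters involved and of the extract-then-merge idea, assuming a tabular actor-critic with action preferences p(s, a) and state values V(s). The abstract does not give the paper's exact criteria, so several choices below are assumptions: "reliable" states are approximated by a hypothetical visit-count threshold `min_visits`, the same mask is used for state values (a simplification; the paper's rule for "values that lead to preferred states" is not reproduced), and `w` is a hypothetical merging weight. The function names `actor_critic_step`, `merge`, and `transfer` are illustrative only.

```python
import numpy as np


def actor_critic_step(pref, value, s, a, r, s_next, done,
                      alpha=0.1, beta=0.1, gamma=0.9):
    """One tabular actor-critic update (in place), in the style of Sutton & Barto.

    The critic's TD error updates the state value V(s) and the actor's
    preference p(s, a) for the action just taken.
    """
    target = r if done else r + gamma * value[s_next]
    delta = target - value[s]
    value[s] += alpha * delta   # critic update
    pref[s, a] += beta * delta  # actor update
    return delta


def merge(current, source, mask, w):
    """Weighted average of current and extracted source parameters.

    Entries where `mask` is False were not extracted and stay unchanged.
    `w` in [0, 1] is a hypothetical weight on the source parameters.
    """
    merged = current.copy()
    merged[mask] = (1.0 - w) * current[mask] + w * source[mask]
    return merged


def transfer(cur_pref, cur_value, src_pref, src_value, src_visits,
             min_visits=10, w=0.5):
    """Hypothetical extraction + merging step for a tabular actor-critic.

    cur_pref, src_pref:   action preferences, shape (n_states, n_actions)
    cur_value, src_value: state values, shape (n_states,)
    src_visits:           visits per state in the source task (ASSUMED
                          reliability measure, not taken from the paper)
    """
    reliable = src_visits >= min_visits  # assumed "reliable state" test
    # Extract every action preference of a reliable state.
    pref_mask = np.broadcast_to(reliable[:, None], cur_pref.shape)
    # Simplification: reuse the same mask for state values.
    value_mask = reliable
    new_pref = merge(cur_pref, src_pref, pref_mask, w)
    new_value = merge(cur_value, src_value, value_mask, w)
    return new_pref, new_value
```

Because the average is masked, parameters of states that fail the (assumed) reliability test are left exactly as the target agent learned them, and setting `w = 0` reduces to learning without transfer.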

Cite this article as:
Toshiaki Takano, Haruhiko Takase, Hiroharu Kawanaka, and Shinji Tsuruoka, “Merging with Extraction Method for Transfer Learning in Actor-Critic,” J. Adv. Comput. Intell. Intell. Inform., Vol.15, No.7, pp. 814-821, 2011.

