Paper:
Merging with Extraction Method for Transfer Learning in Actor-Critic
Toshiaki Takano*, Haruhiko Takase*, Hiroharu Kawanaka*,
and Shinji Tsuruoka**
*Graduate School of Engineering, Mie University, 1577 Kurima-Machiya, Tsu, Mie 514-8507, Japan
**Graduate School of Regional Innovation Studies, Mie University, 1577 Kurima-Machiya, Tsu, Mie 514-8507, Japan
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.