Analysis of a Method Improving Reinforcement Learning Agents’ Policies
Daisuke Kitakoshi*, Hiroyuki Shioya*, and Masahito Kurihara**
*Muroran Institute of Technology, Mizumoto 27-1, Muroran, 050-8585, Japan
**Graduate School of Hokkaido University, Kita 13 Nishi 8, Kita-ku, Sapporo, 060-8628, Japan
Reinforcement learning (RL) is a machine learning approach that aims to optimize agents' policies by adapting the agents to an environment according to the rewards they receive. In this paper, we propose a method for improving policies by using the stochastic knowledge that RL agents acquire. As an agent's knowledge, we use a Bayesian network (BN), which is a stochastic model. The structure of the BN is determined by the minimum description length (MDL) criterion, using sequences of the agent's inputs, outputs, and rewards as sample data. The BN constructed in our study therefore represents stochastic dependencies among inputs, outputs, and rewards. In the proposed method, policies are improved by supervised learning that uses the structure of the BN (i.e., the stochastic knowledge). This improvement mechanism enables RL agents to acquire more effective policies. We carry out simulations of the pursuit problem to show the effectiveness of the proposed method.
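The abstract does not state which MDL encoding or structure-search procedure is used, so the following is only a minimal sketch of how an MDL criterion can select a BN structure from an agent's (input, output, reward) samples. It assumes discretized variables, uses the common formulation "description length = negative log-likelihood + (k/2) log N" (k being the number of free conditional-probability parameters), and searches exhaustively over structures consistent with a fixed node order (a K2-style simplification). The names `family_mdl`, `mdl_score`, `best_structure`, the variables `state`/`action`/`reward`, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product


def family_mdl(data, child, parents, card):
    """Description length (in nats) of one node given a candidate parent set.

    data:    list of dicts mapping variable name -> discrete value
    child:   name of the node being scored
    parents: tuple of parent variable names (may be empty)
    card:    dict mapping variable name -> number of possible values
    """
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Data cost: negative log-likelihood under maximum-likelihood CPT estimates.
    nll = -sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Model cost: (k / 2) log N, where k is the number of free CPT parameters.
    k = (card[child] - 1) * math.prod(card[p] for p in parents)
    return nll + 0.5 * k * math.log(n)


def mdl_score(data, structure, card):
    """Total description length of a BN given as {node: tuple_of_parents}."""
    return sum(family_mdl(data, child, parents, card)
               for child, parents in structure.items())


def best_structure(data, order, card):
    """Exhaustive search over DAGs consistent with a fixed node order.

    Because the MDL score decomposes into per-node terms, each node's
    parent set can be chosen independently among the nodes preceding it,
    and the resulting graph is guaranteed to be acyclic.
    """
    structure = {}
    for i, child in enumerate(order):
        candidates = [pa for r in range(i + 1)
                      for pa in combinations(order[:i], r)]
        structure[child] = min(candidates,
                               key=lambda pa: family_mdl(data, child, pa, card))
    return structure


# Hypothetical toy data: 4 discretized states, 4 actions chosen uniformly,
# and a binary reward that is 1 only when the action matches the state.
card = {"state": 4, "action": 4, "reward": 2}
example_data = [
    {"state": s, "action": a, "reward": int(a == s)}
    for s, a in product(range(4), range(4))
    for _ in range(20)
]

best = best_structure(example_data, ["state", "action", "reward"], card)
print(best)  # {'state': (), 'action': (), 'reward': ('state', 'action')}
```

Here the penalty term keeps `action` parentless (it is independent of `state` in the toy data), while the large likelihood gain makes `reward` depend on both `state` and `action`. How a selected structure is then used for supervised policy improvement is specific to the proposed method and is not sketched here.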