JACIII Vol.7 No.3 pp. 276-282
doi: 10.20965/jaciii.2003.p0276


Analysis of a Method Improving Reinforcement Learning Agents’ Policies

Daisuke Kitakoshi*, Hiroyuki Shioya*, and Masahito Kurihara**

*Muroran Institute of Technology, Mizumoto 27-1, Muroran, 050-8585, Japan

**Graduate School of Hokkaido University, Kita 13 Nishi 8, Kita-ku, Sapporo, 060-8628, Japan

Received: July 20, 2003
Accepted: August 26, 2003
Published: October 20, 2003
Keywords: reinforcement learning, Bayesian Network, stochastic knowledge

Reinforcement learning (RL) is a form of machine learning that aims to optimize an agent’s policy by adapting the agent to its environment according to the rewards it receives. In this paper, we propose a method for improving policies using stochastic knowledge that reinforcement learning agents obtain. We use a Bayesian Network (BN), a stochastic model, as the agent’s knowledge. Its structure is determined by the minimum description length (MDL) criterion, using a series of the agent’s inputs, outputs, and rewards as sample data. The BN constructed in our study represents stochastic dependences between input-output and rewards. In the proposed method, policies are improved by supervised learning that uses the structure of the BN (i.e., the stochastic knowledge). The proposed improvement mechanism enables RL agents to acquire more effective policies. We carry out simulations on the pursuit problem to show the effectiveness of the proposed method.
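
As an illustration of how a description length criterion can select a BN structure from input-output-reward samples, the following is a minimal sketch of the standard MDL score for discrete Bayesian networks (negative log-likelihood plus a parameter penalty). It is not taken from the paper and may differ from the authors' exact formulation. The function `mdl_score`, the variable names `input`, `output`, and `reward`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mdl_score(samples, structure, arity):
    """Description length (in nats) of `samples` under a candidate structure.

    samples   : list of dicts mapping variable name -> discrete value
    structure : dict mapping each variable -> tuple of its parent variables
    arity     : dict mapping each variable -> number of possible values
    (All names here are illustrative assumptions, not the paper's notation.)
    """
    n = len(samples)
    total = 0.0
    for var, parents in structure.items():
        # Count each (parent configuration, value) pair and each parent configuration.
        joint = Counter((tuple(s[p] for p in parents), s[var]) for s in samples)
        parent_counts = Counter(tuple(s[p] for p in parents) for s in samples)
        # Maximum-likelihood log-likelihood of this variable given its parents.
        log_lik = sum(
            count * math.log(count / parent_counts[config])
            for (config, _), count in joint.items()
        )
        # Free parameters: (r - 1) per parent configuration.
        n_params = (arity[var] - 1) * math.prod(arity[p] for p in parents)
        total += -log_lik + 0.5 * math.log(n) * n_params
    return total


# Toy data (hypothetical): the reward is 1 exactly when the output matches the input.
samples = [
    {"input": i, "output": o, "reward": int(i == o)}
    for _ in range(25)
    for i in (0, 1)
    for o in (0, 1)
]
arity = {"input": 2, "output": 2, "reward": 2}

# Two candidate structures: no edges, versus reward depending on input and output.
independent = {"input": (), "output": (), "reward": ()}
dependent = {"input": (), "output": (), "reward": ("input", "output")}

score_independent = mdl_score(samples, independent, arity)
score_dependent = mdl_score(samples, dependent, arity)
print(score_independent, score_dependent)
```

On this toy data the structure in which `reward` depends on `input` and `output` receives the shorter description length, so an MDL-based search would prefer it. The penalty term keeps the criterion from adding edges that the data do not support.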

Cite this article as:
Daisuke Kitakoshi, Hiroyuki Shioya, and Masahito Kurihara, “Analysis of a Method Improving Reinforcement Learning Agents’ Policies,” J. Adv. Comput. Intell. Intell. Inform., Vol.7, No.3, pp. 276-282, 2003.

Last updated on Feb. 25, 2021