Analysis of a Method Improving Reinforcement Learning Agents’ Policies

Daisuke Kitakoshi; Hiroyuki Shioya; Masahito Kurihara

doi:10.20965/jaciii.2003.p0276

JACIII Vol.7 No.3 pp. 276-282

(2003)

Paper:

Analysis of a Method Improving Reinforcement Learning Agents’ Policies

Daisuke Kitakoshi*, Hiroyuki Shioya*, and Masahito Kurihara**

*Muroran Institute of Technology, Mizumoto 27-1, Muroran, 050-8585, Japan

**Graduate School of Hokkaido University, Kita 13 Nishi 8, Kita-ku, Sapporo, 060-8628, Japan

Received: July 20, 2003

Accepted: August 26, 2003

Published: October 20, 2003

Keywords: reinforcement learning, Bayesian Network, stochastic knowledge

Abstract

Reinforcement learning (RL) is a kind of machine learning that aims to optimize agents’ policies by adapting the agents to an environment according to rewards. In this paper, we propose a method for improving policies by using stochastic knowledge that reinforcement learning agents obtain. We use a Bayesian Network (BN), which is a stochastic model, as the knowledge of an agent. Its structure is determined by the minimum description length criterion, using series of an agent’s input-output and rewards as sample data. A BN constructed in our study represents stochastic dependences between input-output and rewards. In our proposed method, policies are improved by supervised learning using the structure of the BN (i.e., stochastic knowledge). The proposed improvement mechanism makes RL agents acquire more effective policies. We carry out simulations on the pursuit problem to show the effectiveness of our proposed method.

Cite this article as:

D. Kitakoshi, H. Shioya, and M. Kurihara, “Analysis of a Method Improving Reinforcement Learning Agents’ Policies,” J. Adv. Comput. Intell. Intell. Inform., Vol.7 No.3, pp. 276-282, 2003.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
