TD Learning with Neural Networks

Norio Baba

doi:10.20965/jrm.1998.p0289



JRM Vol.10 No.4 pp. 289-294


(1998)

Paper:



Information Science, Osaka-Kyoiku University, 4-698-1 Asahiga-Oka, Kashiwara-shi, Osaka 582-8582, Japan

Received: March 24, 1998

Accepted: June 5, 1998

Published: August 20, 1998

Keywords:

Neural networks, TD-learning, Environmental change, Multiple outputs

Abstract

Temporal difference (TD) learning, proposed by Sutton in the late 1980s, is an interesting prediction method that uses previously obtained predictions to improve future predictions. Applying it to neural networks can improve their prediction performance, once certain problems are solved. The major problems are as follows: 1) The prediction P_t at time t is assumed to be scalar in Sutton's original paper, which raises the question of what rule should update the weight vector when the neural network has multiple outputs. 2) How should the individual components of the gradient vector ∇_w P_t with respect to the weight vector w be derived? This paper proposes ways to handle these problems when TD learning is used with a neural network, focusing on the TD(0) algorithm, which is often used in TD learning. It proposes a rule for updating the weight vector of a two-output neural network, addressing problem 1) above, and explains the rule's validity. It then proposes a method for computing every component of ∇_w P_t.
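For orientation, the scalar-output case that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard TD(0) rule from Sutton's formulation, Δw_t = α (P_{t+1} − P_t) ∇_w P_t, with the final target replaced by the observed outcome; it is not the two-output rule proposed in the paper, which the abstract does not spell out. The network shape (one tanh hidden layer, linear output), the function names, and the step size are all assumptions made for the example.

```python
import numpy as np

def init_params(n_in, n_hidden, seed=0):
    # Hypothetical small network: one tanh hidden layer, one linear output.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.5, n_hidden),
        "b2": 0.0,
    }

def predict_with_grad(params, x):
    # Forward pass: scalar prediction P for input x.
    h = np.tanh(params["W1"] @ x + params["b1"])
    p = float(params["w2"] @ h + params["b2"])
    # Backward pass: every component of grad_w P, obtained by the chain rule.
    dh = params["w2"] * (1.0 - h ** 2)  # dP / d(hidden pre-activation)
    grads = {"W1": np.outer(dh, x), "b1": dh, "w2": h, "b2": 1.0}
    return p, grads

def td0_episode(params, states, outcome, alpha=0.05):
    # One pass over an observation sequence using the TD(0) update
    #   w <- w + alpha * (P_{t+1} - P_t) * grad_w P_t,
    # where the target for the last state is the observed outcome.
    for t, x in enumerate(states):
        p_t, grads = predict_with_grad(params, x)
        if t + 1 < len(states):
            target, _ = predict_with_grad(params, states[t + 1])
        else:
            target = outcome
        delta = target - p_t
        for k in params:
            params[k] = params[k] + alpha * delta * grads[k]
    return params
```

Calling `td0_episode(params, states, outcome)` repeatedly on the same sequence pulls each prediction toward the one that follows it, and the last prediction toward the outcome. Extending this to a network with several outputs requires deciding how the per-output TD errors and gradients are combined, which is the first problem the abstract raises.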

Cite this article as:

N. Baba, “TD Learning with Neural Networks,” J. Robot. Mechatron., Vol.10 No.4, pp. 289-294, 1998.


This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.