TD Learning with Neural Networks
Norio Baba
Information Science, Osaka-Kyoiku University, 4-698-1 Asahiga-Oka, Kashiwara-shi, Osaka 582-8582, Japan
Received: March 24, 1998 / Accepted: June 5, 1998 / Published: August 20, 1998
Keywords: Neural networks, TD-learning, Environmental change, Multiple outputs
Abstract
Temporal difference (TD) learning, proposed by Sutton in the late 1980s, is an interesting prediction method that uses current predictions to improve future predictions. Applying this form of learning to neural networks can improve their prediction performance, provided certain problems are solved. The major problems are as follows: 1) The prediction Pt at time t is assumed to be a scalar in Sutton's original paper, which raises the question of how the weight vector of the neural network should be updated when the network has multiple outputs. 2) How should the individual components of the gradient vector ∇wPt with respect to the weight vector w be computed? This paper proposes ways to handle these problems when TD learning is applied to a neural network, focusing on the TD(0) algorithm, which is often used in TD learning. It proposes a rule for updating the weight vector of a neural network with two outputs, addressing problem 1) above, and explains the rule's validity. It then proposes a method for computing every component of ∇wPt.
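To make the TD(0) rule discussed in the abstract concrete, the following is a minimal, hypothetical sketch for the simplest case: a scalar *linear* predictor Pt = w·xt, where ∇wPt is simply xt. The paper's actual subject is the harder setting of multilayer neural networks with multiple outputs, where each component of ∇wPt must be obtained separately (e.g., by backpropagation); none of the variable names below come from the paper.

```python
import numpy as np

def td0_step(w, x_t, target, alpha):
    """One TD(0) update: w <- w + alpha * (target - P_t) * grad_w(P_t).

    For this linear predictor, grad_w(P_t) is just x_t; for a neural
    network, each component of the gradient must be computed through
    the layers, which is the issue the paper addresses.
    """
    p_t = w @ x_t
    return w + alpha * (target - p_t) * x_t

# Toy two-state sequence x1 -> x2 -> final outcome z.
# In TD(0) the target for x1 is the NEXT prediction P_{t+1} = w @ x2,
# and only the final state is trained against the actual outcome z.
z = 1.0
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
w = np.zeros(2)
for _ in range(50):
    w = td0_step(w, x1, w @ x2, alpha=0.5)  # bootstrap from next prediction
    w = td0_step(w, x2, z, alpha=0.5)       # terminal step: true outcome
```

After repeated sweeps both predictions converge toward the outcome z, illustrating how TD(0) propagates outcome information backward through the sequence of predictions.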
Cite this article as: N. Baba, “TD Learning with Neural Networks,” J. Robot. Mechatron., Vol.10 No.4, pp. 289-294, 1998.