JRM Vol.10 No.4 pp. 289-294
doi: 10.20965/jrm.1998.p0289


TD Learning with Neural Networks

Norio Baba

Information Science, Osaka-Kyoiku University, 4-698-1 Asahiga-Oka, Kashiwara-shi, Osaka 582-8582, Japan

Received: March 24, 1998
Accepted: June 5, 1998
Published: August 20, 1998
Keywords: Neural networks, TD-learning, Environmental change, Multiple outputs

Temporal difference (TD) learning, proposed by Sutton in the late 1980s, is an interesting prediction method that uses previously obtained predictions to improve future predictions. Applying TD learning to neural networks can improve their prediction performance, provided certain problems are solved first. The major problems are as follows: 1) In Sutton’s original paper, the prediction P_t at time t is assumed to be scalar, which raises the question of what rule should update the weight vector when the neural network has multiple outputs. 2) How should the individual components of the gradient vector ∇_w P_t with respect to the weight vector w be derived? This paper proposes ways to handle these problems when TD learning is used with a neural network, focusing on the TD(0) algorithm, which is often used in TD learning. For problem 1), it proposes a rule for updating the weight vector of a two-output neural network and explains the rule’s validity. For problem 2), it proposes a method for computing every component of ∇_w P_t.
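To make the two ingredients named in the abstract concrete, the following is a minimal sketch of the scalar-output case only, using Sutton's TD(0) rule Δw = α (P_{t+1} − P_t) ∇_w P_t. Everything specific here is an assumption for illustration and is not taken from the paper: the single hidden layer, the sigmoid activations, the dictionary layout of the weights, the function names, and the step size `alpha`. The paper's rule for a two-output network is not stated in the abstract, so it is not reproduced here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(w, x):
    """Scalar prediction P = sigmoid(v . h + c), with h = sigmoid(U x + b).

    w is a dict (hypothetical layout): 'U' hidden-by-input matrix,
    'b' hidden biases, 'v' output weights, 'c' output bias.
    """
    h = [sigmoid(sum(u * xj for u, xj in zip(row, x)) + bi)
         for row, bi in zip(w["U"], w["b"])]
    p = sigmoid(sum(vi * hi for vi, hi in zip(w["v"], h)) + w["c"])
    return p, h


def gradient(w, x):
    """Every component of grad_w P, obtained with the chain rule."""
    p, h = predict(w, x)
    dp = p * (1.0 - p)  # derivative of the output sigmoid
    # Sensitivity of P to each hidden unit's pre-activation.
    delta = [dp * vi * hi * (1.0 - hi) for vi, hi in zip(w["v"], h)]
    grad = {
        "U": [[di * xj for xj in x] for di in delta],
        "b": list(delta),
        "v": [dp * hi for hi in h],
        "c": dp,
    }
    return p, grad


def td0_update(w, x_t, target, alpha=0.1):
    """In-place TD(0) step: w <- w + alpha * (target - P_t) * grad_w P_t.

    target is P_{t+1} for an intermediate step, or the observed final
    outcome at the last step of a sequence. Returns P_t before the update.
    """
    p_t, g = gradient(w, x_t)
    step = alpha * (target - p_t)
    w["U"] = [[u + step * gu for u, gu in zip(row, grow)]
              for row, grow in zip(w["U"], g["U"])]
    w["b"] = [b + step * gb for b, gb in zip(w["b"], g["b"])]
    w["v"] = [v + step * gv for v, gv in zip(w["v"], g["v"])]
    w["c"] = w["c"] + step * g["c"]
    return p_t
```

In use, one would call `predict(w, x_next)` to obtain P_{t+1} with the current weights and pass it as `target` to `td0_update(w, x_t, target)`, treating the target as a constant during the step.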

Cite this article as:
Norio Baba, “TD Learning with Neural Networks,” J. Robot. Mechatron., Vol.10, No.4, pp. 289-294, 1998.
