Paper:

# TD Learning with Neural Networks

## Norio Baba

Information Science, Osaka-Kyoiku University, 4-698-1 Asahiga-Oka, Kashiwara-shi, Osaka 582-8582, Japan

The prediction *P*_t at time *t* is assumed to be scalar in Sutton's original paper, which raises two problems: 1) What is the rule for updating the weight vector of a neural network if the network has multiple outputs? 2) How do we derive the individual components of the gradient vector ∇_w *P*_t for weight vector *w*? This paper proposes how to handle these problems when TD learning is used in a neural network, focusing on the TD(0) algorithm, which is often used in TD learning. It proposes a rule for updating the neural network weight vector of a two-output network under problem 1) above and explains the rule's validity. It then proposes how to compute every component of ∇_w *P*_t.

*J. Robot. Mechatron.*, Vol.10 No.4, pp. 289-294, 1998.
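As a rough illustration of the setting the abstract describes, the sketch below applies Sutton's scalar TD(0) update, ΔW = α(*P*_{t+1} − *P*_t)∇_W *P*_t, to each output of a two-output *linear* predictor independently. This is an assumption for illustration, not the paper's proposed rule: for a linear map *P*_t = W x_t, the gradient of output *k* with respect to W is the outer product e_k x_tᵀ, so the per-output updates combine into one outer product of the TD-error vector with x_t. All names (`alpha`, `W`, `x_t`) are illustrative.

```python
import numpy as np

def td0_update(W, x_t, x_next, alpha):
    """One TD(0) step for a linear predictor P = W @ x,
    applying the scalar TD(0) rule to each output independently.
    (Illustrative sketch only, not the paper's proposed rule.)"""
    delta = W @ x_next - W @ x_t          # per-output TD error, shape (n_out,)
    # Combined update: sum_k alpha * delta_k * (e_k x_t^T) = alpha * outer(delta, x_t)
    return W + alpha * np.outer(delta, x_t)

# Toy usage: a two-output network (as in the paper) with four inputs.
W = np.array([[0.2, -0.1,  0.3, 0.00],
              [0.1,  0.4, -0.2, 0.50]])
x_t    = np.array([1.0,  0.5, -0.5, 0.25])
x_next = np.array([0.5, -1.0,  0.25, 1.0])
W = td0_update(W, x_t, x_next, alpha=0.05)
```

With a small enough `alpha`, one step moves *P*_t toward *P*_{t+1} in each output component; a nonlinear network would replace the outer product with the backpropagated gradient of each output.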

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.