Paper:

# Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network

## Takaaki Kobayashi, Takeshi Shibuya, and Masahiko Morita

Faculty of Engineering, Information and Systems, University of Tsukuba

1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573 Japan

When applying reinforcement learning (RL) algorithms such as Q-learning to real-world problems, we must consider the influence of sensor noise. The simplest way to reduce this influence is to add other types of sensors, but doing so may enlarge the state space and is likely to introduce redundancy. Conventional value-function approximators used for RL in continuous state-action spaces do not handle such situations well. The selective desensitization neural network (SDNN) has high generalization ability and is robust against noise and redundant input. We therefore propose an SDNN-based value-function approximator for Q-learning in continuous state-action spaces and evaluate its robustness against redundant input and sensor noise. The results show that the proposed approximator is highly robust against both noise and redundant input, and that it enables the agent to take better actions by exploiting additional inputs without degrading learning efficiency. These properties are highly advantageous in real-world applications such as robotic systems.
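The proposal rests on the standard Q-learning update, Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)], with Q represented by a function approximator rather than a table. The Python sketch below illustrates only that generic update and does not reproduce the SDNN. It assumes a simple Gaussian radial-basis linear approximator as a stand-in, and it approximates the max over the continuous action space by evaluating a finite set of candidate actions. All names (`rbf_features`, `LinearQ`, `greedy_action`, `q_learning_step`) and parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence


# Hypothetical feature map phi(s, a): Gaussian radial-basis features over the
# joint (state, action) vector. This is a simple stand-in, NOT the SDNN.
def rbf_features(state: Sequence[float], action: float,
                 centers: List[List[float]], width: float) -> List[float]:
    x = list(state) + [action]
    feats = []
    for c in centers:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-d2 / (2.0 * width ** 2)))
    return feats


class LinearQ:
    """Q(s, a) = w . phi(s, a) with a fixed feature map (illustrative only)."""

    def __init__(self, centers: List[List[float]], width: float = 0.5):
        self.centers = centers
        self.width = width
        self.w = [0.0] * len(centers)

    def value(self, state: Sequence[float], action: float) -> float:
        phi = rbf_features(state, action, self.centers, self.width)
        return sum(wi * fi for wi, fi in zip(self.w, phi))

    def update(self, state: Sequence[float], action: float,
               target: float, alpha: float) -> float:
        # Semi-gradient step moving Q(s, a) toward the TD target;
        # returns the TD error measured before the step.
        phi = rbf_features(state, action, self.centers, self.width)
        error = target - self.value(state, action)
        self.w = [wi + alpha * error * fi for wi, fi in zip(self.w, phi)]
        return error


def greedy_action(q: LinearQ, state: Sequence[float],
                  candidates: Sequence[float]) -> float:
    # Continuous action space: approximate argmax over a finite candidate set.
    return max(candidates, key=lambda a: q.value(state, a))


def q_learning_step(q: LinearQ, s: Sequence[float], a: float, r: float,
                    s_next: Sequence[float], done: bool,
                    candidates: Sequence[float],
                    alpha: float = 0.1, gamma: float = 0.95) -> float:
    # Q-learning target r + gamma * max_a' Q(s', a'); no bootstrap at terminal.
    if done:
        target = r
    else:
        target = r + gamma * max(q.value(s_next, b) for b in candidates)
    return q.update(s, a, target, alpha)


if __name__ == "__main__":
    random.seed(0)
    # 20 random centers in the joint (2-D state, 1-D action) space.
    centers = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
    q = LinearQ(centers)
    actions = [-1.0 + 0.25 * i for i in range(9)]  # discretized 1-D action range
    state, next_state = [0.1, -0.2], [0.0, 0.05]
    for _ in range(50):
        q_learning_step(q, state, 0.5, 1.0, next_state, False, actions)
    print("Q(s, 0.5) =", q.value(state, 0.5))
    print("greedy action:", greedy_action(q, state, actions))
```

Because `q_learning_step` only calls `value` and `update`, any approximator exposing the same two methods could replace `LinearQ` without changing the update rule; that interface is presumably where an SDNN-based approximator would plug in.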
