Paper:

# Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network

## Takaaki Kobayashi, Takeshi Shibuya, and Masahiko Morita

Faculty of Engineering, Information and Systems, University of Tsukuba

1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573 Japan

*J. Adv. Comput. Intell. Intell. Inform.*, Vol.19 No.6, pp. 825-832, 2015.

- [1] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998.
- [2] C. J. C. H. Watkins, “Learning from delayed rewards,” Ph.D. thesis, University of Cambridge, 1989.
- [3] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, Vol.8, pp. 279-292, 1992.
- [4] C. Gaskett, D. Wettergreen, and A. Zelinsky, “Q-learning in continuous state and action spaces,” Proc. 12th Australian Joint Conf. on Artificial Intell., Sydney, pp. 417-428, 1999.
- [5] G. Konidaris, S. Osentoski, and P. Thomas, “Value function approximation in reinforcement learning using the Fourier basis,” Proc. 25th Conf. on Artificial Intell., San Francisco, pp. 380-385, 2011.
- [6] A. Geramifard, M. Bowling, and R. S. Sutton, “Incremental least-square temporal difference learning,” Proc. 21st Conf. on Artificial Intell., Boston, pp. 356-361, 2006.
- [7] J. Park and I. W. Sandberg, “Universal approximation using radial-basis-function networks,” Neural Computation, Vol.3, No.2, pp. 246-257, 1991.
- [8] L. Jouffe, “Fuzzy inference system learning by reinforcement methods,” IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol.28, No.3, pp. 338-355, 1998.
- [9] P. Y. Glorennec, “Reinforcement learning: An overview,” Proc. European Symposium on Intelligent Techniques (ESIT-00), Aachen, pp. 14-35, 2000.
- [10] K. Nonaka, F. Tanaka, and M. Morita, “Empirical comparison of feedforward neural networks on two-variable function approximation,” IEICE Trans. Inf. & Syst. (Japanese Edition), Vol.94, No.12, pp. 2114-2125, 2011.
- [11] K. Horie, A. Suemitsu, and M. Morita, “Direct estimation of hand motion speed from surface electromyograms using a selective desensitization neural network,” J. Signal Process., Vol.18, No.4, pp. 225-228, 2014.
- [12] T. Kobayashi, T. Shibuya, and M. Morita, “Q-learning in Continuous State-Action Space with Redundant Dimensions by Using a Selective Desensitization Neural Network,” Proc. SCIS&ISIS 2014, Kitakyushu, pp. 801-806, 2014.
- [13] M.W. Spong, “The swing up control problem for the acrobot,” IEEE Control Syst. Mag., Vol.15, No.1, pp. 49-55, 1995.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.