Adaptive Critic Design with Local Gaussian Process Models

Wei Wang; Xin Chen; Jianxin He

doi:10.20965/jaciii.2016.p1135


JACIII Vol.20 No.7 pp. 1135-1140


(2016)

Paper:



Wei Wang, Xin Chen†, and Jianxin He

School of Automation, China University of Geosciences
Wuhan 430074, China

†Corresponding author

Received:

July 5, 2016

Accepted:

October 1, 2016

Published:

December 20, 2016

Keywords:

local Gaussian process, adaptive critic design, value function approximation, two-phase value iteration, reinforcement learning

Abstract

In this paper, local Gaussian process (GP) approximation is introduced to build the critic network of adaptive dynamic programming (ADP). The sample data are partitioned into local regions, and an individual GP model is fitted to each region. The value at a given state-action point is predicted by the nearest local model. A two-phase value iteration method for the Gaussian-kernel (GK)-based critic network updates the hyperparameters and the value function simultaneously, which yields fast value function approximation. Combining this critic network with an actor network, we present a local GK-based ADP approach. Simulations were carried out to demonstrate the feasibility of the proposed approach.
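The partition-and-predict idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `LocalGP`, the fixed region centers, the squared-exponential kernel, and the fixed `length_scale` and `noise` values are all assumptions made for illustration, and the two-phase value iteration that tunes the hyperparameters is omitted. Each sample is assigned to its nearest center, one GP regressor is fitted per region, and a query point is answered by the model of its nearest region.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential (Gaussian) kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)


class LocalGP:
    """Hypothetical local GP regressor: one GP per region, nearest region predicts."""

    def __init__(self, centers, length_scale=1.0, noise=1e-2):
        # Assumption: region centers are given up front and stay fixed.
        self.centers = np.asarray(centers, dtype=float)
        self.length_scale = length_scale
        self.noise = noise
        self.models = {}  # region index -> (training inputs, weight vector)

    def _nearest(self, X):
        """Index of the closest region center for every row of X."""
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        return d2.argmin(axis=1)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        labels = self._nearest(X)
        for k in np.unique(labels):
            Xk, yk = X[labels == k], y[labels == k]
            # Standard GP regression weights: (K + noise * I)^{-1} y
            K = rbf_kernel(Xk, Xk, self.length_scale) + self.noise * np.eye(len(Xk))
            self.models[int(k)] = (Xk, np.linalg.solve(K, yk))
        return self

    def predict(self, X):
        """Posterior mean at each query point, taken from its nearest local model."""
        X = np.asarray(X, dtype=float)
        labels = self._nearest(X)
        out = np.zeros(len(X))  # regions without data fall back to the zero prior mean
        for i, k in enumerate(labels):
            if int(k) in self.models:
                Xk, alpha = self.models[int(k)]
                out[i] = (rbf_kernel(X[i:i + 1], Xk, self.length_scale) @ alpha)[0]
        return out
```

In a critic, each row of `X` would be a state-action vector and `y` the corresponding value targets. Because each local model only inverts the kernel matrix of its own region, the cubic cost of GP fitting applies to the region size rather than to the full sample set.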

Cite this article as:

W. Wang, X. Chen, and J. He, “Adaptive Critic Design with Local Gaussian Process Models,” J. Adv. Comput. Intell. Intell. Inform., Vol.20 No.7, pp. 1135-1140, 2016.


References

[1] R. S. Sutton and A. G. Barto, “Reinforcement learning: An introduction,” Cambridge, MA, USA: MIT Press, 1998.
[2] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. of Artificial Intelligence Research, Vol.4, pp. 237-285, 1996.
[3] W. B. Powell, “Approximate Dynamic Programming: Solving the Curses of Dimensionality,” New York, USA: Wiley, 2007.
[4] F. Y. Wang, H. Zhang, and D. Liu, “Adaptive dynamic programming: an introduction,” Computational Intelligence Magazine, IEEE, Vol.4, No.2, pp. 39-47, 2009.
[5] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Trans. on., Vol.41, No.1, pp. 14-25, 2011.
[6] D. R. Liu and Q. L. Wei, “Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems,” IEEE Trans. on., Neural Networks and Learning Systems, Vol.25, No.3, pp. 621-634, 2014.
[7] X. Chen, P. H. Xie, Y. H. Xiong, et al., “Two-Phase Iteration for Value Function Approximation and Hyperparameter Optimization in Gaussian-Kernel-Based Adaptive Critic Design,” Mathematical Problems in Engineering, Vol.2015, 2015.
[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, et al., “Playing Atari with deep reinforcement learning,” Technical report. Deepmind Technologies, arXiv: 1312.5602 [cs.LG], 2013.
[9] A. A. Jamshidi and B. P. Warren, “A recursive local polynomial approximation method using Dirichlet clouds and radial basis functions,” SIAM J. on Scientific Computation, 2013.
[10] X. N. Zhong, H. B. He, H. Zhang, and Z. Wang, “A neural network based online learning and control approach for Markov jump systems,” Neurocomputing, Vol.149, pp. 116-123, 2015.
[11] X. Xu, L. Zuo, and Z. Huang, “Reinforcement learning algorithms with function approximation: recent advances and applications,” Information Sciences, Vol.261, pp. 1-31, 2014.
[12] D. Ormoneit and S. Sen, “Kernel-Based Reinforcement Learning,” Machine Learning, Vol.49, No.2-3, pp. 161-178, 2002.
[13] N. Jong and P. Stone, “Kernel-Based Models for Reinforcement Learning,” ICML Workshop on Kernel Machines and Reinforcement Learning, Pittsburgh, PA, USA, 2006.
[14] C. E. Rasmussen and C. K. I. Williams, “Gaussian Processes for Machine Learning,” The MIT Press, 2006.
[15] R. B. Gramacy and D. W. Apley, “Local Gaussian process approximation for large computer experiments,” J. of Computational and Graphical Statistics, Vol.24, No.2, pp. 561-578, 2015.
[16] E. Snelson and Z. Ghahramani, “Local and global sparse Gaussian process approximations,” In AISTATS, Vol.11, pp. 524-531, 2007.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
