JACIII Vol.20 No.7 pp. 1135-1140
doi: 10.20965/jaciii.2016.p1135


Adaptive Critic Design with Local Gaussian Process Models

Wei Wang, Xin Chen, and Jianxin He

School of Automation, China University of Geosciences
Wuhan 430074, China

Corresponding author

Received: July 5, 2016
Accepted: October 1, 2016
Published: December 20, 2016
Keywords: local Gaussian process, adaptive critic design, value function approximation, two-phase value iteration, reinforcement learning

This paper introduces local Gaussian process (GP) approximation to build the critic network in adaptive dynamic programming (ADP). The sample data are partitioned into local regions, and an individual GP model is used for each region. The value at a given state-action point is predicted by the nearest local model. The critic is trained with the two-phase value iteration method for Gaussian-kernel (GK)-based critic networks, which updates the hyperparameters and the value function simultaneously and thereby achieves fast value function approximation. Combining this critic network with an actor network yields a local GK-based ADP approach. Simulations demonstrate the feasibility of the proposed approach.
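The local GP idea described in the abstract (partition the samples, fit one GP per region, predict with the nearest local model) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The fixed region centers, the kernel hyperparameters, the `LocalGP` class, and the toy target function are all assumptions made for illustration. The hyperparameters are held fixed here, so the two-phase value iteration and the actor network are not shown.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Gaussian (squared-exponential) kernel between the rows of A and B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class LocalGP:
    """One GP regressor per region; the nearest region's model predicts.

    Hypothetical helper: regions are defined by fixed centers (an assumption),
    and the kernel hyperparameters are fixed rather than optimized.
    """

    def __init__(self, centers, length_scale=0.5, noise_var=1e-4):
        self.centers = np.asarray(centers, dtype=float)
        self.length_scale = length_scale
        self.noise_var = noise_var
        self.models = {}  # region index -> (training inputs, GP weights)

    def _nearest(self, X):
        # Index of the closest region center for every row of X.
        dist = np.linalg.norm(X[:, None, :] - self.centers[None, :, :], axis=2)
        return np.argmin(dist, axis=1)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        region = self._nearest(X)
        for k in np.unique(region):
            Xk, yk = X[region == k], y[region == k]
            K = rbf_kernel(Xk, Xk, self.length_scale)
            K += self.noise_var * np.eye(len(Xk))
            # GP posterior mean weights: alpha = (K + noise_var * I)^{-1} y
            self.models[int(k)] = (Xk, np.linalg.solve(K, yk))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        region = self._nearest(X)
        # Regions without data fall back to the zero prior mean.
        mean = np.zeros(len(X))
        for i, k in enumerate(region):
            if int(k) in self.models:
                Xk, alpha = self.models[int(k)]
                mean[i] = rbf_kernel(X[i:i + 1], Xk, self.length_scale)[0] @ alpha
        return mean


# Toy data: each row is a (state, action) pair; the target is a made-up
# smooth function standing in for a value function.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

centers = [[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]]
model = LocalGP(centers).fit(X, y)
print("mean absolute training error:", np.mean(np.abs(model.predict(X) - y)))
```

Each local model solves a linear system only over its own region's samples, which is the source of the computational saving over a single global GP in this sketch; the trade-off is possible discontinuity of the prediction at region boundaries.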



Last updated on Jun. 28, 2017