Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
Petar S. Kormushev*, Kohei Nomoto**, Fangyan Dong*,
and Kaoru Hirota*
*Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, 226-8502, Japan
**Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
-  R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” Cambridge, MA, MIT Press, 1998.
-  M. Humphrys, “Action Selection methods using Reinforcement Learning,” Ph.D. Thesis, University of Cambridge, June 1997.
-  L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. of Artificial Intelligence Research, Vol.4, pp. 237-285, 1996.
-  C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, Vol.8, pp. 279-292, 1992.
-  R. S. Sutton, “Learning to predict by the methods of temporal differences,” Machine Learning, Vol.3, pp. 9-44, 1988.
-  P. Dayan and T. J. Sejnowski, “TD(λ) converges with probability 1,” Machine Learning, Vol.14, No.3, pp. 295-301, 1994.
-  D. Precup, R. S. Sutton, and S. Dasgupta, “Off-policy temporal-difference learning with function approximation,” In Proc. of the Eighteenth Int. Conf. on Machine Learning (ICML 2001), Morgan Kaufmann, pp. 417-424, 2001.
-  A. Coates, P. Abbeel, and A. Ng, “Learning for Control from Multiple Demonstrations,” ICML, Vol.25, 2008.
-  A. Ng, “Reinforcement Learning and Apprenticeship Learning for Robotic Control,” Lecture Notes in Computer Science, Vol.4264, pp. 29-31, 2006.
-  P. Abbeel and A. Ng, “Exploration and apprenticeship learning in reinforcement learning,” ICML, 2005.
-  P. Abbeel, A. Coates, M. Quigley, and A. Ng, “An application of reinforcement learning to aerobatic helicopter flight,” NIPS, Vol.19, 2007.
-  B. Price and C. Boutilier, “Accelerating Reinforcement Learning through Implicit Imitation,” J. of Artificial Intelligence Research, Vol.19, pp. 569-629, 2003.
-  A. Barto and S. Mahadevan, “Recent Advances in Hierarchical Reinforcement Learning,” Discrete Event Dynamic Systems, Vol.13, pp. 341-379, 2003.
-  T. G. Dietterich, “Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition,” J. of Artificial Intelligence Research, Vol.13, pp. 227-303, 2000.
-  J. Kolter, M. Rodgers, and A. Ng, “A Control Architecture for Quadruped Locomotion Over Rough Terrain,” IEEE Int. Conf. on Robotics and Automation, 2008.
-  J. Kolter, P. Abbeel, and A. Ng, “Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion,” Neural Information Processing Systems, Vol.20, 2007.
-  M. Kearns and S. Singh, “Near-optimal reinforcement learning in polynomial time,” Machine Learning, 2002.
-  P. Kormushev, K. Nomoto, F. Dong, and K. Hirota, “Time manipulation technique for speeding up reinforcement learning in simulations,” Int. J. of Cybernetics and Information Technologies, Vol.8, No.1, pp. 12-24, 2008.
-  P. Kormushev, K. Nomoto, F. Dong, and K. Hirota, “Time Hopping technique for faster reinforcement learning in simulations,” arXiv preprint arXiv:0904.0545, 2009.
-  M. Kearns, Y. Mansour, and A. Y. Ng, “A sparse sampling algorithm for near-optimal planning in large Markov decision processes,” Proc. of the 16th Int. Joint Conf. on Artificial Intelligence, pp. 1324-1331, 1999.
-  A. W. Moore, “Prioritized Sweeping: Reinforcement Learning With Less Data and Less Time,” Machine Learning, Vol.13, pp. 103-129, 1994.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.