Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
Petar S. Kormushev*, Kohei Nomoto**, Fangyan Dong*, and Kaoru Hirota*
*Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, 226-8502, Japan
**Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.