Paper:
Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and Its Application to Biped Robot Waist Trajectory Generation
Seiya Kuroda*, Kazuteru Miyazaki**, and Hiroaki Kobayashi***
*Panasonic Factory Solutions Co., Ltd., 1375 Kamisukiawara, Showa-cho, Nakakoma-gun, Yamanashi 409-3895, Japan
**Research Department, National Institution for Academic Degrees and University Evaluation, 1-29-1 Gakuennishimachi, Kodaira, Tokyo 187-8587, Japan
***Department of Mechanical Engineering Informatics, Meiji University, 1-1-1 Higashimita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.