Paper:
Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller
Yutaka Nakamura, Takeshi Mori, Yoichi Tokita, Tomohiro Shibata, and Shin Ishii
Theoretical Life Science Lab., Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- [1] S. Grillner, P. Wallen, L. Brodin, and A. Lansner, “Neuronal network generating locomotor behavior in lamprey: circuitry, transmitters, membrane properties and simulations,” Annual Review of Neuroscience, 14, pp. 169-199, 1991.
- [2] Y. Fukuoka, H. Kimura, and A. H. Cohen, “Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts,” International Journal of Robotics Research, 22, 3-4, pp. 187-202, 2003.
- [3] G. Taga, Y. Yamaguchi, and H. Shimizu, “Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment,” Biological Cybernetics, 65, pp. 147-159, 1991.
- [4] Y. Nakamura, T. Mori, and S. Ishii, International Conference on Parallel Problem Solving from Nature (PPSN VIII), pp. 972-981, 2004.
- [5] S. Kakade, “A natural policy gradient,” In Advances in Neural Information Processing Systems, 14, pp. 1531-1538, 2001.
- [6] J. Peters, S. Vijayakumar, and S. Schaal, “Reinforcement learning for humanoid robotics,” Third IEEE International Conference on Humanoid Robotics 2003, Germany, 2003.
- [7] S. B. Thrun, “The role of exploration in learning control with neural networks,” Handbook of intelligent control: neural, fuzzy and adaptive approaches (Eds. by D. A. White, and D. A. Sofge), Florence, Kentucky, Van Nostrand Reinhold, 1992.
- [8] S. Ishii, W. Yoshida, and J. Yoshimoto, “Control of exploitation-exploration meta-parameter in reinforcement learning,” Neural Networks, 15, 4, pp. 665-687, 2002.
- [9] H. Kimura, and S. Kobayashi, “An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function,” 15th International Conference on Machine Learning, pp. 278-286, 1998.
- [10] E. Uchibe, and K. Doya, “Competitive-cooperative-concurrent reinforcement learning with importance sampling,” Proceedings of international conference on simulation of adaptive behavior: from animals and animats, pp. 287-296, 2004.
- [11] C. R. Shelton, “Policy improvement for POMDPs using normalized importance sampling,” Proceedings of the seventeenth international conference on uncertainty in artificial intelligence, pp. 496-503, 2001.
- [12] D. Precup, R. S. Sutton, and S. Dasgupta, “Off-policy temporal-difference learning with function approximation,” Proceedings of the 18th international conference on machine learning, pp. 417-424, 2001.
- [13] R. S. Sutton, and A. G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998.
- [14] M. Sato, and S. Ishii, “Reinforcement learning based on on-line EM algorithm,” Advances in Neural Information Processing Systems, 11, pp. 1052-1058, 1999.
- [15] M. Sato, Y. Nakamura, and S. Ishii, “Reinforcement learning for biped locomotion,” International Conference on Artificial Neural Networks (ICANN 2002), pp. 777-782, 2002.
- [16] V. R. Konda, and J. N. Tsitsiklis, “Actor-critic algorithms,” SIAM Journal on Control and Optimization, 42, 4, pp. 1143-1166, 2003.
- [17] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” Advances in Neural Information Processing Systems, 12, pp. 1057-1063, 2000.
- [18] S. J. Bradtke, and A. G. Barto, “Linear least-squares algorithms for temporal difference learning,” Machine Learning, 22, pp. 33-57, 1996.
- [19] J. Yoshimoto, S. Ishii, and M. Sato, “System identification based on on-line variational Bayes method and its application to reinforcement learning,” in Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP 2003), LNCS 2714, Springer-Verlag, pp. 123-131, 2003.
- [20] D. J. C. MacKay, “Information Theory, Inference, and Learning Algorithms,” Cambridge University Press, 2002.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Copyright © 2005 by Fuji Technology Press Ltd. and Japan Society of Mechanical Engineers. All rights reserved.
