Construction of Semi-Markov Decision Process Models of  Continuous State Space Environments Using Growing Cell Structures  and Multiagent <i>k</i>-Certainty Exploration Method

Takeshi Tateyama; Seiichi Kawata; Yoshiki Shimomura

doi:10.20965/jaciii.2009.p0608

single-jc.php

« previous

JACIII Vol.13 No.6 pp. 608-614

(2009)

doi: 10.20965/jaciii.2009.p0608

Paper:

Views over last 60 days: 938

Construction of Semi-Markov Decision Process Models of Continuous State Space Environments Using Growing Cell Structures and Multiagent k-Certainty Exploration Method

Takeshi Tateyama^, Seiichi Kawata^, and Yoshiki Shimomura^

^*Tokyo Metropolitan University, 6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan

^**Advanced Institute of Industrial Technology, 1-10-40 Higashiohi , Shinagawa-ku, Tokyo 140-0011, Japan

^***Tokyo Metropolitan University, 6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan

Received:

April 16, 2009

Accepted:

July 31, 2009

Published:

November 20, 2009

Keywords:

reinforcement learning, k-certainty exploration method, growing cell structure, parallel reinforcement learning, mobile robot navigation

Abstract

k-certainty exploration method, an efficient reinforcement learning algorithm, is not applied to environments whose state space is continuous because continuous state space must be changed to discrete state space. Our purpose is to construct discrete semi-Markov decision process (SMDP) models of such environments using growing cell structures to autonomously divide continuous state space then using k-certainty exploration method to construct SMDP models. Multiagent k-certainty exploration method is then used to improve exploration efficiency. Mobile robot simulation demonstrated our proposal's usefulness and efficiency.

Cite this article as:

T. Tateyama, S. Kawata, and Y. Shimomura, “Construction of Semi-Markov Decision Process Models of Continuous State Space Environments Using Growing Cell Structures and Multiagent k-Certainty Exploration Method,” J. Adv. Comput. Intell. Intell. Inform., Vol.13 No.6, pp. 608-614, 2009.

Data files:

References

[1] R. S. Sutton and A.G. Bart, “Reinforcement Learning: An Introduction,” MIT Press, 1998.
[2] C.J.C.H. Watkins and P. Dayan, “Technical Note: Q-Learning,” Machine Learning 8, pp. 279-292, 1992.
[3] K. Miyazaki, M. Yamamura, and S. Kobayashi, “k-Certainty Exploration Method: an action selector to identify the environment in reinforcement learning,” Artificial Intelligence 91, pp. 155-171, 1997.
[4] R.E. Parr, “Hierarchical Control and Learning for Markov Decision Processes,” Ph.D. Thesis, Computer Science in the Graduate Division of the University of California at Berkeley, 1990.
[5] B. Fritzke, “Unsupervised Clustering with Growing Cell Structures,” Proc. of the Int. Joint conf. on Neural Networks (IJCNN-91), Seattle, Vol.2, pp. 531-536, 1991.
[6] B. Fritzke, “Growing Cell Structures - a self-organizing network for unsupervised and supervised learning,” Neural Networks, Vol.7, No.9, pp. 1441-1460, 1994.
[7] U.R. Zimmer and E. von Puttkamer, “Realtime-learning on an Autonomous Mobile Robot with Neural Networks,” Proc. of the Euromicro ”94 Realtime Workshop, Sweden, June 15-17, pp. 40-44, 1994.
[8] T. Tateyama, S. Kawata, and Y. Shimomura, “Parallel Reinforcement Learning Systems Using Exploration Agents,” Trans. of the Japan Society of Mechanical Engineers, Vol.74, No.739, C, pp. 200-209, 2008 (in Japanese).
[9] T. Kohonen, “Self-organization and associative memory,” Springer Series in Information Sciences 8, Heidelberg, 1984.
[10] B.J.A. Kröse and M. Eecen, “A self-organizing representation of sensor space for mobile robot navigation,” Proc. of the IEEE/RSJ/GI Int. Conf. on Intelligent Robots and Systems, pp. 9-14, 1994.
[11] N.A. Vlassis, G. Papakonstantinou, and P. Tsanakas, “Robot Map Building by Kohonen”s Self-Organizing Neural Networks,” Proc. 1st Mobinet Symposium on Robotics for Health Care, pp. 187-194, 1997.
[12] K. Terada, H. Takeda, and T. Nishida, “An acquisition of the relation between vision and action using self-organizing map and reinforcement learning,” In Second Int. Conf. on Knowledge-based Intelligent Electronic Systems, pp. 429-434, 1998.
[13] H. M. Gross, V. Stephan, and H.J. Boehme, “Sensory-based Robot Navigation using Self-organizing Networks and Q-learning,” Proc. WCNN”96, World Congress on Neural Networks 1996, San Diego, pp. 94-99, 1996.
[14] M. Tan, “Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents,” Proc. of the 10th Int. Conf. on Machine Learning, pp. 330-337, 1993.
[15] G. Laurent and E. Piat, “Parallel Q-Learning for a block-pushing problem,” Proc. of the 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 286-291, 2001.
[16] R.M. Kretchmar, “Parallel Reinforcement Learning,” Proc. of the 6th World Conf. on Systemics, Cybernetics, and Informatics, Vol.6, pp. 114-118, 2002.
[17] T. Tateyama, S. Kawata, and Y. Shimomura, “Parallel Reinforcement Learning Systems including Exploration-oriented Agents,” In Proc. of Joint 3rd Int. Conf. on Soft Computing and Intelligent Systems and 7th Int. Symposium on advanced Intelligent Systems (SCIS & ISIS 2006), pp. 1471-1475, Tokyo, Japan, 2006.
[18] S. J. Bradtke and M. O. Duff, “Reinforcement learning methods for continuous-time markov decision problems,” In advances in Neural Information Processing Systems 7., MIT Press, 1995.
[19] A. McGovern, R. S. Sutton, and A. H. Fagg, “Roles of macro-actions in accelerating reinforcement learning,” In proc. of the 1997 Grace Hopper Celeration of Women in Computing, pp. 13-18, 1997.
[20] V. Braitenberg, “Vehicles:Experiments in Synthetic Psychology Boston,” MA: MIT Press, 1984.
[21] S. Ichikawa and F. Hara, “Characteristics on Swarm Intelligence Generated in Multi-Robot System - Space Coverage Behavior and its Application -,” J. of the Robotics Society of Japan, Vol.13, No.8, pp. 78-84, 1995 (in Japanese).
[22] O. Michel, “Khepera Simulator Version 2.0 User Manual,”
http://diwww.epfl.ch/lami/team/michel/khep-sim/ 1996.
[23] R.A. Brooks, “A robust layered control system for a mobile robot,” IEEE J. of Robotics and Automation, Vol.2, No.1, pp. 14-23, 1986.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.

[B1] [1] R. S. Sutton and A.G. Bart, “Reinforcement Learning: An Introduction,” MIT Press, 1998.

[B2] [2] C.J.C.H. Watkins and P. Dayan, “Technical Note: Q-Learning,” Machine Learning 8, pp. 279-292, 1992.

[B3] [3] K. Miyazaki, M. Yamamura, and S. Kobayashi, “k-Certainty Exploration Method: an action selector to identify the environment in reinforcement learning,” Artificial Intelligence 91, pp. 155-171, 1997.

[B4] [4] R.E. Parr, “Hierarchical Control and Learning for Markov Decision Processes,” Ph.D. Thesis, Computer Science in the Graduate Division of the University of California at Berkeley, 1990.

[B5] [5] B. Fritzke, “Unsupervised Clustering with Growing Cell Structures,” Proc. of the Int. Joint conf. on Neural Networks (IJCNN-91), Seattle, Vol.2, pp. 531-536, 1991.

[B6] [6] B. Fritzke, “Growing Cell Structures - a self-organizing network for unsupervised and supervised learning,” Neural Networks, Vol.7, No.9, pp. 1441-1460, 1994.

[B7] [7] U.R. Zimmer and E. von Puttkamer, “Realtime-learning on an Autonomous Mobile Robot with Neural Networks,” Proc. of the Euromicro ”94 Realtime Workshop, Sweden, June 15-17, pp. 40-44, 1994.

[B8] [8] T. Tateyama, S. Kawata, and Y. Shimomura, “Parallel Reinforcement Learning Systems Using Exploration Agents,” Trans. of the Japan Society of Mechanical Engineers, Vol.74, No.739, C, pp. 200-209, 2008 (in Japanese).

[B9] [9] T. Kohonen, “Self-organization and associative memory,” Springer Series in Information Sciences 8, Heidelberg, 1984.

[B10] [10] B.J.A. Kröse and M. Eecen, “A self-organizing representation of sensor space for mobile robot navigation,” Proc. of the IEEE/RSJ/GI Int. Conf. on Intelligent Robots and Systems, pp. 9-14, 1994.

[B11] [11] N.A. Vlassis, G. Papakonstantinou, and P. Tsanakas, “Robot Map Building by Kohonen”s Self-Organizing Neural Networks,” Proc. 1st Mobinet Symposium on Robotics for Health Care, pp. 187-194, 1997.

[B12] [12] K. Terada, H. Takeda, and T. Nishida, “An acquisition of the relation between vision and action using self-organizing map and reinforcement learning,” In Second Int. Conf. on Knowledge-based Intelligent Electronic Systems, pp. 429-434, 1998.

[B13] [13] H. M. Gross, V. Stephan, and H.J. Boehme, “Sensory-based Robot Navigation using Self-organizing Networks and Q-learning,” Proc. WCNN”96, World Congress on Neural Networks 1996, San Diego, pp. 94-99, 1996.

[B14] [14] M. Tan, “Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents,” Proc. of the 10th Int. Conf. on Machine Learning, pp. 330-337, 1993.

[B15] [15] G. Laurent and E. Piat, “Parallel Q-Learning for a block-pushing problem,” Proc. of the 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 286-291, 2001.

[B16] [16] R.M. Kretchmar, “Parallel Reinforcement Learning,” Proc. of the 6th World Conf. on Systemics, Cybernetics, and Informatics, Vol.6, pp. 114-118, 2002.

[B17] [17] T. Tateyama, S. Kawata, and Y. Shimomura, “Parallel Reinforcement Learning Systems including Exploration-oriented Agents,” In Proc. of Joint 3rd Int. Conf. on Soft Computing and Intelligent Systems and 7th Int. Symposium on advanced Intelligent Systems (SCIS & ISIS 2006), pp. 1471-1475, Tokyo, Japan, 2006.

[B18] [18] S. J. Bradtke and M. O. Duff, “Reinforcement learning methods for continuous-time markov decision problems,” In advances in Neural Information Processing Systems 7., MIT Press, 1995.

[B19] [19] A. McGovern, R. S. Sutton, and A. H. Fagg, “Roles of macro-actions in accelerating reinforcement learning,” In proc. of the 1997 Grace Hopper Celeration of Women in Computing, pp. 13-18, 1997.

[B20] [20] V. Braitenberg, “Vehicles:Experiments in Synthetic Psychology Boston,” MA: MIT Press, 1984.

[B21] [21] S. Ichikawa and F. Hara, “Characteristics on Swarm Intelligence Generated in Multi-Robot System - Space Coverage Behavior and its Application -,” J. of the Robotics Society of Japan, Vol.13, No.8, pp. 78-84, 1995 (in Japanese).

[B22] [22] O. Michel, “Khepera Simulator Version 2.0 User Manual,”
http://diwww.epfl.ch/lami/team/michel/khep-sim/ 1996.

[B23] [23] R.A. Brooks, “A robust layered control system for a mobile robot,” IEEE J. of Robotics and Automation, Vol.2, No.1, pp. 14-23, 1986.

Construction of Semi-Markov Decision Process Models of Continuous State Space Environments Using Growing Cell Structures and Multiagent k-Certainty Exploration Method

Takeshi Tateyama*, Seiichi Kawata**, and Yoshiki Shimomura***

Takeshi Tateyama^, Seiichi Kawata^, and Yoshiki Shimomura^