Visualization of Learning Process in “State and Action” Space Using Self-Organizing Maps
Akira Notsu*, Yuichi Hattori**, Seiki Ubukata*, and Katsuhiro Honda*
*Osaka Prefecture University
1-1 Gakuen-cho, Nakaku, Sakai, Osaka 599-8531, Japan
**IT Platform Service Division, Nomura Research Institute, Ltd.
1-6-5 Marunouchi, Chiyoda-ku, Tokyo 100-0005, Japan
In reinforcement learning, an agent interacts with its environment and learns an appropriate action for each situation from the consequences of its actions. Reinforcement learning is a natural fit for self-organizing maps, which perform unsupervised learning by strengthening neurons in response to stimuli. Numerous studies have therefore investigated reinforcement learning in which agents learn the state space using self-organizing maps. In this study, with a view toward applying such approaches to transfer learning and to visualizing the human learning process, we introduce self-organizing maps into reinforcement learning and make the agent's “state and action” learning process visible. Numerical experiments on a 2D goal-search problem confirm that our model visualizes the agent's learning process.
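The combination described above can be illustrated with a minimal sketch, not the authors' actual method: a small Kohonen map quantizes a continuous 2D state space, Q-learning runs over the map's nodes, and the per-node greedy action gives a plottable “state and action” map. All parameters (map size, learning rates, the goal-search environment) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- A small self-organizing map used to discretize the 2-D state space ---
SOM_SIDE = 5                                     # 5x5 map of reference vectors
nodes = rng.random((SOM_SIDE * SOM_SIDE, 2))     # weights live in the unit square
grid = np.array([(i, j) for i in range(SOM_SIDE) for j in range(SOM_SIDE)])

def bmu(x):
    """Index of the best-matching unit (nearest reference vector) for state x."""
    return int(np.argmin(((nodes - x) ** 2).sum(axis=1)))

def som_update(x, lr=0.1, sigma=1.0):
    """Classic Kohonen update: pull the BMU and its grid neighbours toward x."""
    w = bmu(x)
    d2 = ((grid - grid[w]) ** 2).sum(axis=1)     # squared distance on the map grid
    h = np.exp(-d2 / (2 * sigma ** 2))           # Gaussian neighbourhood function
    nodes[:] += lr * h[:, None] * (x - nodes)
    return w

# --- Q-learning over the SOM-quantized states (illustrative 2D goal search) ---
ACTIONS = np.array([[0.1, 0], [-0.1, 0], [0, 0.1], [0, -0.1]])
Q = np.zeros((SOM_SIDE * SOM_SIDE, len(ACTIONS)))
GOAL = np.array([0.9, 0.9])

def step(pos, a):
    """Move in the unit square; reward 1 on reaching the goal region, small cost otherwise."""
    new = np.clip(pos + ACTIONS[a], 0.0, 1.0)
    done = np.linalg.norm(new - GOAL) < 0.15
    return new, (1.0 if done else -0.01), done

alpha, gamma, eps = 0.2, 0.95, 0.3
for episode in range(400):
    pos = rng.random(2) * 0.3                    # start near the origin
    s = som_update(pos)
    for t in range(100):
        a = int(rng.integers(4)) if rng.random() < eps else int(np.argmax(Q[s]))
        pos, r, done = step(pos, a)
        s2 = som_update(pos)
        Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
        s = s2
        if done:
            break

# The "state and action" map: each node's currently preferred action,
# arranged on the SOM grid and ready to visualize.
policy = np.argmax(Q, axis=1).reshape(SOM_SIDE, SOM_SIDE)
print(policy)
```

Inspecting `policy` (or rendering it with arrows per node) at successive episodes shows how the preferred action at each region of state space evolves, which is the kind of visualization the abstract describes.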