
JRM Vol.20 No.5, pp. 757-774, 2008
doi: 10.20965/jrm.2008.p0757

Paper:

Reinforcement Signal Propagation Algorithm for Logic Circuit

Chyon Hae Kim*, Tetsuya Ogata**, and Shigeki Sugano*

*Department of Mechanical Engineering, Waseda University, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan

**Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan

Received: February 16, 2008
Accepted: August 19, 2008
Published: October 20, 2008
Keywords: topology, self-organization, neural network, reinforcement learning, robot
Abstract
This paper proposes SONE, a group of network elements that self-organizes network topology, aiming at online, real-time learning and adaptation in robots. SONE consists of node elements and link elements, and it develops network topology by repeatedly generating and eliminating these elements based on reinforcement signals that are propagated and stored between them. Simulations in which a mobile robot avoided obstacles demonstrated the effectiveness of the technique and its feasibility for online learning.
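
To make the mechanism described in the abstract concrete, the Python sketch below illustrates one way signal-driven generation and elimination of network elements could be organized. It is a hypothetical illustration, not the paper's actual SONE algorithm: the class names, the backward propagation scheme, and the growth, pruning, and decay thresholds are all assumptions introduced here.

# Hypothetical sketch (not the paper's actual SONE rules): node and link
# elements store propagated reinforcement signals, and those stored signals
# decide which elements are generated and which are eliminated.

class Link:
    """Link element connecting a source node to a destination node."""
    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.signal = 0.0                      # stored reinforcement signal


class Node:
    """Node element holding its incoming links and a stored signal."""
    def __init__(self):
        self.in_links = []                     # Link elements arriving here
        self.signal = 0.0                      # stored reinforcement signal


def propagate(output_nodes, reward, decay=0.5):
    """Propagate a reinforcement signal backward from the output nodes,
    attenuating it at each hop and accumulating it in every element."""
    frontier = [(n, reward) for n in output_nodes]
    while frontier:
        node, r = frontier.pop()
        node.signal += r
        for link in node.in_links:
            link.signal += r
            frontier.append((link.src, r * decay))


def reorganize(nodes, links, grow_th=1.0, prune_th=0.1):
    """Eliminate elements whose stored signal stayed weak and generate new
    elements where it is strong (thresholds and decay are assumptions)."""
    for link in list(links):                   # eliminate weak links
        if link.signal < prune_th:
            link.dst.in_links.remove(link)
            links.remove(link)
    srcs = {link.src for link in links}
    for node in list(nodes):                   # eliminate disconnected weak nodes
        if not node.in_links and node not in srcs and node.signal < prune_th:
            nodes.remove(node)
    for node in list(nodes):                   # generate structure where strong
        if node.signal > grow_th:
            new_node = Node()
            new_link = Link(new_node, node)
            node.in_links.append(new_link)
            nodes.append(new_node)
            links.append(new_link)
    for element in nodes + links:              # gradually forget old signals
        element.signal *= 0.9


if __name__ == "__main__":
    # Toy run: a single output node receives a constant reward, so the
    # structure grows toward it for a few layers and then stabilizes.
    out = Node()
    nodes, links = [out], []
    for _ in range(30):
        propagate([out], reward=1.0)
        reorganize(nodes, links)
    print(len(nodes), "nodes,", len(links), "links")

In this sketch the reward enters at the output side and is attenuated at every hop, so elements close to useful outputs accumulate stronger stored signals and are the first to grow new structure, while elements that never receive reinforcement are removed.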
Cite this article as:
C. Kim, T. Ogata, and S. Sugano, “Reinforcement Signal Propagation Algorithm for Logic Circuit,” J. Robot. Mechatron., Vol.20 No.5, pp. 757-774, 2008.
