Reinforcement Signal Propagation Algorithm for Logic Circuit
Chyon Hae Kim*, Tetsuya Ogata**, and Shigeki Sugano*
*Department of Mechanical Engineering, Waseda University, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
**Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Copyright © 2008 by Fuji Technology Press Ltd. and Japan Society of Mechanical Engineers. All rights reserved.