Paper:
Reinforcement Learning Scheme for Flocking Behavior Emergence
Koichiro Morihiro*, **, Teijiro Isokawa**, Haruhiko Nishimura***, Masahito Tomimasu**, Naotake Kamiura**, and Nobuyuki Matsui**
*Hyogo University of Teacher Education, 942-1 Shimokume, Kato-shi, Hyogo 673-1494, Japan
**Division of Computer Engineering, Graduate School of Engineering, University of Hyogo, 2167 Shosha, Himeji, Hyogo 671-2201, Japan
***Graduate School of Applied Informatics, University of Hyogo, 1-3-3 Chuo-ku, Kobe, Hyogo 650-0044, Japan
Collective behavior such as bird flocking, land animal herding, and fish schooling is well known in nature. Many observations have shown that no leader controls the behavior of such a group. Several models have been proposed to describe this grouping behavior, which we regard as a distinctive example of aggregate motion. In these models, each individual is given a fixed interaction rule a priori, in a reductive and rigid manner. In contrast, we propose a new framework in which agents self-organize into groups through reinforcement learning. Introducing a learning scheme is important for producing collective behavior in artificial autonomous distributed systems. The behavior of the agents is demonstrated and evaluated through computer simulations, which show that grouping behavior emerges as a result of learning.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.