Sharing Experience for Behavior Generation of Real Swarm Robot Systems Using Deep Reinforcement Learning
Toshiyuki Yasuda* and Kazuhiro Ohkura**
*University of Toyama
3190 Gofuku, Toyama 930-8555, Japan
**Hiroshima University
1-4-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan
Swarm robotic systems (SRSs) are a type of multi-robot system in which robots operate without any form of centralized control. The typical design methodology for SRSs is the behavior-based approach, in which the desired collective behavior is obtained by manually designing the behavior of individual robots in advance. In contrast, an automatic design approach derives the individual behaviors through a general methodology, such as evolutionary robotics or reinforcement learning, rather than by hand. This paper presents a deep reinforcement learning approach for collective behavior acquisition in SRSs. The swarm robots are expected to collect information in parallel and share their experience to accelerate learning. We conducted experiments with real swarm robots and evaluated the learning performance of the swarm in a scenario where the robots repeatedly traveled back and forth between two landmarks.
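The abstract does not specify how the robots share experience. As a hedged illustration only, the following Python sketch assumes one common pattern. Every robot acts with the same policy and writes its transitions into a single shared replay buffer, and one learner updates from minibatches that mix all robots' experience. A tabular Q-table stands in for the deep Q-network. The one-dimensional corridor with landmarks at both ends, the reward of 1 for reaching the current target landmark, and all names (`SharedReplayBuffer`, `step`, `greedy`, `train`, and the constants) are hypothetical and are not the authors' implementation.

```python
import random
from collections import deque

# Hypothetical toy task: a 1-D corridor with landmarks at cell 0 and cell N_CELLS - 1.
N_CELLS = 6
ACTIONS = (0, 1)                      # 0 = move left, 1 = move right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.3


class SharedReplayBuffer:
    """One experience pool that every robot in the swarm writes to (hypothetical)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement; never asks for more than is stored.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def step(pos, target, action):
    """Reaching the current target landmark pays 1 and flips the target (0 <-> 1)."""
    new_pos = min(max(pos + (1 if action == 1 else -1), 0), N_CELLS - 1)
    goal_cell = 0 if target == 0 else N_CELLS - 1
    if new_pos == goal_cell:
        return (new_pos, 1 - target), 1.0
    return (new_pos, target), 0.0


def greedy(q, state, rng):
    """Greedy action under the shared Q-table, breaking ties at random."""
    values = [q.get((state, a), 0.0) for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def train(n_robots=4, n_steps=3000, batch_size=32, seed=0):
    rng = random.Random(seed)
    q = {}  # shared Q-table: stand-in for a Q-network shared by all robots
    buffer = SharedReplayBuffer(capacity=10_000)
    # Each robot's state is (cell, target landmark index).
    states = [(rng.randrange(N_CELLS), rng.randrange(2)) for _ in range(n_robots)]
    for _ in range(n_steps):
        # 1) All robots act in parallel with the same epsilon-greedy policy
        #    and push their transitions into the shared buffer.
        for i, s in enumerate(states):
            a = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(q, s, rng)
            s_next, r = step(s[0], s[1], a)
            buffer.push((s, a, r, s_next))
            states[i] = s_next
        # 2) One Q-learning update over a minibatch drawn from everyone's experience.
        for s, a, r, s_next in buffer.sample(batch_size, rng):
            td_target = r + GAMMA * max(q.get((s_next, b), 0.0) for b in ACTIONS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + ALPHA * (td_target - old)
    return q
```

Calling `train()` returns the shared Q-table. Because every robot's transitions feed the same buffer, adding robots increases the experience gathered per training step without changing the learner. The sketch cannot show whether this matches the speed-up measured on the real robots.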