Review:
Learning Agents in Robot Navigation: Trends and Next Challenges
Fumito Uwano
Okayama University
3-1-1 Tsushima-naka, Kita-ku, Okayama 700-8530, Japan
Multiagent reinforcement learning performs well in many settings, such as social simulation and data mining, and it stands out particularly in robot control. In this approach, artificial agents act within a system and learn policies that serve both their own objectives and those of the other agents. Robots then execute these learned policies, so learning should maintain and improve overall system performance. Previous studies have explored a variety of approaches to improving robot control. This paper provides an overview of multiagent reinforcement learning research, focusing primarily on navigation. Specifically, we discuss current achievements and limitations, followed by future challenges.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.