JACIII Vol.28 No.2 pp. 380-392
doi: 10.20965/jaciii.2024.p0380

Research Paper:

Inverse Reinforcement Learning with Agents’ Biased Exploration Based on Sub-Optimal Sequential Action Data

Fumito Uwano*,†, Satoshi Hasegawa**, and Keiki Takadama***

*Faculty of Environmental, Life, Natural Science and Technology, Okayama University
3-1-1 Tsushima-naka, Kita-ku, Okayama 700-8530, Japan

†Corresponding author

**Konica Minolta, Inc.
2-7-2 Marunouchi, Chiyoda-ku, Tokyo 100-0005, Japan

***Graduate School of Informatics and Engineering, The University of Electro-Communications
1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan

Received: June 16, 2023
Accepted: December 4, 2023
Published: March 20, 2024
Keywords: inverse reinforcement learning, data generation, reward design, sub-optimal data

Inverse reinforcement learning (IRL) estimates a reward function that lets an agent behave in accordance with expert data, e.g., human operation data. However, expert data usually contain redundant parts, which degrade the agent’s performance. This study extends IRL to sub-optimal action data that include omissions and detours. The proposed method searches for new actions to determine optimal expert action data. Maze problems with sub-optimal expert action data were used to investigate the performance of the proposed method. The experimental results show that the proposed method finds optimal expert data better than the conventional method, and that the proposed search mechanisms perform better than random search.
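
To make the notion of a detour in expert action data concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a hypothetical 4x4 grid maze, four hypothetical actions (U, D, L, R), and a hand-written demonstration; none of these come from the article. It replays the demonstration and compares its length with the shortest path found by breadth-first search, so the difference counts the redundant actions.

```python
from collections import deque

# Hypothetical 4x4 maze: '#' is a wall, '.' is free. Not taken from the paper.
MAZE = [
    "....",
    ".##.",
    "....",
    "....",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def step(state, action):
    """Apply an action; stay in place when the move hits a wall or the border."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
        return (nr, nc)
    return state


def rollout(start, actions):
    """Return the list of visited states for an action sequence."""
    states = [start]
    for a in actions:
        states.append(step(states[-1], a))
    return states


def shortest_path_length(start, goal):
    """Breadth-first search: minimum number of actions from start to goal."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, dist = queue.popleft()
        if state == goal:
            return dist
        for a in MOVES:
            nxt = step(state, a)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal unreachable


start, goal = (0, 0), (3, 3)
# An assumed demonstration that reaches the goal but wanders:
# the leading "R", "L" pair is a wasted back-and-forth move.
demo = ["R", "L", "D", "D", "D", "R", "R", "R"]
states = rollout(start, demo)
optimal = shortest_path_length(start, goal)
redundant = len(demo) - optimal  # actions beyond the shortest path
```

In this toy setting the demonstration still reaches the goal, which is why a reward estimated directly from it could inherit the wasted moves; the shortest-path comparison is only a diagnostic for the example and presumes the maze is fully known.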

Cite this article as:
F. Uwano, S. Hasegawa, and K. Takadama, “Inverse Reinforcement Learning with Agents’ Biased Exploration Based on Sub-Optimal Sequential Action Data,” J. Adv. Comput. Intell. Intell. Inform., Vol.28 No.2, pp. 380-392, 2024.

Last updated on Apr. 05, 2024