Inverse Reinforcement Learning with Agents’ Biased Exploration Based on Sub-Optimal Sequential Action Data

Fumito Uwano; Satoshi Hasegawa; Keiki Takadama

doi:10.20965/jaciii.2024.p0380

JACIII Vol.28 No.2 pp. 380-392 (2024)

doi: 10.20965/jaciii.2024.p0380

Research Paper:

Fumito Uwano*,†, Satoshi Hasegawa**, and Keiki Takadama***

*Faculty of Environmental, Life, Natural Science and Technology, Okayama University
3-1-1 Tsushima-naka, Kita-ku, Okayama 700-8530, Japan

**Konica Minolta, Inc.
2-7-2 Marunouchi, Chiyoda-ku, Tokyo 100-0005, Japan

***Graduate School of Informatics and Engineering, The University of Electro-Communications
1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan

†Corresponding author

Received: June 16, 2023

Accepted: December 4, 2023

Published: March 20, 2024

Keywords: inverse reinforcement learning, data generation, reward design, sub-optimal data

Abstract

Inverse reinforcement learning (IRL) estimates a reward function that leads an agent to behave in accordance with expert data, e.g., human operation data. However, expert data usually contain redundant parts, which degrade the agent’s performance. This study extends IRL to sub-optimal action data that include missing actions (lack) and detours. The proposed method searches for new actions to determine optimal expert action data. Maze problems with sub-optimal expert action data were used to investigate the performance of the proposed method. The experimental results show that the proposed method finds optimal expert data better than the conventional method, and that the proposed search mechanisms perform better than random search.

Cite this article as:

F. Uwano, S. Hasegawa, and K. Takadama, “Inverse Reinforcement Learning with Agents’ Biased Exploration Based on Sub-Optimal Sequential Action Data,” J. Adv. Comput. Intell. Intell. Inform., Vol.28 No.2, pp. 380-392, 2024.

References

[1] V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, Vol.518, No.7540, pp. 529-533, 2015. https://doi.org/10.1038/nature14236
[2] A. Y. Ng and S. J. Russell, “Algorithms for inverse reinforcement learning,” Proc. of the 17th Int. Conf. on Machine Learning, pp. 663-670, 2000.
[3] M. Kuderer, S. Gulati, and W. Burgard, “Learning driving styles for autonomous vehicles from demonstration,” 2015 IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 2641-2646, 2015. https://doi.org/10.1109/ICRA.2015.7139555
[4] Z. Wu, L. Sun, W. Zhan, C. Yang, and M. Tomizuka, “Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving,” IEEE Robotics and Automation Letters, Vol.5, No.4, pp. 5355-5362, 2020. https://doi.org/10.1109/LRA.2020.3005126
[5] J. Zheng, S. Liu, and L. M. Ni, “Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise,” Proc. of the 28th AAAI Conf. on Artificial Intelligence, 2014. https://doi.org/10.1609/aaai.v28i1.8979
[6] K. Shiarlis, J. Messias, and S. Whiteson, “Inverse Reinforcement Learning from Failure,” Proc. of the 2016 Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS’16), pp. 1060-1068, 2016.
[7] D. S. Brown, W. Goo, P. Nagarajan, and S. Niekum, “Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations,” Proc. of the 36th Int. Conf. on Machine Learning, pp. 783-792, 2019.
[8] K. Amin, N. Jiang, and S. Singh, “Repeated Inverse Reinforcement Learning,” Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS’17), pp. 1813-1822, 2017.
[9] M. Lopes, F. Melo, and L. Montesano, “Active Learning for Reward Estimation in Inverse Reinforcement Learning,” Machine Learning and Knowledge Discovery in Databases, pp. 31-46, 2009. https://doi.org/10.1007/978-3-642-04174-7_3
[10] Y. Cui and S. Niekum, “Active Reward Learning from Critiques,” 2018 IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 6907-6914, 2018. https://doi.org/10.1109/ICRA.2018.8460854
[11] C. Florensa, D. Held, X. Geng, and P. Abbeel, “Automatic Goal Generation for Reinforcement Learning Agents,” Proc. of the 35th Int. Conf. on Machine Learning, pp. 1515-1528, 2018.
[12] L. Yu, J. Song, and S. Ermon, “Multi-Agent Adversarial Inverse Reinforcement Learning,” Proc. of the 36th Int. Conf. on Machine Learning, pp. 7194-7201, 2019.
[13] A. G. Barto, “Intrinsic Motivation and Reinforcement Learning,” G. Baldassarre and M. Mirolli (Eds.), “Intrinsically Motivated Learning in Natural and Artificial Systems,” pp. 17-47, Springer Berlin Heidelberg, 2013. https://doi.org/10.1007/978-3-642-32375-1_2
[14] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, “Curiosity-Driven Exploration by Self-Supervised Prediction,” Proc. of the 34th Int. Conf. on Machine Learning, pp. 2778-2787, 2017.
[15] Y. Burda, H. Edwards, A. Storkey, and O. Klimov, “Exploration by random network distillation,” Proc. of the 7th Int. Conf. on Learning Representations (ICLR 2019), pp. 1-17, 2019.
[16] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey, “Maximum Entropy Inverse Reinforcement Learning,” Proc. of the 23rd AAAI Conf. on Artificial Intelligence, pp. 1433-1438, 2008.
[17] R. S. Sutton and A. G. Barto, “Introduction to Reinforcement Learning (1st ed.),” MIT Press, 1998.
[18] C. J. C. H. Watkins and P. Dayan, “Technical Note: Q-Learning,” Machine Learning, Vol.8, Nos.3-4, pp. 279-292, 1992. https://doi.org/10.1023/A:1022676722315

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.

