Research Paper:
Estimating Objective Weights of Pareto-Optimal Policies for Multi-Objective Sequential Decision-Making
Akiko Ikenaga and Sachiyo Arai
Graduate School of Science and Engineering, Chiba University
1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan
Corresponding author
Sequential decision-making under multiple objective functions involves two problems: exhaustively searching for Pareto-optimal policies, and selecting a policy from the resulting set of Pareto-optimal policies according to the decision-maker's preferences. This paper focuses on the latter problem. Selecting a policy that reflects the decision-maker's preferences requires ordering the Pareto-optimal policies, which is problematic because those preferences are generally tacit knowledge and are difficult to express quantitatively. For this reason, conventional methods have mainly elicited preferences through dialogue with the decision-maker, for example via pairwise comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning that estimates the weight of each objective from a decision-making sequence. The estimated weights can then be used to evaluate the Pareto-optimal policies quantitatively from the viewpoint of the decision-maker's preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weight of each objective function.
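To make the idea concrete, the following is a minimal sketch of how objective weights could be estimated from a demonstrated decision-making sequence under a linear-scalarization assumption, using a projection-style apprenticeship-learning update in the spirit of Abbeel and Ng [7]. The abstract does not specify the paper's exact formulation, so the function name `estimate_weights`, the use of a fixed candidate set in place of repeatedly solving the MDP, and all toy numbers below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_weights(mu_expert, candidate_mus, n_iters=50, eps=1e-8):
    """Estimate objective weights from a demonstrated policy's value vector.

    mu_expert:     per-objective expected (discounted) return of the demonstrated behaviour.
    candidate_mus: per-objective value vectors of candidate (e.g. Pareto-optimal) policies,
                   used here in place of solving the MDP for a best response at each step.
    Returns an L2-normalised weight vector under which the demonstration scores highly.
    """
    mu_expert = np.asarray(mu_expert, dtype=float)
    # Start the projection from the candidate farthest from the expert (any distinct
    # candidate would do); this keeps the initial weight direction non-degenerate.
    mu_bar = max((np.asarray(m, dtype=float) for m in candidate_mus),
                 key=lambda m: np.linalg.norm(mu_expert - m))
    w = mu_expert - mu_bar
    for _ in range(n_iters):
        gap = mu_expert - mu_bar
        if np.linalg.norm(gap) < eps:
            break  # expert value matched; keep the last informative weight direction
        w = gap
        # "Best response": the candidate policy maximising the current scalarized value.
        mu_best = max((np.asarray(m, dtype=float) for m in candidate_mus),
                      key=lambda m: float(w @ m))
        d = mu_best - mu_bar
        denom = float(d @ d)
        if denom < eps:
            break
        # Orthogonal projection of mu_expert onto the line from mu_bar towards mu_best.
        mu_bar = mu_bar + (float(d @ gap) / denom) * d
    norm = np.linalg.norm(w)
    return w / norm if norm > eps else w

if __name__ == "__main__":
    # Toy two-objective example: three Pareto-optimal value vectors and a demonstrated
    # policy generated from hypothetical "true" weights (0.6, 0.4).
    candidates = [np.array([10.0, 1.0]), np.array([7.0, 6.0]), np.array([1.0, 10.0])]
    true_w = np.array([0.6, 0.4])
    mu_expert = max(candidates, key=lambda mu: float(true_w @ mu))  # -> [7, 6]
    w_hat = estimate_weights(mu_expert, candidates)
    # Rank the Pareto-optimal policies by the estimated scalarized value w_hat . mu.
    ranking = sorted(candidates, key=lambda mu: -float(w_hat @ mu))
    print("estimated weights (L2-normalised):", w_hat)
    print("ranking of Pareto-optimal value vectors:", [list(mu) for mu in ranking])
```

Note that with a linear scalarization the recovered weights are identifiable only up to the region of weight space in which the demonstrated policy is optimal, so the estimate can reproduce the decision-maker's ranking without coinciding with any single "true" weight vector.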
- [1] D. M. Roijers, P. Vamplew, S. Whiteson, and R. Dazeley, “A Survey of Multi-Objective Sequential Decision-Making,” J. of Artificial Intelligence Research, Vol.48, Issue 1, pp. 67-113, 2013.
- [2] L. Barrett and S. Narayanan, “Learning All Optimal Policies with Multiple Criteria,” Proc. of the 25th Int. Conf. on Machine Learning, pp. 41-47, 2008. https://doi.org/10.1145/1390156.1390162
- [3] C. Liu, X. Xu, and D. Hu, “Multiobjective Reinforcement Learning: A Comprehensive Overview,” IEEE Trans. on Systems, Man, and Cybernetics: Systems, Vol.45, Issue 3, pp. 385-398, 2014. https://doi.org/10.1109/TSMC.2014.2358639
- [4] K. Van Moffaert and A. Nowé, “Multi-Objective Reinforcement Learning Using Sets of Pareto Dominating Policies,” The J. of Machine Learning Research, Vol.15, Issue 1, pp. 3483-3512, 2014.
- [5] S. Guo, S. Sanner, and E. V. Bonilla, “Gaussian Process Preference Elicitation,” Advances in Neural Information Processing Systems (NIPS’2010), Vol.23, pp. 262-270, 2010.
- [6] T. L. Saaty, “A Scaling Method for Priorities in Hierarchical Structures,” J. of Mathematical Psychology, Vol.15, Issue 3, pp. 234-281, 1977. https://doi.org/10.1016/0022-2496(77)90033-5
- [7] P. Abbeel and A. Y. Ng, “Apprenticeship Learning via Inverse Reinforcement Learning,” Proc. of the 21st Int. Conf. on Machine Learning, 2004. https://doi.org/10.1145/1015330.1015430
- [8] P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, and E. Dekker, “Empirical Evaluation Methods for Multiobjective Reinforcement Learning Algorithms,” Machine Learning, Vol.84, Issues 1-2, pp. 51-80, 2011. https://doi.org/10.1007/s10994-010-5232-5
- [9] K. Van Moffaert, M. M. Drugan, and A. Nowé, “Scalarized Multi-Objective Reinforcement Learning: Novel Design Techniques,” 2013 IEEE Symp. on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 191-199, 2013. https://doi.org/10.1109/ADPRL.2013.6615007
- [10] P. Dragone, S. Teso, and A. Passerini, “Constructive Preference Elicitation Over Hybrid Combinatorial Spaces,” Proc. of the AAAI Conf. on Artificial Intelligence, Vol.32, No.1, pp. 2943-2950, 2018. https://doi.org/10.1609/aaai.v32i1.11804
- [11] P. Viappiani and C. Boutilier, “Recommendation Sets and Choice Queries: There Is No Exploration/Exploitation Tradeoff!,” Proc. of the AAAI Conf. on Artificial Intelligence, Vol.25, No.1, pp. 1571-1574, 2011. https://doi.org/10.1609/aaai.v25i1.7954
- [12] L. M. Zintgraf, D. M. Roijers, S. Linders, C. M. Jonker, and A. Nowé, “Ordered preference elicitation strategies for supporting multi-objective decision making,” Proc. of the 17th Int. Conf. on Autonomous Agents and MultiAgent Systems, pp. 1477-1485, 2018.
- [13] P. Vamplew, J. Yearwood, R. Dazeley, and A. Berry, “On the Limitations of Scalarisation for Multi-Objective Reinforcement Learning of Pareto Fronts,” Australasian Joint Conf. on Artificial Intelligence, pp. 372-378, 2008. https://doi.org/10.1007/978-3-540-89378-3_37
- [14] P. Mannion, S. Devlin, K. Mason, J. Duggan, and E. Howley, “Policy Invariance Under Reward Transformations for Multi-Objective Reinforcement Learning,” Neurocomputing, Vol.263, pp. 60-73, 2017. https://doi.org/10.1016/j.neucom.2017.05.090
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.