JACIII Vol.28 No.2 pp. 393-402
doi: 10.20965/jaciii.2024.p0393

Research Paper:

Estimating Objective Weights of Pareto-Optimal Policies for Multi-Objective Sequential Decision-Making

Akiko Ikenaga and Sachiyo Arai

Graduate School of Science and Engineering, Chiba University
1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan

Received: June 19, 2023
Accepted: December 4, 2023
Published: March 20, 2024
Keywords: multi-objective reinforcement learning, multi-objective Markov decision process, inverse reinforcement learning, order of preference

Sequential decision-making under multiple objective functions involves two problems: exhaustively searching for Pareto-optimal policies, and selecting one policy from the resulting set of Pareto-optimal policies according to the decision maker’s preferences. This paper focuses on the latter problem. To select a policy that reflects the decision maker’s preferences, the Pareto-optimal policies must be ordered, which is problematic because those preferences are generally tacit knowledge. Furthermore, the policies are difficult to order quantitatively. For this reason, conventional methods have mainly elicited preferences through dialogue with the decision maker and through pairwise comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning that estimates the weight of each objective from the decision maker’s decision-making sequences. The estimated weights can then be used to evaluate the Pareto-optimal policies quantitatively from the viewpoint of the decision maker’s preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weight of each objective function.
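The abstract does not specify the estimation algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a finite set of candidate policies with known vector-valued returns, linear scalarization with weights on the simplex, and demonstrations that follow one of the Pareto-optimal policies. In place of the paper's inverse reinforcement learning procedure, it substitutes a simple max-margin grid search in the spirit of apprenticeship learning (Abbeel and Ng, 2004). All function names (`pareto_front`, `estimate_weights`, `rank_policies`, and so on) and all numeric values are hypothetical and chosen for illustration.

```python
import itertools
import math
from typing import Iterator, List, Sequence, Tuple

Vec = Tuple[float, ...]


def discounted_return(rewards: Sequence[Sequence[float]], gamma: float) -> Vec:
    """Vector-valued discounted return of one trajectory (one entry per objective)."""
    total = [0.0] * len(rewards[0])
    for t, r in enumerate(rewards):
        for j, r_j in enumerate(r):
            total[j] += (gamma ** t) * r_j
    return tuple(total)


def empirical_return(trajectories: Sequence[Sequence[Sequence[float]]], gamma: float) -> Vec:
    """Average vector return over the demonstrated decision-making sequences."""
    returns = [discounted_return(tr, gamma) for tr in trajectories]
    return tuple(sum(col) / len(returns) for col in zip(*returns))


def dominates(a: Vec, b: Vec) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(values: Sequence[Vec]) -> List[int]:
    """Indices of the value vectors that no other vector dominates."""
    return [i for i, v in enumerate(values)
            if not any(dominates(u, v) for j, u in enumerate(values) if j != i)]


def dot(w: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(w, v))


def simplex_grid(k: int, steps: int) -> Iterator[Vec]:
    """All weight vectors on the k-objective simplex with resolution 1/steps."""
    for c in itertools.product(range(steps + 1), repeat=k):
        if sum(c) == steps:
            yield tuple(x / steps for x in c)


def estimate_weights(mu_demo: Vec, alternatives: Sequence[Vec], steps: int = 20) -> Tuple[Vec, float]:
    """Max-margin estimate (assumed technique): the simplex weight under which the
    demonstrated return beats every alternative policy by the largest margin."""
    if not alternatives:
        raise ValueError("need at least one alternative policy")
    best_w, best_margin = None, -math.inf
    for w in simplex_grid(len(mu_demo), steps):
        margin = min(dot(w, mu_demo) - dot(w, v) for v in alternatives)
        if margin > best_margin:
            best_w, best_margin = w, margin
    return best_w, best_margin


def rank_policies(values: Sequence[Vec], w: Vec) -> List[int]:
    """Order policies by their linearly scalarized value under the weights w."""
    return sorted(range(len(values)), key=lambda i: dot(w, values[i]), reverse=True)


# Hypothetical vector returns: (reward objective, negative time cost).
values = [(1.0, -1.0), (5.0, -4.0), (10.0, -10.0), (4.0, -6.0)]
front = pareto_front(values)  # index 3 is dominated by index 1
front_values = [values[i] for i in front]

# One hypothetical demonstrated trajectory (undiscounted) that reaches reward 5 in 4 steps.
demo_traj = [(0.0, -1.0), (0.0, -1.0), (0.0, -1.0), (5.0, -1.0)]
mu_demo = empirical_return([demo_traj], gamma=1.0)

# Match the demonstration to the nearest Pareto-optimal policy; the rest are alternatives.
demo = min(range(len(front_values)),
           key=lambda i: sum((a - b) ** 2 for a, b in zip(front_values[i], mu_demo)))
alternatives = [v for i, v in enumerate(front_values) if i != demo]

w, margin = estimate_weights(mu_demo, alternatives)
print(w, margin)                          # (0.5, 0.5) 0.5
print(rank_policies(front_values, w))     # [1, 0, 2]  (indices into front_values)
```

The grid search is chosen for transparency rather than efficiency; with many objectives a linear program over the simplex would be the more natural choice. The estimated weights depend on the scale of each objective's return and on the grid resolution, and a small or negative margin suggests that the demonstrations do not single out one policy clearly under any linear weighting.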

Cite this article as:
A. Ikenaga and S. Arai, “Estimating Objective Weights of Pareto-Optimal Policies for Multi-Objective Sequential Decision-Making,” J. Adv. Comput. Intell. Intell. Inform., Vol.28 No.2, pp. 393-402, 2024.