Research Paper:
Estimating Objective Weights of Pareto-Optimal Policies for Multi-Objective Sequential Decision-Making
Akiko Ikenaga and Sachiyo Arai
Graduate School of Science and Engineering, Chiba University
1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan
Corresponding author
Sequential decision-making under multiple objective functions involves two problems: exhaustively searching for Pareto-optimal policies, and selecting a policy from the resulting set of Pareto-optimal policies according to the decision-maker's preferences. This paper focuses on the latter problem. Selecting a policy that reflects the decision-maker's preferences requires ordering the Pareto-optimal policies, which is problematic because those preferences are generally tacit knowledge and are difficult to express quantitatively. For this reason, conventional methods have mainly elicited preferences through dialogue with the decision-maker, for example via pairwise comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning that estimates the weight of each objective from a decision-making sequence. The estimated weights can then be used to evaluate the Pareto-optimal policies quantitatively from the viewpoint of the decision-maker's preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weight of each objective function.
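To make the idea concrete, the following is a minimal sketch of how objective weights could be estimated from a demonstrated decision-making sequence under a linear-scalarization assumption, using a projection-style apprenticeship-learning update in the spirit of Abbeel and Ng [7]. The abstract does not specify the paper's exact formulation, so the function name `estimate_weights`, the use of a fixed candidate set in place of repeatedly solving the MDP, and all toy numbers below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_weights(mu_expert, candidate_mus, n_iters=50, eps=1e-8):
    """Estimate objective weights from a demonstrated policy's value vector.

    mu_expert:     per-objective expected (discounted) return of the demonstrated behaviour.
    candidate_mus: per-objective value vectors of candidate (e.g. Pareto-optimal) policies,
                   used here in place of solving the MDP for a best response at each step.
    Returns an L2-normalised weight vector under which the demonstration scores highly.
    """
    mu_expert = np.asarray(mu_expert, dtype=float)
    # Start the projection from the candidate farthest from the expert (any distinct
    # candidate would do); this keeps the initial weight direction non-degenerate.
    mu_bar = max((np.asarray(m, dtype=float) for m in candidate_mus),
                 key=lambda m: np.linalg.norm(mu_expert - m))
    w = mu_expert - mu_bar
    for _ in range(n_iters):
        gap = mu_expert - mu_bar
        if np.linalg.norm(gap) < eps:
            break  # expert value matched; keep the last informative weight direction
        w = gap
        # "Best response": the candidate policy maximising the current scalarized value.
        mu_best = max((np.asarray(m, dtype=float) for m in candidate_mus),
                      key=lambda m: float(w @ m))
        d = mu_best - mu_bar
        denom = float(d @ d)
        if denom < eps:
            break
        # Orthogonal projection of mu_expert onto the line from mu_bar towards mu_best.
        mu_bar = mu_bar + (float(d @ gap) / denom) * d
    norm = np.linalg.norm(w)
    return w / norm if norm > eps else w

if __name__ == "__main__":
    # Toy two-objective example: three Pareto-optimal value vectors and a demonstrated
    # policy generated from hypothetical "true" weights (0.6, 0.4).
    candidates = [np.array([10.0, 1.0]), np.array([7.0, 6.0]), np.array([1.0, 10.0])]
    true_w = np.array([0.6, 0.4])
    mu_expert = max(candidates, key=lambda mu: float(true_w @ mu))  # -> [7, 6]
    w_hat = estimate_weights(mu_expert, candidates)
    # Rank the Pareto-optimal policies by the estimated scalarized value w_hat . mu.
    ranking = sorted(candidates, key=lambda mu: -float(w_hat @ mu))
    print("estimated weights (L2-normalised):", w_hat)
    print("ranking of Pareto-optimal value vectors:", [list(mu) for mu in ranking])
```

Note that with a linear scalarization the recovered weights are identifiable only up to the region of weight space in which the demonstrated policy is optimal, so the estimate can reproduce the decision-maker's ranking without coinciding with any single "true" weight vector.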
- [1] D. M. Roijers, P. Vamplew, S. Whiteson, and R. Dazeley, “A Survey of Multi-Objective Sequential Decision-Making,” J. of Artificial Intelligence Research, Vol.48, Issue 1, pp. 67-113, 2013.
- [2] L. Barrett and S. Narayanan, “Learning All Optimal Policies with Multiple Criteria,” Proc. of the 25th Int. Conf. on Machine Learning, pp. 41-47, 2008. https://doi.org/10.1145/1390156.1390162
- [3] C. Liu, X. Xu, and D. Hu, “Multiobjective Reinforcement Learning: A Comprehensive Overview,” IEEE Trans. on Systems, Man, and Cybernetics: Systems, Vol.45, Issue 3, pp. 385-398, 2014. https://doi.org/10.1109/TSMC.2014.2358639
- [4] K. Van Moffaert and A. Nowé, “Multi-Objective Reinforcement Learning Using Sets of Pareto Dominating Policies,” The J. of Machine Learning Research, Vol.15, Issue 1, pp. 3483-3512, 2014.
- [5] S. Guo, S. Sanner, and E. V. Bonilla, “Gaussian Process Preference Elicitation,” Advances in Neural Information Processing Systems (NIPS’2010), Vol.23, pp. 262-270, 2010.
- [6] T. L. Saaty, “A Scaling Method for Priorities in Hierarchical Structures,” J. of Mathematical Psychology, Vol.15, Issue 3, pp. 234-281, 1977. https://doi.org/10.1016/0022-2496(77)90033-5
- [7] P. Abbeel and A. Y. Ng, “Apprenticeship Learning via Inverse Reinforcement Learning,” Proc. of the 21st Int. Conf. on Machine Learning, 2004. https://doi.org/10.1145/1015330.1015430
- [8] P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, and E. Dekker, “Empirical Evaluation Methods for Multiobjective Reinforcement Learning Algorithms,” Machine Learning, Vol.84, Issues 1-2, pp. 51-80, 2011. https://doi.org/10.1007/s10994-010-5232-5
- [9] K. Van Moffaert, M. M. Drugan, and A. Nowé, “Scalarized Multi-Objective Reinforcement Learning: Novel Design Techniques,” 2013 IEEE Symp. on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 191-199, 2013. https://doi.org/10.1109/ADPRL.2013.6615007
- [10] P. Dragone, S. Teso, and A. Passerini, “Constructive Preference Elicitation Over Hybrid Combinatorial Spaces,” Proc. of the AAAI Conf. on Artificial Intelligence, Vol.32, No.1, pp. 2943-2950, 2018. https://doi.org/10.1609/aaai.v32i1.11804
- [11] P. Viappiani and C. Boutilier, “Recommendation Sets and Choice Queries: There Is No Exploration/Exploitation Tradeoff!,” Proc. of the AAAI Conf. on Artificial Intelligence, Vol.25, No.1, pp. 1571-1574, 2011. https://doi.org/10.1609/aaai.v25i1.7954
- [12] L. M. Zintgraf, D. M. Roijers, S. Linders, C. M. Jonker, and A. Nowé, “Ordered preference elicitation strategies for supporting multi-objective decision making,” Proc. of the 17th Int. Conf. on Autonomous Agents and MultiAgent Systems, pp. 1477-1485, 2018.
- [13] P. Vamplew, J. Yearwood, R. Dazeley, and A. Berry, “On the Limitations of Scalarisation for Multi-Objective Reinforcement Learning of Pareto Fronts,” Australasian Joint Conf. on Artificial Intelligence, pp. 372-378, 2008. https://doi.org/10.1007/978-3-540-89378-3_37
- [14] P. Mannion, S. Devlin, K. Mason, J. Duggan, and E. Howley, “Policy Invariance Under Reward Transformations for Multi-Objective Reinforcement Learning,” Neurocomputing, Vol.263, pp. 60-73, 2017. https://doi.org/10.1016/j.neucom.2017.05.090
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.