Paper:
Generating Diverse Optimal Road Management Plans in Post-Disaster by Applying Envelope Multi-Objective Deep Reinforcement Learning
Soo-Hyun Joo*, Yoshiki Ogawa**, and Yoshihide Sekimoto**
*International Development & Infrastructure Network Lab, Hongik University
Z1-202, 94 Wausan-ro, Mapo-gu, Seoul 04066, Korea
Corresponding author
**Center for Spatial Information Science, The University of Tokyo
Tokyo, Japan
The authors used a data-driven reinforcement learning model for the rapid post-disaster recovery of human mobility, constructing the reward framework from three recovery components: human-mobility recovery rate, road connectivity, and travel cost. Each component carries a relative importance with respect to the others; however, when the preference differs from the one used in training, the optimal policy cannot always be identified. This limitation must be addressed to enhance the robustness and generalizability of the proposed deep Q-network model. Therefore, a set of optimal policies was identified over a predetermined preference space, and the underlying importance of each component was evaluated by applying envelope multi-objective reinforcement learning. The agent used in this study could distinguish the importance of each damaged road from a given relative preference and derive a road-recovery policy suited to each criterion. Furthermore, the authors provide guidelines for constructing an optimal road-management plan. Based on the generalized policy network, the government can access diverse restoration strategies and select the most appropriate one for the disaster situation.
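
To make the mechanism described above concrete, the following is a minimal sketch of an envelope multi-objective Q-learning target computation with a preference-conditioned Q-network, written in PyTorch. The network architecture, state size, number of candidate damaged roads, and names such as `PreferenceConditionedQ` and `envelope_target` are illustrative assumptions rather than the authors' implementation; the sketch only shows how a vector-valued Q estimate is scalarized by a preference vector over the three recovery components, and how the envelope target bootstraps from the maximizing action and preference.

```python
# Minimal sketch of an envelope multi-objective Q-learning target, assuming
# three reward components (mobility recovery, connectivity, travel cost) and a
# small discrete action space of candidate damaged roads. Dimensions, names,
# and the sampling scheme are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

N_OBJ = 3          # reward components: mobility recovery, connectivity, travel cost
N_ACTIONS = 8      # hypothetical number of candidate damaged roads
STATE_DIM = 16     # hypothetical state feature size
GAMMA = 0.99

class PreferenceConditionedQ(nn.Module):
    """Q-network that takes (state, preference) and outputs a vector-valued
    Q estimate per action: shape (batch, N_ACTIONS, N_OBJ)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_OBJ, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS * N_OBJ),
        )

    def forward(self, state, pref):
        x = torch.cat([state, pref], dim=-1)
        return self.net(x).view(-1, N_ACTIONS, N_OBJ)

def envelope_target(q_net, next_state, prefs, reward_vec, done):
    """Envelope update: for each sampled preference w, find the action and
    preference w' whose scalarized value w . Q(s', a, w') is largest, then use
    the corresponding *vector* Q estimate as the bootstrap target."""
    with torch.no_grad():
        # Q(s', a, w') for every sampled preference w': (P, N_ACTIONS, N_OBJ)
        q_next = q_net(next_state.expand(len(prefs), -1), prefs)
        targets = []
        for w in prefs:                                   # envelope over w'
            scalar = (q_next * w).sum(-1)                 # (P, N_ACTIONS)
            p_idx, a_idx = divmod(int(scalar.argmax()), N_ACTIONS)
            best_vec = q_next[p_idx, a_idx]               # vector Q of the maximizer
            targets.append(reward_vec + GAMMA * (1.0 - done) * best_vec)
        return torch.stack(targets)                       # (P, N_OBJ)

# Example: one transition evaluated under two preference vectors.
prefs = torch.tensor([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
q_net = PreferenceConditionedQ()
y = envelope_target(q_net,
                    next_state=torch.randn(1, STATE_DIM),
                    prefs=prefs,
                    reward_vec=torch.tensor([0.4, 0.7, -0.2]),
                    done=0.0)
print(y.shape)  # torch.Size([2, 3]) -> one vector target per preference
```

In this sketch, a trained preference-conditioned network can simply be queried with different preference vectors at decision time, which is how a single generalized policy network could expose diverse restoration strategies to a decision-maker without retraining.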
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.