
JDR Vol.18 No.8, pp. 884-894 (2023)
doi: 10.20965/jdr.2023.p0884

Paper:

Generating Diverse Optimal Road Management Plans in Post-Disaster by Applying Envelope Multi-Objective Deep Reinforcement Learning

Soo-Hyun Joo*,†, Yoshiki Ogawa**, and Yoshihide Sekimoto**

*International Development & Infrastructure Network Lab, Hongik University
Z1-202, 94 Wausan-ro, Mapo-gu, Seoul 04066, Korea

†Corresponding author

**Center for Spatial Information Science, The University of Tokyo
Tokyo, Japan

Received: May 8, 2023
Accepted: October 16, 2023
Published: December 1, 2023
Keywords: western Japan flooding, road restoration, relative importance, multi-objective reinforcement learning
Abstract

The authors used a data-driven reinforcement learning model for rapid post-disaster recovery of human mobility, with the human-mobility recovery rate, road connectivity, and travel cost as the recovery components of the reward framework. Each component has a relative importance with respect to the others. However, if the preference differs from the original one, the optimal policy may not always be identified. This limitation must be addressed to enhance the robustness and generalizability of the proposed deep Q-network model. Therefore, a set of optimal policies was identified over a predetermined preference space, and the underlying importance was evaluated by applying envelope multi-objective reinforcement learning. The agent used in this study could distinguish the importance of each damaged road based on a given relative preference and derive a road-recovery policy suited to each criterion. Furthermore, the authors provided guidelines for constructing an optimal road-management plan. Based on the generalized policy network, the government can access diverse restoration strategies and select the most appropriate one depending on the disaster situation.
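The envelope multi-objective formulation summarized above conditions a single Q-network on a relative preference vector over the three recovery components and scalarizes the vector-valued Q-estimates with that preference when choosing which damaged road to restore next. The following minimal PyTorch sketch illustrates only this preference-conditioned action selection; the dimensions, the names (MultiObjectiveQNetwork, select_action), and the placeholder state encoding are assumptions for illustration, not the authors' implementation, and the full envelope Bellman update used during training is omitted.

    import torch
    import torch.nn as nn

    N_OBJECTIVES = 3   # human-mobility recovery rate, road connectivity, travel cost (sign-adjusted as reward)
    STATE_DIM = 32     # placeholder: encoded road-network / damage state
    N_ACTIONS = 10     # placeholder: candidate damaged roads to restore next

    class MultiObjectiveQNetwork(nn.Module):
        """Preference-conditioned Q-network: for each action it outputs a
        vector of Q-values, one entry per recovery objective."""
        def __init__(self, state_dim, n_actions, n_objectives, hidden=128):
            super().__init__()
            self.n_actions = n_actions
            self.n_objectives = n_objectives
            self.net = nn.Sequential(
                nn.Linear(state_dim + n_objectives, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_actions * n_objectives),
            )

        def forward(self, state, w):
            # Conditioning on the preference vector w lets one network
            # cover the whole preference space.
            x = torch.cat([state, w], dim=-1)
            q = self.net(x)
            return q.view(-1, self.n_actions, self.n_objectives)

    def select_action(q_net, state, w):
        """Greedy action under the scalarized value w . Q(s, a)."""
        with torch.no_grad():
            q_vec = q_net(state.unsqueeze(0), w.unsqueeze(0))       # (1, A, O)
            scalar_q = (q_vec * w.view(1, 1, -1)).sum(dim=-1)       # (1, A)
            return int(scalar_q.argmax(dim=-1).item())

    # Usage: sample a relative preference from the simplex and pick the next road to restore.
    w = torch.distributions.Dirichlet(torch.ones(N_OBJECTIVES)).sample()
    state = torch.zeros(STATE_DIM)                                   # placeholder state encoding
    q_net = MultiObjectiveQNetwork(STATE_DIM, N_ACTIONS, N_OBJECTIVES)
    action = select_action(q_net, state, w)

In training, a new preference vector would typically be sampled (e.g., from a Dirichlet distribution) for each episode so that the learned policy generalizes across the preference space; at decision time, this is what allows different restoration strategies to be queried from the same generalized policy network depending on the disaster situation.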

Cite this article as:
S. Joo, Y. Ogawa, and Y. Sekimoto, “Generating Diverse Optimal Road Management Plans in Post-Disaster by Applying Envelope Multi-Objective Deep Reinforcement Learning,” J. Disaster Res., Vol.18 No.8, pp. 884-894, 2023.
