Paper:

# Mutually Dependent Markov Decision Processes

## Toshiharu Fujita^{*} and Akifumi Kira^{**}

^{*}Graduate School of Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata, Kitakyushu 804-8550, Japan

^{**}Graduate School of Economics and Management, Tohoku University, 27-1 Kawauchi, Aoba-ku, Sendai 980-8576, Japan

In this paper, we introduce a basic framework for mutually dependent Markov decision processes (MDMDP) showing recursive mutual dependence. Our model is structured upon two types of finite-stage Markov decision processes. At each stage, the reward in one process is given by the optimal value of the alternative process problem, whose initial state is determined by the current state and decision in the original process. We formulate the MDMDP model and derive mutually dependent recursive equations by dynamic programming. Furthermore, MDMDP is illustrated in a numerical example. The model enables easier treatment of some classes of complex multi-stage decision processes.

- [1] R.E. Bellman, “Dynamic Programming,” Princeton Univ. Press, NJ, 1957.
- [2] R. A. Howard, “Dynamic Programming and Markov Processes,” MITPress, Cambridge, Mass., 1960.
- [3] G. L. Nemhauser, “Introduction to Dynamic Programming,” Wiley, NY, 1966.
- [4] D.P. Bertsekas, “Dynamic Programming and Stochastic Control,” Academic Press, NY, 1976.
- [5] D.J. White, “Dynamic Programming,” Holden-Day, San Francisco, Calif., 1969.
- [6] M. L. Puterman, “Markov Decision Processes : discrete stochastic dynamic programming,” Wiley & Sons, NY, 1994.
- [7] M. Sniedovich, “Dynamic Programming,” Marcel Dekker, Inc. NY, 1992.
- [8] S. Iwamoto and T. Fujita, “Stochastic decision-making in a fuzzy environment,” J. of Operations Research Society of Japan, Vol.38, pp. 467-482, 1995.
- [9] S. Iwamoto, T. Ueno and T. Fujita, “Controlled Markov Chains with Utility Functions,” Eds. H. Zhenting, J. A. Filar and A. Chen, Markov Processes and Controlled Markov Chains, Chap.8, pp. 135-148, Kluwer, 2002.
- [10] T. Fujita, “An optimal path problem with fuzzy expectation,” Proc. of the 8th Bellman Continuum, pp. 191-195, 2000.
- [11] T. Fujita, “An optimal path problem with threshold probability,” Frontiers in Artificial Intelligence and Applications, Vol.69, pp. 792-796, 2001.
- [12] T. Fujita, “On nondeterministic dynamic programming,” Mathematical Analysis in Economics (Kyoto, 2005), RIMS
*Kôkyûroku*No.1488, pp. 15-24, 2006. - [13] T. Fujita, “Infinite stage nondeterministic stopped decision processes and maximum linear equation,” Bulletin of the Kyushu Institute of Technology, Vol.55, pp. 15-24, 2008.
- [14] T. Fujita, “Deterministic decision process under range constraint through all stages,” Proc. of Modeling Decisions for Artificial Intelligence 2008 (CD-ROM), pp. 60-70, 2008.
- [15] T. Fujita, and K. Tsurusaki, “Stochastic optimization of multiplicative functions with negative value,” J. of Operations Research Society of Japan, Vol.41, No.3, pp. 351-373, 1998.
- [16] A. Kira, T. Ueno and T. Fujita “Threshold probability of nonterminal type in finite horizon Markov decision processes,” J. of Mathematical Analysis and Applications, Vol.386, pp. 461-472, 2012.
- [17] T. Fujita, “Re-examination of Markov policies for additive decision process,” Bulletin of Informatics Cybernetics, Vol.29, No.1, pp. 51-65, 1997.
- [18] T. Fujita, “On policy classes in dynamic programming theory,” Proc. of the 9th Bellman Continuum Int. Workshop on Uncertain System and Soft Computing, Series of Information & Management Sciences Vol.2, pp. 39-43, 2002.
- [19] U. Bertelé and F. Brioschi, “Nonserial Dynamic Programming,” Academic Press, NY, 1972.