Behavior Acquisition in Partially Observable Environments by Autonomous Segmentation of the Observation Space

Kousuke Inoue; Tamio Arai; Jun Ota

doi:10.20965/jrm.2015.p0293

single-rb.php

« previous

JRM Vol.27 No.3 pp. 293-304

(2015)

doi: 10.20965/jrm.2015.p0293

Paper:

Views over last 60 days: 1,578

Behavior Acquisition in Partially Observable Environments by Autonomous Segmentation of the Observation Space

Kousuke Inoue^, Tamio Arai^, and Jun Ota^

^*Department of Intelligent Systems Engineering, Faculty of Engineering, Ibaraki University
4-12-1 Nakanarusawa-cho, Hitachi, Ibaraki 316-8511, Japan

^**Shibaura Institute of Technology
3-7-5 Toyosu, Koto-ku, Tokyo 135-8548, Japan

^***Research into Artifacts, Center for Engineering (RACE), The University of Tokyo
5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan

Received:

December 26, 2014

Accepted:

May 11, 2015

Published:

June 20, 2015

Keywords:

learning, partially observable Markov decision process, autonomous state-space construction

Abstract

State representation

In this paper, we propose a method by which an agent can autonomously construct a state-representation to achieve state-identification with a sufficient Markovian property. Furthermore, the agent does this using continuous and multi-dimensional observationspace in partially observable environments. In order to deal with the non-Markovian property of the environment, a state-representation of a decision tree structure based on past observations and actions is used. This representation is gradually segmented to achieve appropriate state-distinction. Because the observation-space of the agent is not segmented in advance, the agent has to determine the cause of its state-representation insufficiency: (1) insufficient observation-space segmentation, or (2) perceptual aliasing. In the proposed method, the cause is determined using a statistical analysis of past experiences, and the method of state-segmentation is decided based on this cause. Results of simulations in two-dimensional grid-environments and experiments with real mobile robot navigating in two-dimensional continuous workspace show that an agent can successfully acquire navigation behaviors with many hidden states.

Cite this article as:

K. Inoue, T. Arai, and J. Ota, “Behavior Acquisition in Partially Observable Environments by Autonomous Segmentation of the Observation Space,” J. Robot. Mechatron., Vol.27 No.3, pp. 293-304, 2015.

Data files:

References

[1] R. Pfeifer and C. Scheier, “Understanding Intelligence,” MIT Press, 1999.
[2] H. Kawano, “Three-Dimensional Obstacle Avoidance of Blimp-Type Unmanned Aerial Vehicle Flying in Unknown and Non-Uniform Wind Disturbance,” J. of Robotics and Mechatronics, Vol.19, No.2, pp. 166-173, 2007.
[3] S. Thrun, W. Burgard, and D. Fox, “Probabilistic Robotics,” MIT Press, 2005.
[4] L. Chrisman, “Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach,” Proc. 10th Int. Conf. on Artificial Intelligence, pp. 183-188, 1992.
[5] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and Acting in Partially Observable Stochastic Domains,” Artificial Intelligence, Vol.101, pp. 99-134, 1998.
[6] S. Singh, T. Jaakkola, and M. Jordan, “Learning Without State-Estimation in Partially Observable Markovian Decision Processes,” Proc. 11^th Int. Conf. on Machine Learning, pp. 284-292, 1994.
[7] M. Littman, “Memoryless policies: Theoretical limitations and practical results,” Proc. Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats 3, MIT Press, pp. 297-305, 1994.
[8] L.-J. Lin and T. M. Mitchell, “Reinforcement learning with hidden states,” Proc. 2^nd Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats, pp. 271-280, 1993.
[9] S. D. Whitehead and L. J. Lin, “Reinforcement learning of non-Markov decision processes,” Artificial Intelligence, Vol.73, No.1-2, pp. 271-306, 1995.
[10] M. Wiering and J. Schmidhuber, “HQ-Learning,” Adaptive Behavior, Vol.6, No.2, pp. 219-246, 1998.
[11] R. Sun and C. Sessions, “Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors,” IEEE Trans. Systems, Man, and Cybernetics, Vol.B-30-3, pp. 403-418, 2000.
[12] H. Lee, H. Kayama, and K. Abe, “Labeling Q-Learning in POMDP Environments,” IEICE Trans. Information and Systems, Vol.E85-D. No.9, pp. 1425-1432, 2002.
[13] L. Lin and T. M. Mitchell, “Memory approaches to reinforcement learning in non- Markovian domains,” Technical Report CMU-CS-92-138, Carnegie Mellon University, 1992.
[14] S. Thrun, “Monte Carlo POMDPs,” Neural Information Processing Systems, Vol.12, MIT Press, pp. 1064-1070, 2000.
[15] R. A. McCallum, “Instance-Based Utile Distinction for Reinforcement Learning with Hidden State,” Proc. 12^th Int. Conf. on Machine Learning, pp. 387-395, 1995.
[16] N. Suematsu and A. Hayashi, “A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory,” Neural Information Processing Systems, Vol.11, MIT Press, pp. 1059-1065, 1999.
[17] A. K. McCallum, “Learning to use selective attention and short-term memory in sequential tasks,” From Animals to Animats 4: Proc. of 4^th Int. Conf. on Simulation of Adaptive Behavior, The MIT Press, pp. 315-324, 1996.
[18] H. Murao and S. Kitamura, “QLASS: an enhancement of Q-learning to generate state space adaptively,” Proc. European Conf. on Artificial Life, 1997.
[19] K. Yamada, K. Ohkura, M. M. Svinin, and K. Ueda, “Adaptive Segmentation of the State Space based on Bayesian Discrimination in Reinforcement Learning,” Proc. 6th Int. Symp. on Artificial Life and Robotics, pp. 168-171, 2001.
[20] T. Nakamura and T. Ogasawara, “Self-Partitioning State Space for Behavior Acquisition of Vision-Based Mobile Robots,” J. of Robotics and Mechatronics, Vol.13, No.6, pp. 625-636, 2001.
[21] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998.
[22] C. J. C. H. Watkins and P. Dayan, “Technical Note: Q-Learning,” Machine Learning, Vol.8, pp. 279-292, 1992.
[23] P. Maes, “Behavior-Based Artificial Intelligence,” Proc. of the 2nd Conf. on Simulated and Adaptive Behavior, MIT Press, pp. 2-10, 1993.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.

[B1] [1] R. Pfeifer and C. Scheier, “Understanding Intelligence,” MIT Press, 1999.

[B2] [2] H. Kawano, “Three-Dimensional Obstacle Avoidance of Blimp-Type Unmanned Aerial Vehicle Flying in Unknown and Non-Uniform Wind Disturbance,” J. of Robotics and Mechatronics, Vol.19, No.2, pp. 166-173, 2007.

[B3] [3] S. Thrun, W. Burgard, and D. Fox, “Probabilistic Robotics,” MIT Press, 2005.

[B4] [4] L. Chrisman, “Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach,” Proc. 10th Int. Conf. on Artificial Intelligence, pp. 183-188, 1992.

[B5] [5] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and Acting in Partially Observable Stochastic Domains,” Artificial Intelligence, Vol.101, pp. 99-134, 1998.

[B6] [6] S. Singh, T. Jaakkola, and M. Jordan, “Learning Without State-Estimation in Partially Observable Markovian Decision Processes,” Proc. 11^th Int. Conf. on Machine Learning, pp. 284-292, 1994.

[B7] [7] M. Littman, “Memoryless policies: Theoretical limitations and practical results,” Proc. Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats 3, MIT Press, pp. 297-305, 1994.

[B8] [8] L.-J. Lin and T. M. Mitchell, “Reinforcement learning with hidden states,” Proc. 2^nd Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats, pp. 271-280, 1993.

[B9] [9] S. D. Whitehead and L. J. Lin, “Reinforcement learning of non-Markov decision processes,” Artificial Intelligence, Vol.73, No.1-2, pp. 271-306, 1995.

[B10] [10] M. Wiering and J. Schmidhuber, “HQ-Learning,” Adaptive Behavior, Vol.6, No.2, pp. 219-246, 1998.

[B11] [11] R. Sun and C. Sessions, “Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors,” IEEE Trans. Systems, Man, and Cybernetics, Vol.B-30-3, pp. 403-418, 2000.

[B12] [12] H. Lee, H. Kayama, and K. Abe, “Labeling Q-Learning in POMDP Environments,” IEICE Trans. Information and Systems, Vol.E85-D. No.9, pp. 1425-1432, 2002.

[B13] [13] L. Lin and T. M. Mitchell, “Memory approaches to reinforcement learning in non- Markovian domains,” Technical Report CMU-CS-92-138, Carnegie Mellon University, 1992.

[B14] [14] S. Thrun, “Monte Carlo POMDPs,” Neural Information Processing Systems, Vol.12, MIT Press, pp. 1064-1070, 2000.

[B15] [15] R. A. McCallum, “Instance-Based Utile Distinction for Reinforcement Learning with Hidden State,” Proc. 12^th Int. Conf. on Machine Learning, pp. 387-395, 1995.

[B16] [16] N. Suematsu and A. Hayashi, “A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory,” Neural Information Processing Systems, Vol.11, MIT Press, pp. 1059-1065, 1999.

[B17] [17] A. K. McCallum, “Learning to use selective attention and short-term memory in sequential tasks,” From Animals to Animats 4: Proc. of 4^th Int. Conf. on Simulation of Adaptive Behavior, The MIT Press, pp. 315-324, 1996.

[B18] [18] H. Murao and S. Kitamura, “QLASS: an enhancement of Q-learning to generate state space adaptively,” Proc. European Conf. on Artificial Life, 1997.

[B19] [19] K. Yamada, K. Ohkura, M. M. Svinin, and K. Ueda, “Adaptive Segmentation of the State Space based on Bayesian Discrimination in Reinforcement Learning,” Proc. 6th Int. Symp. on Artificial Life and Robotics, pp. 168-171, 2001.

[B20] [20] T. Nakamura and T. Ogasawara, “Self-Partitioning State Space for Behavior Acquisition of Vision-Based Mobile Robots,” J. of Robotics and Mechatronics, Vol.13, No.6, pp. 625-636, 2001.

[B21] [21] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998.

[B22] [22] C. J. C. H. Watkins and P. Dayan, “Technical Note: Q-Learning,” Machine Learning, Vol.8, pp. 279-292, 1992.

[B23] [23] P. Maes, “Behavior-Based Artificial Intelligence,” Proc. of the 2nd Conf. on Simulated and Adaptive Behavior, MIT Press, pp. 2-10, 1993.

Behavior Acquisition in Partially Observable Environments by Autonomous Segmentation of the Observation Space

Kousuke Inoue*, Tamio Arai**, and Jun Ota***

Kousuke Inoue^, Tamio Arai^, and Jun Ota^