JRM Vol.27 No.3 pp. 293-304
doi: 10.20965/jrm.2015.p0293


Behavior Acquisition in Partially Observable Environments by Autonomous Segmentation of the Observation Space

Kousuke Inoue*, Tamio Arai**, and Jun Ota***

*Department of Intelligent Systems Engineering, Faculty of Engineering, Ibaraki University
4-12-1 Nakanarusawa-cho, Hitachi, Ibaraki 316-8511, Japan

**Shibaura Institute of Technology
3-7-5 Toyosu, Koto-ku, Tokyo 135-8548, Japan

***Research into Artifacts, Center for Engineering (RACE), The University of Tokyo
5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan

Received: December 26, 2014
Accepted: May 11, 2015
Published: June 20, 2015
Keywords: learning, partially observable Markov decision process, autonomous state-space construction

State representation
In this paper, we propose a method by which an agent can autonomously construct a state-representation that achieves state-identification with a sufficient Markovian property. The agent does this using a continuous, multi-dimensional observation-space in partially observable environments. To deal with the non-Markovian property of the environment, a state-representation with a decision-tree structure based on past observations and actions is used. This representation is gradually segmented to achieve appropriate state-distinction. Because the observation-space of the agent is not segmented in advance, the agent has to determine the cause of its state-representation insufficiency: (1) insufficient observation-space segmentation, or (2) perceptual aliasing. In the proposed method, the cause is determined using a statistical analysis of past experiences, and the method of state-segmentation is decided based on this cause. Results of simulations in two-dimensional grid-environments and experiments with a real mobile robot navigating in a two-dimensional continuous workspace show that an agent can successfully acquire navigation behaviors in environments with many hidden states.
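The abstract does not specify which statistical analysis is used to tell the two causes apart, so the following is only an illustrative sketch, not the authors' procedure. It assumes that each tree leaf stores past experiences together with a sampled return, and it uses variance reduction of those returns as a stand-in test: if cutting the current observation explains the spread of returns best, the leaf is treated as under-segmented; if grouping by the previous action explains it best, the leaf is treated as perceptually aliased. All names (`Experience`, `diagnose`, `min_gain`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import Dict, List, Optional, Tuple


@dataclass
class Experience:
    obs: Tuple[float, ...]  # current continuous observation
    prev_action: int        # action taken one step earlier (history feature)
    ret: float              # sampled return observed from this step


def weighted_variance(groups: List[List[float]]) -> float:
    """Size-weighted mean of the population variances of non-empty groups."""
    n = sum(len(g) for g in groups)
    return sum(len(g) * pvariance(g) for g in groups if g) / n


def best_observation_split(
    exps: List[Experience],
) -> Tuple[float, Optional[int], Optional[float]]:
    """Best (gain, dimension, threshold) over axis-aligned cuts of obs."""
    base = pvariance([e.ret for e in exps])
    best: Tuple[float, Optional[int], Optional[float]] = (0.0, None, None)
    for d in range(len(exps[0].obs)):
        values = sorted({e.obs[d] for e in exps})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate cut between neighbouring samples
            left = [e.ret for e in exps if e.obs[d] <= t]
            right = [e.ret for e in exps if e.obs[d] > t]
            gain = base - weighted_variance([left, right])
            if gain > best[0]:
                best = (gain, d, t)
    return best


def history_split_gain(exps: List[Experience]) -> float:
    """Variance reduction obtained by grouping on the previous action."""
    base = pvariance([e.ret for e in exps])
    groups: Dict[int, List[float]] = {}
    for e in exps:
        groups.setdefault(e.prev_action, []).append(e.ret)
    return base - weighted_variance(list(groups.values()))


def diagnose(exps: List[Experience], min_gain: float = 1e-6) -> str:
    """Guess why a leaf is insufficient (assumed criterion, see lead-in)."""
    obs_gain, _, _ = best_observation_split(exps)
    hist_gain = history_split_gain(exps)
    if max(obs_gain, hist_gain) < min_gain:
        return "sufficient"
    if obs_gain >= hist_gain:
        return "segment_observation"  # cause (1): observation space too coarse
    return "extend_history"           # cause (2): perceptual aliasing
```

In this sketch, a leaf whose returns differ only between low and high observation values is diagnosed as `segment_observation`, while a leaf whose observations are identical but whose returns depend on the previous action is diagnosed as `extend_history`. A real implementation would need a proper significance test and enough samples per group, since exhaustive threshold search on few samples overfits easily.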
Cite this article as:
K. Inoue, T. Arai, and J. Ota, “Behavior Acquisition in Partially Observable Environments by Autonomous Segmentation of the Observation Space,” J. Robot. Mechatron., Vol.27 No.3, pp. 293-304, 2015.
