Observed Body Clustering for Imitation Based on Value System

Yoshihiro Tamura; Yasutake Takahashi; Minoru Asada

doi:10.20965/jaciii.2010.p0802

JACIII Vol.14 No.7 pp. 802-812

(2010)

Paper:

Yoshihiro Tamura^*, Yasutake Takahashi^**, and Minoru Asada^*,***

^*Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan

^**Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan

^***JST ERATO Asada Synergistic Intelligence Project, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan

Received: April 15, 2010

Accepted: July 4, 2010

Published: November 20, 2010

Keywords: reinforcement learning, imitation, state value, clustering

Abstract

To develop skills, actions, and behaviors in a human-symbiotic environment, a robot must learn from observing the behavior of predecessors or humans. Many approaches to robotic imitation have been proposed recently. We have proposed reinforcement-learning-based approaches to imitation and investigated them under the assumption that an observer recognizes the body parts of the performer and maps them to its own. However, this assumption does not always hold because of physical differences between the performer and the observer. To learn various behaviors from observation, the robot has to cluster the observed body area of the performer in the camera image, map the clustered parts to its own body parts according to a criterion that is reasonable for itself, and feed the resulting data back for imitation. This paper shows that the robot can cluster the body area in the camera image into its own body parts based on state value estimation in a reinforcement learning framework, while also imitating the observed behavior based on the same state value estimation. The clustering parameters are updated based on the temporal difference error, in the same way that the parameters of the state value function of the behavior are updated based on the temporal difference error. The validity of the proposed method is investigated by applying it to the imitation of a dynamic throwing motion of an inverted pendulum robot and of a human.
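The idea that one temporal difference error drives both the value-function parameters and the clustering parameters can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian soft-cluster features, the linear value function, the function names (`memberships`, `td_update`), and all step sizes are assumptions chosen only to make the shared-error update concrete.

```python
import numpy as np

def memberships(x, centers, width=1.0):
    """Unnormalized Gaussian membership of observation x in each cluster (assumed feature form)."""
    diff = x - centers                      # shape (k, d)
    return np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * width ** 2))

def td_update(w, centers, x, r, x_next,
              alpha=0.1, beta=0.05, gamma=0.9, width=1.0):
    """One semi-gradient TD(0) step on the value weights w and the cluster centers.

    The state value is assumed linear in the memberships: V(x) = w . phi(x).
    Both parameter sets move along the gradient of V(x), scaled by the same TD error.
    """
    phi = memberships(x, centers, width)
    phi_next = memberships(x_next, centers, width)
    td_error = r + gamma * (w @ phi_next) - w @ phi

    # dV/dc_i = w_i * phi_i * (x - c_i) / width^2 for Gaussian memberships
    grad_centers = (w * phi)[:, None] * (x - centers) / width ** 2

    w_new = w + alpha * td_error * phi
    centers_new = centers + beta * td_error * grad_centers
    return w_new, centers_new, td_error

# Usage: two hypothetical clusters in a 2-D observation space
w = np.array([1.0, 0.0])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
x = np.array([0.5, 0.0])
w, centers, delta = td_update(w, centers, x, r=0.0, x_next=x)
```

In this sketch a negative TD error lowers the value weights of the active clusters and pushes their centers away from the current observation, while a positive error does the opposite. Clusters with zero value weight are left in place.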

Cite this article as:

Y. Tamura, Y. Takahashi, and M. Asada, “Observed Body Clustering for Imitation Based on Value System,” J. Adv. Comput. Intell. Intell. Inform., Vol.14 No.7, pp. 802-812, 2010.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
