JRM Vol.24 No.2 pp. 311-319
doi: 10.20965/jrm.2012.p0311


Mouth Movement Recognition Using Template Matching and its Implementation in an Intelligent Room

Kiyoshi Takita*, Takeshi Nagayasu*, Hidetsugu Asano**,
Kenji Terabayashi*, and Kazunori Umeda*

*Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan

**Pioneer Corporation, 1-1 Shin-ogura, Saiwai-ku, Kawasaki-shi, Kanagawa 212-0031, Japan

Received: September 30, 2011
Accepted: January 6, 2012
Published: April 20, 2012

Keywords: intelligent room, mouth movement recognition, template matching, image processing

This paper proposes a method of recognizing mouth movements from images and implements the method in an intelligent room. The proposed method uses template matching to recognize mouth movements for the purpose of indicating a target object in the intelligent room. First, the operator’s face is detected. Then, the mouth region is extracted from the facial region using the result of template matching with a template image of the lips. Dynamic Programming (DP) matching is then applied to the similarity measure obtained by template matching. The effectiveness of the proposed method is evaluated through experiments on recognizing the names of several common home appliances and operations.
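The DP-matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each utterance has already been reduced to a 1-D sequence of per-frame template-matching similarity scores, and it compares an observed sequence against stored word templates with a standard dynamic-programming (DTW-style) alignment. All function names, template words, and score values here are hypothetical.

```python
def dp_matching(ref, inp):
    """Length-normalized DP (DTW-style) distance between two 1-D
    similarity-score sequences. Smaller means a better match.
    This is a generic formulation, not the paper's exact recurrence."""
    n, m = len(ref), len(inp)
    INF = float("inf")
    # D[i][j]: minimum cumulative cost aligning ref[:i] with inp[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - inp[j - 1])
            # Allowed moves: vertical, horizontal, diagonal
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)  # normalize by total path length bound

# Hypothetical word templates: per-frame similarity sequences
templates = {
    "terebi": [0.2, 0.8, 0.9, 0.4],   # "TV" (illustrative values)
    "raito":  [0.1, 0.3, 0.7, 0.2],   # "light" (illustrative values)
}

# An observed utterance is classified as the nearest template
observed = [0.2, 0.75, 0.9, 0.45]
best = min(templates, key=lambda w: dp_matching(templates[w], observed))
print(best)  # → "terebi"
```

Because DP matching warps the time axis, the comparison tolerates the speaking-rate differences between the stored template utterance and the observed one.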

Cite this article as:
Kiyoshi Takita, Takeshi Nagayasu, Hidetsugu Asano,
Kenji Terabayashi, and Kazunori Umeda, “Mouth Movement Recognition Using Template Matching and its Implementation in an Intelligent Room,” J. Robot. Mechatron., Vol.24, No.2, pp. 311-319, 2012.
References:
  [1] T. Mori and T. Sato, “Robotic Room: Its Concept and Realization,” Robotics and Autonomous Systems, Vol.28, No.2, pp. 141-144, 1999.
  [2] J. H. Lee and H. Hashimoto, “Intelligent Space – Concept and Contents –,” Advanced Robotics, Vol.16, No.4, pp. 265-280, 2002.
  [3] T. Mori, H. Noguchi, and T. Sato, “Sensing room – Room-type behavior measurement and accumulation environment –,” J. of the Robotics Society of Japan, Vol.23, No.6, pp. 665-669, 2005.
  [4] B. Brumitt, B. Meyers, J. Krumm, A. Kern, and S. Shafer, “EasyLiving: Technologies for Intelligent Environments,” Proc. Int. Symp. on Handheld and Ubiquitous Computing, pp. 12-27, 2000.
  [5] K. Irie, N. Wakamura, and K. Umeda, “Construction of an intelligent room based on gesture recognition – operation of electric appliances with hand gestures,” Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 193-198, 2004.
  [6] K. Irie, N. Wakamura, and K. Umeda, “Construction of an Intelligent Room Based on Gesture Recognition,” Trans. of the Japan Society of Mechanical Engineers, C, Vol.73, No.725, pp. 258-265, 2007 (in Japanese).
  [7] T. Nakanishi, K. Terabayashi, and K. Umeda, “Mouth Motion Recognition for Intelligent Room Using DP Matching,” IEEJ Trans. EIS, Vol.129, No.5, pp. 940-946, 2009 (in Japanese).
  [8] T. Saitoh and R. Konishi, “Lip Reading Based on Trajectory Feature,” Trans. IEICE, Vol.J90-D, No.4, pp. 1105-1114, 2007 (in Japanese).
  [9] T. Wark and S. Sridharan, “A syntactic approach to automatic lip feature extraction for speaker identification,” Proc. IEEE ICASSP, Vol.6, pp. 3693-3696, 1998.
  [10] R. W. Frischholz and U. Dieckmann, “BioID: A Multimodal Biometric Identification System,” IEEE Computer, Vol.33, No.2, pp. 64-68, 2000.
  [11] L. G. ves da Silveira, J. Facon, and D. L. Borges, “Visual speech recognition: A solution from feature extraction to words classification,” Proc. XVI Brazilian Symposium on Computer Graphics and Image Processing, pp. 399-405, 2003.
  [12] M. J. Lyons, C.-H. Chan, and N. Tetsutani, “MouthType: text entry by hand and mouth,” Proc. Conf. on Human Factors in Computing Systems, pp. 1383-1386, 2004.
  [13] Y. Ogoshi, H. Ide, C. Araki, and H. Kimura, “Active Lip Contour Using Hue Characteristics Energy Model for A Lip Reading System,” Trans. IEEJ, Vol.128, No.5, pp. 811-812, 2008.
  [14] K. Takita, T. Nagayasu, H. Asano, K. Terabayashi, and K. Umeda, “An Investigation into Feature for Mouth Motion Recognition Using DP matching,” Dynamic Image Processing for Real Application 2011, pp. 302-307, 2011 (in Japanese).
  [15] C. Bregler and Y. Konig, “‘Eigenlips’ for robust speech recognition,” Proc. Int. Conf. Acoust. Speech Signal Process (ICASSP), pp. 669-672, 1994.
  [16] O. Vanegas, K. Tokuda, and T. Kitamura, “Lip location normalized training for visual speech recognition,” IEICE Trans. Inf. & Syst., Vol.E83-D, No.11, pp. 1969-1977, Nov. 2000.
  [17] J. Kim, J. Lee, and K. Shirai, “An efficient lip-reading method robust to illumination variation,” IEICE Trans. Fundamentals, Vol.E85-A, No.9, pp. 2164-2168, Sept. 2002.
  [18] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, Vol.1, pp. 511-518, 2001.
  [19] T. Nishimura, T. Mukai, S. Nozaki, and R. Oka, “Spotting Recognition of Gestures Performed by People from a Single Time-Varying Image Using Low-Resolution Features,” Trans. IEICE, Vol.J80-DII, No.6, pp. 1563-1570, 1997.
  [20] S. Uchida and H. Sakoe, “Analytical DP Matching,” Trans. IEICE, Vol.J90-D, No.8, pp. 2137-2146, 2007.
