Downsized Evolutionary Video Processing for Lips Tracking and Data Acquisition

Takuya Akashi; Yuji Wakasa; Kanya Tanaka; Stephen Karungaru; Minoru Fukumi

doi:10.20965/jaciii.2007.p1030

JACIII Vol.11 No.8 pp. 1030-1042

(2007)

Paper:

Downsized Evolutionary Video Processing for Lips Tracking and Data Acquisition

Takuya Akashi*, Yuji Wakasa*, Kanya Tanaka*,
Stephen Karungaru**, and Minoru Fukumi**

*Graduate School of Science and Engineering, Yamaguchi University, 2-16-1 Tokiwa-dai, Ube, Yamaguchi 755-8611, Japan

**Institute of Technology and Science, The University of Tokushima, 2-1 Minami-josanjima, Tokushima 770-8506, Japan

Received:

March 19, 2007

Accepted:

July 10, 2007

Published:

October 20, 2007

Keywords:

evolutionary video processing, image understanding, genetic algorithm, template matching, lips image

Abstract

This paper presents high-speed lips tracking and data acquisition for a talking person in natural scenes. Our approach is based on Evolutionary Video Processing, a method that involves a trade-off between accuracy and processing time. To address this trade-off, we propose Evolutionary Video Processing with automatic SD-Control. In our simulations, a comparison experiment verifies the effectiveness of the proposed method, which improves performance (speed and accuracy) from 68.4% to 86.2%. Furthermore, the evaluation shows that the proposed method can continue to track the lips region even in such a case. The results demonstrate that lips region detection and tracking are possible at high speed and with high accuracy, together with acquisition of numerical information on the geometric changes of the lips.

Cite this article as:

T. Akashi, Y. Wakasa, K. Tanaka, S. Karungaru, and M. Fukumi, “Downsized Evolutionary Video Processing for Lips Tracking and Data Acquisition,” J. Adv. Comput. Intell. Intell. Inform., Vol.11 No.8, pp. 1030-1042, 2007.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
