JACIII Vol.8 No.2 pp. 190-199
doi: 10.20965/jaciii.2004.p0190


Joint Audio-Visual Tracking Based on Dynamically Weighted Linear Combination of Probability State Density

Masaru Tsuchida*,**, Takahito Kawanishi*, Hiroshi Murase*,***, and Shigeru Takagi*

*NTT Communication Science Laboratories, NTT Corporation, 3-1, Morinosato-Wakamiya, Atsugi 243-0198, Japan

**Currently, NTT-DATA Corporation, Kayabacho Tower Bldg., 1-21-2 Shinkawa, Chuo-ku, Tokyo 104-0033, Japan

***Currently, Graduate School of Information Science, Nagoya University, Furo-cho, Chigusa-ku, Nagoya 464-8603, Japan

Received: July 29, 2003
Accepted: December 1, 2003
Published: March 20, 2004
Keywords: speaker tracking, integration of visual and audio information, probability density distribution, weighted linear combination, time information

This paper proposes a method for stable, continuous speaker tracking using visual and audio information, even when the input is interrupted by disturbances such as noise, varying illumination, or occlusion. In this method, the position of a speaker is expressed as a likelihood distribution obtained by integrating visual and audio information. First, the visual and audio information is integrated as a weighted linear combination of the probability density distributions estimated from the observation of each type of information. The weights are variables that vary in proportion to the maximum value of the probability density distribution obtained for each type of information. Next, a weighted linear combination of this result and the past distribution is computed, and the outcome is taken as the likelihood distribution of the speaker's position. Changing the weights dynamically makes it possible to select one type of information or to weight both, and thus to track stably and continuously even when the speaker momentarily cannot be detected because of occlusion, voice interruption, or noise. We conducted a series of speaker-tracking experiments using a circular microphone array and an omni-directional camera. The experiments confirmed that speakers can be tracked stably and continuously in spite of occlusion or voice interruption.
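The two combination steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the discretization of speaker position into eight azimuth bins, the function names, the example densities, and the fixed temporal weight `alpha` are all assumptions made for illustration. Only the rule that each modality's weight is proportional to the maximum of its density, and the subsequent combination with the past distribution, are taken from the abstract.

```python
from typing import List, Tuple


def fuse_modalities(p_visual: List[float],
                    p_audio: List[float]) -> Tuple[List[float], float, float]:
    """Combine two densities with weights proportional to their peak values."""
    peak_v, peak_a = max(p_visual), max(p_audio)
    total = peak_v + peak_a
    # Fall back to equal weights if both densities are identically zero.
    w_v = peak_v / total if total > 0 else 0.5
    w_a = 1.0 - w_v
    fused = [w_v * v + w_a * a for v, a in zip(p_visual, p_audio)]
    return fused, w_v, w_a


def update_likelihood(fused: List[float], previous: List[float],
                      alpha: float = 0.7) -> List[float]:
    """Blend the current fused density with the past distribution.

    alpha is a hypothetical fixed weight; the abstract does not state
    how the temporal weight is chosen.
    """
    return [alpha * f + (1.0 - alpha) * p for f, p in zip(fused, previous)]


def estimate_position(likelihood: List[float]) -> int:
    """Return the index of the bin with the highest likelihood."""
    return max(range(len(likelihood)), key=likelihood.__getitem__)


# Assumed scenario: the speaker is occluded, so the visual density is flat,
# while the audio density is peaked at bin 3.
uniform = [1.0 / 8] * 8
p_visual = uniform[:]
p_audio = [0.05, 0.05, 0.10, 0.60, 0.10, 0.05, 0.025, 0.025]

fused, w_v, w_a = fuse_modalities(p_visual, p_audio)
likelihood = update_likelihood(fused, previous=uniform)
position = estimate_position(likelihood)
```

In this scenario the flat visual density has a low peak, so the audio density receives the larger weight and dominates the fused result, which is the behaviour the abstract attributes to dynamic weighting during occlusion.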

Cite this article as:
Masaru Tsuchida, Takahito Kawanishi, Hiroshi Murase, and Shigeru Takagi, “Joint Audio-Visual Tracking Based on Dynamically Weighted Linear Combination of Probability State Density,” J. Adv. Comput. Intell. Intell. Inform., Vol.8, No.2, pp. 190-199, 2004.