Joint Audio-Visual Tracking Based on Dynamically Weighted Linear Combination of Probability State Density

Masaru Tsuchida; Takahito Kawanishi; Hiroshi Murase; Shigeru Takagi

doi:10.20965/jaciii.2004.p0190

JACIII Vol.8 No.2 pp. 190-199

(2004)

Paper:

Masaru Tsuchida^*,**, Takahito Kawanishi^*, Hiroshi Murase^*,***, and Shigeru Takagi^*

^*NTT Communication Science Laboratories, NTT Corporation, 3-1, Morinosato-Wakamiya, Atsugi 243-0198, Japan

^**Currently, NTT-DATA Corporation, Kayabacho Tower Bldg., 1-21-2 Shinkawa, Chuo-ku, Tokyo 104-0033, Japan

^***Currently, Graduate School of Information Science, Nagoya University, Furo-cho, Chigusa-ku, Nagoya 464-8603, Japan

Received: July 29, 2003

Accepted: December 1, 2003

Published: March 20, 2004

Keywords:

speaker tracking, integration of visual and audio information, probability density distribution, weighted linear combination, time information

Abstract

This paper proposes a method for stable, continuous speaker tracking using visual and audio information, even when the input is interrupted by occlusion or by disturbances such as noise or varying illumination. In this method, the position of a speaker is expressed as a likelihood distribution obtained by integrating visual and audio information. First, the visual and audio information is integrated as a weighted linear combination of the probability density distributions estimated from the visual and audio observations. Each weight is a variable that varies in proportion to the maximum value of the probability density distribution obtained for that type of information. Next, a weighted linear combination of this result and the past distribution is computed, and the outcome is taken as the likelihood distribution of the speaker's position. Changing the weights dynamically makes it possible to select one type of information or to emphasize it and, accordingly, to track stably and continuously even when the speaker cannot be detected momentarily due to occlusion, voice interruption, or noise. We conducted a series of speaker-tracking experiments using a circular microphone array and an omni-directional camera. The experiments confirmed that speakers can be tracked stably and continuously in spite of occlusion or voice interruption.
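
The two-stage combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the speaker position is discretized into a fixed number of bins, that the two observation weights are the per-modality peak values normalized to sum to one, and that the temporal weight `alpha` is a fixed constant. The names `fuse_step`, `normalize`, and `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np


def normalize(p):
    """Scale a non-negative array so it sums to 1 (uniform if it is all zeros)."""
    p = np.asarray(p, dtype=float)
    total = p.sum()
    if total <= 0:
        return np.full(p.shape, 1.0 / p.size)
    return p / total


def fuse_step(p_visual, p_audio, prev_likelihood, alpha=0.7):
    """One update of the speaker-position likelihood over discretized positions.

    alpha is a hypothetical fixed weight for the current fused observation
    relative to the past distribution (an assumption of this sketch).
    """
    p_visual = np.asarray(p_visual, dtype=float)
    p_audio = np.asarray(p_audio, dtype=float)
    prev_likelihood = np.asarray(prev_likelihood, dtype=float)

    # Weights proportional to each modality's peak density value.
    # Normalizing them to sum to 1 is an assumption of this sketch.
    peak_v, peak_a = p_visual.max(), p_audio.max()
    peak_sum = peak_v + peak_a
    if peak_sum > 0:
        w_v, w_a = peak_v / peak_sum, peak_a / peak_sum
    else:
        w_v = w_a = 0.5  # neither modality detected anything

    # Stage 1: weighted linear combination of the two observation densities.
    fused = w_v * p_visual + w_a * p_audio

    # Stage 2: weighted linear combination with the past distribution.
    likelihood = alpha * fused + (1.0 - alpha) * prev_likelihood
    return normalize(likelihood), (w_v, w_a)


if __name__ == "__main__":
    n = 8
    prev = normalize(np.ones(n))
    visual = np.zeros(n)  # camera view occluded: no visual evidence
    audio = np.zeros(n)
    audio[2:5] = [0.05, 0.9, 0.05]  # voice localized around bin 3
    likelihood, weights = fuse_step(visual, audio, prev)
    print(weights, int(np.argmax(likelihood)))
```

In this sketch, a modality that detects nothing gets a weight of zero, so the other modality dominates; when both are silent, the fused term vanishes and the normalized result equals the previous distribution, which is one simple way to keep a track alive through a momentary dropout.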

Cite this article as:

M. Tsuchida, T. Kawanishi, H. Murase, and S. Takagi, “Joint Audio-Visual Tracking Based on Dynamically Weighted Linear Combination of Probability State Density,” J. Adv. Comput. Intell. Intell. Inform., Vol.8 No.2, pp. 190-199, 2004.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
