Design and Assessment of Sound Source Localization System with a UAV-Embedded Microphone Array

Kotaro Hoshiba; Osamu Sugiyama; Akihide Nagamine; Ryosuke Kojima; Makoto Kumon; Kazuhiro Nakadai

doi:10.20965/jrm.2017.p0154

JRM Vol.29 No.1 pp. 154-167 (2017)

Kotaro Hoshiba*1, Osamu Sugiyama*2, Akihide Nagamine*3, Ryosuke Kojima*4, Makoto Kumon*5, and Kazuhiro Nakadai*1,*6

*1 Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

*2 Kyoto University Hospital
54 Kawaharacho, Shogoin, Sakyo-ku, Kyoto, Kyoto 606-8507, Japan

*3 Department of Electrical and Electronic Engineering, School of Engineering, Tokyo Institute of Technology

*4 Graduate School of Information Science and Engineering, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

*5 Graduate School of Science and Technology, Kumamoto University
2-39-1 Kurokami, Chuo-ku, Kumamoto, Kumamoto 860-8555, Japan

*6 Honda Research Institute Japan Co., Ltd.
8-1 Honcho, Wako, Saitama 351-0188, Japan

Received: July 24, 2016

Accepted: December 15, 2016

Published: February 20, 2017

Keywords: robot audition, sound source localization, multiple signal classification, actual environmental measurement, unmanned aerial vehicle

Abstract

We have studied robot-audition-based sound source localization using a microphone array embedded on a UAV (unmanned aerial vehicle) to locate people who need assistance in a disaster-stricken area. A localization method with high robustness against noise and a small calculation cost has been proposed to solve a problem specific to the outdoor sound environment. In this paper, the proposed method is extended for practical use, a system based on the method is designed and implemented, and results of sound source localization conducted in an actual outdoor environment are shown. First, a 2.5-dimensional sound source localization method, which combines two-dimensional sound source localization with distance estimation, is proposed. Then, an offline sound source localization system is built using the proposed method, and the accuracy of the localization results is evaluated and discussed. As a result, the usability of the extended method and of a newly developed three-dimensional visualization tool is confirmed, and the detection accuracy is found to change with the type and distance of the sound source. Next, the offline system is extended to an online system, and sound source localization is conducted in real time to verify that the detection performance of the offline system is maintained in the online system. Moreover, the relationship between the parameters and detection accuracy is evaluated in order to localize only a target sound source. As a result, indices for determining an appropriate threshold are obtained, and localization of a target sound source is achieved at a designated accuracy.
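
The 2.5-dimensional idea described above (a two-dimensional direction estimate plus a distance estimate) can be illustrated with a minimal Python sketch that turns such an estimate into a three-dimensional position relative to the microphone array. The function name `source_position`, the angle convention (azimuth measured in the horizontal plane from the x axis, elevation measured from the horizontal plane and negative below the array), and the sample values are assumptions made for illustration only. The abstract does not say how the paper estimates distance, so the distance is simply taken as an input here.

```python
import math


def source_position(azimuth_deg, elevation_deg, distance_m):
    """Convert a direction (azimuth, elevation) and a range into (x, y, z).

    Assumed convention (not taken from the paper): the array is at the
    origin, azimuth is measured in the horizontal plane from the x axis,
    and elevation is measured from the horizontal plane, with negative
    values pointing below the array.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Length of the projection of the source vector onto the horizontal plane.
    horizontal = distance_m * math.cos(el)
    x = horizontal * math.cos(az)
    y = horizontal * math.sin(az)
    z = distance_m * math.sin(el)
    return (x, y, z)


# Hypothetical example: a source 10 m away, 30 degrees below the array,
# at an azimuth of 90 degrees.
x, y, z = source_position(90.0, -30.0, 10.0)
print(f"x={x:.2f} m, y={y:.2f} m, z={z:.2f} m")
```

A position in this form is the kind of quantity a three-dimensional visualization could plot, although the tool described in the paper may use a different coordinate convention.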

[Figure: Visualization of localization result]

Cite this article as:

K. Hoshiba, O. Sugiyama, A. Nagamine, R. Kojima, M. Kumon, and K. Nakadai, “Design and Assessment of Sound Source Localization System with a UAV-Embedded Microphone Array,” J. Robot. Mechatron., Vol.29 No.1, pp. 154-167, 2017.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
