
JRM Vol.29 No.1 pp. 83-93 (2017)
doi: 10.20965/jrm.2017.p0083

Paper:

Layout Optimization of Cooperative Distributed Microphone Arrays Based on Estimation of Source Separation Performance

Kouhei Sekiguchi, Yoshiaki Bando, Katsutoshi Itoyama, and Kazuyoshi Yoshii

Graduate School of Informatics, Kyoto University
Yoshida-honmachi, Sakyo, Kyoto 606-8501, Japan

Received:
July 20, 2016
Accepted:
November 4, 2016
Published:
February 20, 2017
Keywords:
cooperative source separation, multiple mobile robots
Abstract

Optimizing robot positions for source separation

The active audition method presented here improves source separation performance by moving multiple mobile robots to optimal positions. One advantage of using multiple mobile robots, each equipped with a microphone array, is that each robot can work independently or as part of a large reconfigurable array. To determine the optimal layout of the robots, we must be able to predict source separation performance from source position information, because the actual source signals are unknown and the actual separation performance cannot be calculated. Our method thus simulates delay-and-sum beamforming for a candidate layout to calculate the gain theoretically, i.e., the expected ratio of a target sound source to the other sound sources in the corresponding separated signal. The robots are moved into the layout with the highest average gain over the target sources. Experimental results showed that our method improved the harmonic mean of signal-to-distortion ratios (SDRs) by 5.5 dB in simulation and by 3.5 dB in a real environment.
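The gain estimation described in the abstract can be illustrated with a minimal free-field sketch: steer a delay-and-sum beamformer at each target, compute the ratio of the target's power response to that of the other sources, and pick the candidate layout with the highest average gain. This is not the authors' implementation; the microphone coordinates, evaluation frequencies, and brute-force candidate search below are illustrative assumptions.

```python
import numpy as np

C = 343.0  # speed of sound [m/s]

def delays(mics, src):
    """Propagation delays [s] from a point source to each microphone (free field)."""
    return np.linalg.norm(mics - src, axis=1) / C

def ds_response(mics, look, src, freq):
    """Power response of a delay-and-sum beamformer steered at `look`
    to a unit-amplitude source at `src`, at a single frequency."""
    w = np.exp(-2j * np.pi * freq * delays(mics, look)) / len(mics)  # DS weights
    a = np.exp(-2j * np.pi * freq * delays(mics, src))               # source phases
    return np.abs(np.vdot(w, a)) ** 2  # np.vdot conjugates its first argument

def expected_gain(mics, sources, target, freqs):
    """Ratio of target power to summed interferer power, accumulated over freqs."""
    num = sum(ds_response(mics, sources[target], sources[target], f) for f in freqs)
    den = sum(ds_response(mics, sources[target], sources[k], f)
              for f in freqs for k in range(len(sources)) if k != target)
    return num / den

def best_layout(layouts, sources, freqs):
    """Pick the candidate microphone layout with the highest average gain
    over all target sources (brute force over the given candidates)."""
    def avg_gain(mics):
        return np.mean([expected_gain(mics, sources, t, freqs)
                        for t in range(len(sources))])
    return max(layouts, key=avg_gain)

# Example: an 8-microphone circular array (radius 0.1 m) and two sources.
ang = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
mics = 0.1 * np.stack([np.cos(ang), np.sin(ang)], axis=1)
sources = np.array([[2.0, 0.0], [-2.0, 1.0]])
g = expected_gain(mics, sources, 0, [500.0, 1000.0, 2000.0])
```

A real system would replace the brute-force search with the Bayesian optimization mentioned in the paper and sum the responses of several robots' arrays, but the gain criterion itself is the quantity sketched here.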

