Development, Deployment and Applications of Robot Audition Open Source Software HARK

Kazuhiro Nakadai; Hiroshi G. Okuno; Takeshi Mizumoto

doi:10.20965/jrm.2017.p0016

Paper: JRM Vol.29 No.1 pp. 16-25 (2017)

Kazuhiro Nakadai^*,^***, Hiroshi G. Okuno^**, and Takeshi Mizumoto^*

^*Honda Research Institute Japan Co., Ltd.
8-1 Honcho, Wako-shi, Saitama 351-0114, Japan

^**Graduate Program for Embodiment Informatics, Waseda University
2-4-12 Okubo, Shinjuku, Tokyo 169-0072, Japan

^***Graduate School of Information Science and Engineering, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

Received: July 29, 2016
Accepted: October 5, 2016
Published: February 20, 2017

Keywords: robot audition, open source software, microphone array processing, embedded software, cloud service

Abstract

Robot audition is a research field that develops technologies enabling robots to hear sounds through their own ears (microphones). Open source software for research purposes called HARK (Honda Research Institute Japan Audition for Robots with Kyoto University), which compiles robot audition studies performed over more than 10 years, was released to the public in 2008. HARK is updated every year, and free tutorials are often held to promote it. This paper explains the major functions of HARK, such as sound source localization, sound source separation, and automatic speech recognition. To promote HARK further, two derivatives have been actively studied and developed in recent years: HARK-Embedded for embedded systems, and HARK-SaaS, which offers HARK as Software as a Service (SaaS). These technologies are also described. In addition, applications of HARK are introduced as case studies.
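The abstract names microphone-array sound source localization as a core HARK function. As a minimal illustration of the underlying idea, the sketch below estimates the direction of a single far-field source by steering a delay-and-sum beam over candidate angles and picking the angle of maximum output power. This is not HARK's implementation: HARK's localization is based on the MUSIC method, and this sketch swaps in the simpler delay-and-sum scan. The uniform linear array geometry, the single-frequency noise-free snapshot, the 343 m/s speed of sound, and the function names (`simulate_snapshot`, `steered_power`, `localize`) are all hypothetical choices made for illustration.

```python
import cmath
import math

# Illustrative sketch only; not HARK's API. A delay-and-sum power scan stands in
# for HARK's MUSIC-based localization. Assumed: speed of sound in air, m/s.
SPEED_OF_SOUND = 343.0


def simulate_snapshot(theta_deg, num_mics, spacing, freq):
    """Hypothetical helper: far-field narrowband plane wave on a uniform linear array.

    theta_deg is measured from broadside; mic m sits at position m * spacing (m).
    Returns one complex frequency-domain value per microphone.
    """
    theta = math.radians(theta_deg)
    return [
        cmath.exp(-2j * math.pi * freq * m * spacing * math.sin(theta) / SPEED_OF_SOUND)
        for m in range(num_mics)
    ]


def steered_power(snapshot, theta_deg, spacing, freq):
    """Delay-and-sum output power when the array is steered toward theta_deg."""
    theta = math.radians(theta_deg)
    acc = 0j
    for m, x in enumerate(snapshot):
        # Compensate the phase delay a plane wave from theta_deg would have at mic m,
        # so signals from that direction add coherently.
        acc += x * cmath.exp(2j * math.pi * freq * m * spacing * math.sin(theta) / SPEED_OF_SOUND)
    return abs(acc) ** 2


def localize(snapshot, spacing, freq, step_deg=1):
    """Return the scan angle (degrees, -90..90) with the largest steered power."""
    angles = range(-90, 91, step_deg)
    return max(angles, key=lambda a: steered_power(snapshot, a, spacing, freq))
```

For example, `localize(simulate_snapshot(30, 8, 0.1, 1000.0), 0.1, 1000.0)` recovers the 30° source. The 0.1 m spacing stays below half a wavelength at 1 kHz (about 0.17 m), which avoids spatial aliasing. Subspace methods such as MUSIC replace this simple power scan with a search that resolves closely spaced sources more sharply.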

Open source software for robot audition HARK

Cite this article as:

K. Nakadai, H. Okuno, and T. Mizumoto, “Development, Deployment and Applications of Robot Audition Open Source Software HARK,” J. Robot. Mechatron., Vol.29 No.1, pp. 16-25, 2017.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
