
JACIII Vol.14 No.4 pp. 364-374
doi: 10.20965/jaciii.2010.p0364
(2010)

Paper:

SN Ratio Estimation and Speech Segment Detection of Extracted Signals Through Independent Component Analysis

Takeshi Koya*, Nobuo Iwasaki**, Takaaki Ishibashi***, Go Hirano**, Hiroshi Shiratsuchi**, and Hiromu Gotanda**

*Solutions Development Laboratory, Advanced Solutions Technology Japan, Shinjyuku, Tokyo 169-0051, Japan

**Graduate School of Advanced Technology, Kinki University, 11-6 Kayanomori, Iizuka-shi, Fukuoka 820-8555, Japan

***Kumamoto National College of Technology, Koshi-shi, Kumamoto 861-1102, Japan

Received: September 1, 2009
Accepted: February 12, 2010
Published: May 20, 2010
Keywords: independent component analysis, noise reduction, SN ratio estimation, voice activity detection
Abstract
In real-world environments where acoustic signals are contaminated with various noises, it is difficult to estimate the Signal-to-Noise Ratio (SNR) from the signals observed at microphones alone; knowledge of the acoustic transfer functions and the original source signals is indispensable for SNR estimation. The present paper proposes a method to estimate the SNR approximately in real-world environments without knowledge of the transfer functions or the source signals: the SNR is estimated after applying Independent Component Analysis (ICA) to the signals observed at the microphones. The proposed method also works as a speech segment detector, since detection of speech segments is necessarily carried out in the course of SNR estimation. Several experimental results confirm the validity of the proposed method.
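
The abstract outlines the pipeline only at a high level: separate the microphone observations with ICA, take the speech-dominant component, detect its speech segments, and estimate the SNR from those segments. The sketch below illustrates that general idea and is not the authors' algorithm; the paper treats convolutive mixtures in the frequency domain, whereas this sketch assumes an instantaneous FastICA model, and the channel-selection heuristic, frame length, and energy threshold are assumptions introduced purely for illustration.

```python
# Minimal illustrative sketch (assumed, simplified model; not the paper's method):
# separate two-microphone observations with FastICA, run an energy-threshold VAD on
# the extracted channel, and form a rough SNR estimate from speech vs. noise frames.
import numpy as np
from sklearn.decomposition import FastICA

def estimate_snr_db(observations, fs, frame_ms=20.0):
    """observations: (n_samples, n_mics) time-domain mixtures; fs: sampling rate in Hz."""
    # 1. Blind separation of an instantaneous mixture (a simplification of the
    #    frequency-domain, convolutive setting addressed in the paper).
    ica = FastICA(n_components=observations.shape[1], random_state=0)
    sources = ica.fit_transform(observations)          # (n_samples, n_sources)

    # 2. Pick the most speech-like (super-Gaussian) component via excess kurtosis.
    kurt = np.mean(sources ** 4, axis=0) / np.mean(sources ** 2, axis=0) ** 2 - 3.0
    speech = sources[:, np.argmax(kurt)]

    # 3. Frame the extracted signal and compute per-frame power.
    frame_len = int(fs * frame_ms / 1000.0)
    n_frames = len(speech) // frame_len
    frames = speech[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)

    # 4. Energy-threshold VAD: frames far above the quietest frames count as speech.
    noise_floor = np.percentile(power, 10)
    is_speech = power > 10.0 * noise_floor

    # 5. SNR from the mean powers of speech frames and noise-only frames.
    p_speech = power[is_speech].mean()
    p_noise = power[~is_speech].mean()
    snr_db = 10.0 * np.log10(max(p_speech - p_noise, 1e-12) / p_noise)
    return snr_db, is_speech
```

With two microphone recordings stacked as the columns of an array `x` and a sampling rate `fs`, `estimate_snr_db(x, fs)` returns a rough SNR value in dB together with the per-frame speech/non-speech decision; the threshold and frame length would need tuning for any real recording.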
Cite this article as:
T. Koya, N. Iwasaki, T. Ishibashi, G. Hirano, H. Shiratsuchi, and H. Gotanda, “SN Ratio Estimation and Speech Segment Detection of Extracted Signals Through Independent Component Analysis,” J. Adv. Comput. Intell. Intell. Inform., Vol.14 No.4, pp. 364-374, 2010.
