VLSI Architecture for Robust Speech Recognition Systems and its Implementation on a Verification Platform

Shingo Yoshizawa; Noboru Hayasaka; Naoya Wada; Yoshikazu Miyanaga

doi:10.20965/jrm.2005.p0447

JRM Vol.17 No.4 pp. 447-455 (2005)

Graduate School of Information Science and Technology, Hokkaido University, N-14 W-9 Kita-Ku, Sapporo 060-0814, Japan

Received: November 30, 2004
Accepted: February 19, 2005
Published: August 20, 2005

Keywords: speech recognition, VLSI architecture, robust processing, FPGA

Abstract

This paper presents a VLSI architecture for a robust speech recognition system that enables high-speed, low-power operation. The proposed architecture improves recognition accuracy in noisy environments and achieves a short response time through parallel and pipelined processing. We demonstrate improvements in processing time and power consumption by evaluating circuit performance in 0.25-μm CMOS technology. We also describe a verification platform that helps users implement our hardware-based robust speech recognition system. The platform eases the conversion of software to hardware and quickly provides test environments on field-programmable gate arrays (FPGAs).

Cite this article as:

S. Yoshizawa, N. Hayasaka, N. Wada, and Y. Miyanaga, “VLSI Architecture for Robust Speech Recognition Systems and its Implementation on a Verification Platform,” J. Robot. Mechatron., Vol.17 No.4, pp. 447-455, 2005.

References

[1] P. Gomez, A. Alvarez, R. Martinez, and M. Perez, “A DSP-based modular architecture for noise cancellation and speech recognition,” Proc. IEEE ISCAS98, Vol.5, pp. 178-181, June 1998.
[2] N. Hataoka, H. Kokubo, Y. Obuchi, and A. Amano, “Development of robust speech recognition middleware on microprocessor,” Proc. IEEE ICASSP98, Vol.2, pp. 837-840, May 1998.
[3] J. Pihl, T. Svendsen, and M. H. Johnsen, “A VLSI implementation of pdf computations in HMM based speech recognition,” Proc. IEEE TENCON’96, pp. 241-246, 1996.
[4] B. Park, K. Cho, and J. Cho, “Low power VLSI architecture of Viterbi scorer for HMM-based isolated word recognition,” International Symposium on Quality Electronic Design, pp. 235-239, March 2002.
[5] W. Han, K. Hon, and C. Chan, “An HMM-based speech recognition IC,” Proc. IEEE ISCAS2003, Vol.2, pp. 744-747, 2003.
[6] S. J. Melnikoff, S. Quigley, and M. J. Russell, “Implementing a simple continuous speech recognition system on an FPGA,” Proc. IEEE Symposium on FPGAs for Custom Computing Machines (FCCM 2002), pp. 275-276, 2002.
[7] F. Vargas, R. Fagundes, and D. Barros, “A FPGA-based Viterbi algorithm implementation for speech recognition systems,” Proc. IEEE ICASSP2001, Vol.2, pp. 1217-1220, May 2001.
[8] S. Yoshizawa, N. Hayasaka, N. Wada, and Y. Miyanaga, “Cepstral amplitude range normalization for noise robust speech recognition,” IEICE Trans. Inf. & Syst., Vol.E87-D, No.8, pp. 2130-2137, Aug. 2004.
[9] A. Acero, and R. M. Stern, “Environmental robustness in automatic speech recognition,” Proc. ICASSP90, pp. 849-852, 1990.
[10] H. Hermansky, “RASTA processing of speech,” IEEE Trans. Speech Audio Processing, Vol.2, pp. 578-589, 1994.
[11] O. Viikki, D. Bye, and K. Lauria, “A recursive feature vector normalization approach for robust speech recognition in noise,” Proc. ICASSP98, pp. 733-736, 1998.
[12] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. on Acoust., Speech, Signal Processing, Vol.ASSP-27, No.2, pp. 113-120, 1979.
[13] J. Shen, W. Hwang, and L. Lee, “Robust speech recognition features based on temporal trajectory filtering of frequency band spectrum,” Proc. ICSLP96, pp. 881-884, 1996.
[14] N. Kanedera, T. Arai, and T. Funada, “Robust automatic speech recognition emphasizing important modulation spectrum,” IEICE Trans. Inf. & Syst., Vol.J84-D2, No.7, pp. 1261-1269, 2001.
[15] A. Varga, and H. J. M. Steenken, “Assessment for automatic speech recognition: II. NOISEX-92: A database and experiment to study the effect of additive noise on speech recognition systems,” Speech Communication, Vol.12, No.3, pp. 247-251, 1993.
[16] L. R. Rabiner, and B. H. Juang, “Fundamentals of speech recognition,” Prentice Hall, Englewood Cliffs, N.J., 1993.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
