JRM Vol.17 No.4 pp. 447-455
doi: 10.20965/jrm.2005.p0447


VLSI Architecture for Robust Speech Recognition Systems and its Implementation on a Verification Platform

Shingo Yoshizawa, Noboru Hayasaka, Naoya Wada, and Yoshikazu Miyanaga

Graduate School of Information Science and Technology, Hokkaido University, N-14 W-9 Kita-Ku, Sapporo 060-0814, Japan

Received: November 30, 2004
Accepted: February 19, 2005
Published: August 20, 2005
Keywords: speech recognition, VLSI architecture, robust processing, FPGA
This paper presents a VLSI architecture for a robust speech recognition system that enables high-speed, low-power operation. The proposed architecture improves recognition accuracy in noisy environments and achieves a short response time through parallel and pipelined processing. We demonstrate the improvements in processing time and power consumption by evaluating circuit performance in 0.25-μm CMOS technology. We also describe a verification platform that helps users implement our hardware-based robust speech recognition system. The platform eases the conversion of software to hardware and quickly provides test environments on field-programmable gate arrays (FPGAs).
Cite this article as:
S. Yoshizawa, N. Hayasaka, N. Wada, and Y. Miyanaga, “VLSI Architecture for Robust Speech Recognition Systems and its Implementation on a Verification Platform,” J. Robot. Mechatron., Vol.17 No.4, pp. 447-455, 2005.

