Research Paper:
A Study on Speech Recognition by a Neural Network Based on English Speech Feature Parameters
Congmin Mao and Sujing Liu
Huaxin College, Hebei GEO University
No.69 Wufan Road, Airport Industrial Park, Xinle, Shijiazhuang, Hebei 050700, China
Corresponding author
This study examines English speech recognition from the perspective of speech feature parameters. Two feature parameters, the mel-frequency cepstral coefficient (MFCC) and the filter bank (Fbank), were selected for recognizing English speech. Three recognition algorithms were used: the classical back-propagation neural network (BPNN), the recurrent neural network (RNN), and the long short-term memory (LSTM) network, which is an improved form of the RNN. The experiments compared the three recognition algorithms and the effects of the two feature parameters on their performance. Across the different experimental environments, the LSTM model had the best recognition performance of the three neural networks, and the models using the MFCC feature parameter outperformed those using the Fbank feature parameter. The LSTM model had the highest accuracy and the highest speed, the RNN model ranked second, and the BPNN model ranked last. The results confirm that combining the LSTM model with MFCC feature parameter extraction achieves higher English speech recognition accuracy than the other neural networks tested.
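The two feature parameters compared here are closely related. In the usual formulation, Fbank features are the log energies of a mel filter bank, and MFCCs are obtained by applying a discrete cosine transform (DCT) to those log energies. The following is a minimal sketch of that last step only, assuming the filter-bank energies for one frame are already available. The function names, the 26 filters, and the 13 coefficients are illustrative assumptions, not values taken from the paper.

```python
import math


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II; returns the first num_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def fbank_and_mfcc(filterbank_energies, num_coeffs=13):
    """Return (Fbank, MFCC) features for one frame.

    filterbank_energies: non-negative mel filter-bank energies (assumed given).
    num_coeffs: number of cepstral coefficients to keep (assumed value).
    """
    # Fbank features: log of each filter-bank energy (floored to avoid log(0)).
    fbank = [math.log(max(e, 1e-10)) for e in filterbank_energies]
    # MFCC features: DCT of the log energies, truncated to num_coeffs.
    mfcc = dct_ii(fbank, num_coeffs)
    return fbank, mfcc


# Hypothetical frame with a flat spectrum across 26 filters.
frame_energies = [math.e] * 26
fbank, mfcc = fbank_and_mfcc(frame_energies)
```

For a flat spectrum such as this hypothetical frame, all of the energy ends up in the zeroth cepstral coefficient and the remaining coefficients are numerically zero. This illustrates how the DCT decorrelates and compacts the Fbank representation.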
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.