Paper:
An Improved Fully Convolutional Network Based on Post-Processing with Global Variance Equalization and Noise-Aware Training for Speech Enhancement
Wenlong Li, Kaoru Hirota, Yaping Dai, and Zhiyang Jia
School of Automation, Beijing Institute of Technology
No.5 Zhongguancun South Street, Haidian District, Beijing 100081, China
Corresponding author
An improved fully convolutional network based on post-processing with global variance (GV) equalization and noise-aware training (PN-FCN) for speech enhancement is proposed. It aims to reduce the complexity of the speech enhancement system while addressing the over-smoothed spectrogram problem and the poor generalization capability of regression-based methods. The PN-FCN is fed with noisy speech samples augmented with an estimate of the noise, so the network can exploit additional online noise information to better predict the clean speech. In addition, the PN-FCN applies GV equalization as a post-processing step, a technique that has been shown to improve subjective scores in voice conversion. Finally, because the proposed framework adopts an FCN, its number of parameters is about one-seventh that of a deep neural network (DNN). Experiments on the Valentini-Botinhao dataset demonstrate that the proposed framework achieves improvements in both denoising performance and model training speed.
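To make the two key ingredients of the abstract concrete, the following is a minimal NumPy sketch of (a) noise-aware input construction and (b) per-bin GV equalization. It is illustrative only: the function names, the assumption that the first few frames of each utterance are speech-free (a common noise-aware-training heuristic), and the use of a reference GV measured on clean training data are assumptions, not details taken from the paper.

```python
import numpy as np

def noise_aware_input(noisy_logspec, n_noise_frames=6):
    """Stack a per-utterance noise estimate onto the noisy features.

    Assumes (illustratively) that the first n_noise_frames of the
    utterance contain no speech, so their mean log-power spectrum
    approximates the noise.
    noisy_logspec: (frames, bins) log-power spectrogram.
    Returns: (frames, 2 * bins) noise-aware feature matrix.
    """
    noise_est = noisy_logspec[:n_noise_frames].mean(axis=0)        # (bins,)
    noise_tiled = np.tile(noise_est, (noisy_logspec.shape[0], 1))  # (frames, bins)
    return np.concatenate([noisy_logspec, noise_tiled], axis=1)

def gv_equalize(enhanced_logspec, gv_ref):
    """Post-process: rescale each frequency bin so its global variance
    matches a reference GV (here assumed to be measured on clean
    training data), counteracting the over-smoothing typical of
    regression-based enhancement.
    enhanced_logspec: (frames, bins); gv_ref: (bins,).
    """
    mu = enhanced_logspec.mean(axis=0)              # per-bin mean
    gv_enh = enhanced_logspec.var(axis=0) + 1e-12   # per-bin GV, guarded against 0
    scale = np.sqrt(gv_ref / gv_enh)                # per-bin equalization factor
    return mu + scale * (enhanced_logspec - mu)
```

In a pipeline following this recipe, `noise_aware_input` would prepare the features fed to the FCN, and `gv_equalize` would be applied to the network's enhanced log-power spectrogram before waveform reconstruction.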
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.