Research Paper:
Reconstruction of Missing Data Completely at Random for Trains Based on Improved GAN
Jing He*, Xin Chen**, and Changfan Zhang**

*College of Electrical and Information Engineering, Hunan University of Technology
Taishan West Road, Tianyuan District, Zhuzhou, Hunan 412007, China
**College of Railway Transportation, Hunan University of Technology
Taishan West Road, Tianyuan District, Zhuzhou, Hunan 412007, China
Corresponding author
Reconstructing missing data for heavy-haul trains is critical to ensuring safe train operation. However, existing methods for training generative models require a complete dataset, which makes it difficult for them to handle data that are missing completely at random. To address this issue, this study proposes a new attention-based generative adversarial network to reconstruct the missing data. First, a mask matrix is designed to locate the missing data, and gradient descent is applied in combination with the discriminator's output probability matrix so that the missing entries can still be filled in accurately even when the dataset is incomplete. Next, a prompt matrix is derived from the mask matrix to mitigate model overfitting and accelerate convergence. Finally, an attention mechanism is introduced throughout the generative adversarial network to strengthen the data features extracted by the feature extraction network. The experimental results show that the mean squared error and mean absolute percentage error of the reconstruction remain below 1.5 for measured data at different missing rates, and that the reconstructed data closely follow the distribution of the measured data.
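The abstract only outlines the mask, prompt, and discriminator-feedback steps. The sketch below illustrates them under the assumption that they follow the GAIN formulation (Yoon et al., 2018), in which the mask matrix M marks observed entries, the prompt matrix plays the role of GAIN's hint matrix H = B⊙M + 0.5(1 − B) with B drawn from a Bernoulli(hint_rate) distribution, and the discriminator is trained by minimizing the cross-entropy between its per-entry probability matrix and M. All function names, the `hint_rate` parameter, and the toy data are hypothetical. The generator, discriminator, and attention networks are not implemented; a fixed array stands in for the generator output.

```python
import math
import random


def mask_matrix(x):
    # M[i][j] = 1 if x[i][j] was observed, 0 if it is missing (encoded as None)
    return [[0 if v is None else 1 for v in row] for row in x]


def hint_matrix(m, hint_rate, rng):
    # Assumed GAIN-style prompt/hint: H = B*M + 0.5*(1 - B), B ~ Bernoulli(hint_rate) per entry.
    # Revealed entries copy the mask; hidden entries are set to 0.5 ("unknown").
    return [[mij if rng.random() < hint_rate else 0.5 for mij in row] for row in m]


def combine(x, m, g):
    # x_hat = M*x + (1 - M)*G: keep observed values, use the generator output where missing
    return [[xij if mij == 1 else gij for xij, mij, gij in zip(xr, mr, gr)]
            for xr, mr, gr in zip(x, m, g)]


def discriminator_loss(d_prob, m, eps=1e-8):
    # Mean binary cross-entropy between D's per-entry "observed" probabilities and the mask;
    # this is the quantity gradient descent would minimize when updating D.
    total, n = 0.0, 0
    for pr, mr in zip(d_prob, m):
        for p, mij in zip(pr, mr):
            total -= mij * math.log(p + eps) + (1 - mij) * math.log(1 - p + eps)
            n += 1
    return total / n


def mse(truth, pred, m):
    # Mean squared error over the entries that were missing (M == 0)
    errs = [(t - p) ** 2 for tr, pr, mr in zip(truth, pred, m)
            for t, p, mij in zip(tr, pr, mr) if mij == 0]
    return sum(errs) / len(errs)


def mape(truth, pred, m):
    # Mean absolute percentage error (in %) over the missing entries; truth values must be non-zero
    errs = [abs((t - p) / t) for tr, pr, mr in zip(truth, pred, m)
            for t, p, mij in zip(tr, pr, mr) if mij == 0]
    return 100.0 * sum(errs) / len(errs)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [[10.0, 20.0], [12.0, 22.0], [14.0, 24.0]]
    x = [[10.0, None], [None, 22.0], [14.0, 24.0]]
    m = mask_matrix(x)                               # [[1, 0], [0, 1], [1, 1]]
    h = hint_matrix(m, 0.9, rng)                     # would be fed to D alongside x_hat
    g = [[11.0, 21.0], [13.0, 23.0], [15.0, 25.0]]   # stand-in for generator output
    x_hat = combine(x, m, g)                         # [[10.0, 21.0], [13.0, 22.0], [14.0, 24.0]]
    print("hint:", h)
    print("MSE:", mse(truth, x_hat, m), "MAPE (%):", mape(truth, x_hat, m))
```

Evaluating the errors only where M == 0 reflects the fact that observed entries are copied through unchanged, so including them would dilute the reconstruction error. Whether the authors' indexes are computed the same way is not stated in the abstract.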

Missing data imputation
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.