Research Paper:
An Improved Parallel Clustering Method Based on K-Means for Electricity Consumption Patterns
Yuehua Yang and Yun Wu
School of Computer Science, Northeast Electric Power University
No.169 Changchun Road, Chuanying District, Jilin, Jilin 132012, China
Corresponding author
Electricity consumption pattern recognition is the foundation of intelligent analysis of electricity distribution data. However, as electricity consumption data grow in scale, traditional clustering methods run into bottlenecks such as low computation speed and poor processing efficiency. To meet the need for efficient mining of massive electricity consumption data, this paper presents a parallel density-based k-means clustering method. First, an initial cluster center selection method based on sample density is proposed to avoid poorly chosen initial centers, which can trap the clustering in a local optimum. The dispersion of the data samples within each cluster is also used as an important reference for determining the number of clusters. Subsequently, the density calculation and the clustering of data samples are parallelized using the MapReduce model. Experiments on Hadoop clusters show that the proposed parallel processing method is efficient and feasible and can provide useful support for intelligent power allocation decisions.
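The abstract only outlines the method, so the following Python sketch illustrates the general idea under stated assumptions. The density definition (count of other samples within a radius), the seeding rule (densest sample first, then samples maximizing density times distance to the nearest chosen center), the dispersion measure, and every function name are hypothetical and are not taken from the paper. The map and reduce steps are simulated in a single process rather than run as Hadoop jobs, and the density computation is sequential here even though the paper parallelizes it.

```python
import math
from collections import defaultdict

Point = tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two load profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def densities(points: list[Point], radius: float) -> list[int]:
    """Assumed density: number of other samples within `radius` of each sample."""
    return [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
            for i, p in enumerate(points)]


def initial_centers(points: list[Point], k: int, radius: float) -> list[Point]:
    """Hypothetical density-based seeding: take the densest sample first, then
    repeatedly take the sample maximizing (density + 1) * distance to its nearest
    already chosen center, so seeds are both dense and well separated."""
    dens = densities(points, radius)
    centers = [points[max(range(len(points)), key=lambda i: dens[i])]]
    while len(centers) < k:
        best = max(range(len(points)),
                   key=lambda i: (dens[i] + 1) * min(dist(points[i], c) for c in centers))
        centers.append(points[best])
    return centers


def map_phase(point: Point, centers: list[Point]) -> tuple[int, tuple[Point, int]]:
    """Mapper: emit (id of the nearest center, (sample, 1))."""
    cid = min(range(len(centers)), key=lambda c: dist(point, centers[c]))
    return cid, (point, 1)


def reduce_phase(values: list[tuple[Point, int]]) -> Point:
    """Reducer: sum the samples and counts for one key; emit their mean as the new center."""
    dim = len(values[0][0])
    total, count = [0.0] * dim, 0
    for p, n in values:
        for d in range(dim):
            total[d] += p[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_mapreduce(points: list[Point], k: int, radius: float, max_iter: int = 100):
    """Driver loop: one simulated MapReduce pass per k-means iteration."""
    centers = initial_centers(points, k, radius)
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:                      # map, then shuffle (group by center id)
            cid, value = map_phase(p, centers)
            groups[cid].append(value)
        new_centers = list(centers)           # an empty cluster keeps its old center
        for cid, values in groups.items():    # reduce
            new_centers[cid] = reduce_phase(values)
        if new_centers == centers:            # converged: centers unchanged
            break
        centers = new_centers
    labels = [map_phase(p, centers)[0] for p in points]
    return centers, labels


def dispersion(points: list[Point], centers: list[Point], labels: list[int]) -> float:
    """Mean distance of samples to their own center (within-cluster dispersion)."""
    return sum(dist(p, centers[c]) for p, c in zip(points, labels)) / len(points)
```

For example, `kmeans_mapreduce([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], k=2, radius=2.0)` separates the two groups of points. The mapper emits `(center id, (sample, 1))` pairs so that, in a real Hadoop job, a combiner could pre-sum them on each node before the shuffle. Comparing `dispersion` across candidate values of `k` is one plausible way to use within-cluster dispersion when choosing the number of clusters, but the paper's exact criterion is not stated in the abstract.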
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.