Research Paper:
Small Object Detection Algorithm Based on Improved Attention Mechanism and Feature Fusion of YOLOv8
Mingxing Fang*1,*2,*3, Xinyu Rui*1, Hongyu Cheng*1, Xinke Liu*1, Jinhua She*4, Youwu Du*5, and Haoran Tan*6
*1School of Physics and Electronic Information, Anhui Normal University
No.189 Jiuhua South Road, Yijiang District, Wuhu, Anhui 241002, China
*2Anhui Engineering Research Center on Information Fusion and Control of Intelligent Robot
No.189 Jiuhua South Road, Yijiang District, Wuhu, Anhui 241002, China
*3Anhui Provincial Joint Key Laboratory on Information Fusion of Intelligent Automotive Cabin
No.189 Jiuhua South Road, Yijiang District, Wuhu, Anhui 241002, China
*4School of Engineering, Tokyo University of Technology
1404-1 Katakuramachi, Hachioji, Tokyo 192-0982, Japan
*5School of Electrical and Information Engineering, Jiangsu University of Technology
No.1801 Zhongwu Road, Changzhou, Jiangsu 213001, China
*6National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University
No.2 Lushan South Road, Yuelu District, Changsha, Hunan 410082, China
Corresponding author
To address the challenges of small object detection, in particular that small objects often lack sufficient semantic information and are highly susceptible to background noise, this paper proposes an innovative algorithm, YOLOv8-FE. First, to enhance the network's sensitivity to small objects, a P2-scale detection layer designed specifically for small objects is integrated into the model. Second, to counter the information loss that downsampling in conventional convolutional layers can cause, an innovative downsampling module named RFAC-SPD is designed to capture and exploit the features of small objects more effectively, thereby improving model performance. In addition, to suppress interference from background noise and strengthen the network's ability to focus on object information, a C2f-CBAM module is constructed based on the convolutional block attention module (CBAM). Moreover, to fully integrate low-level features, minimize the loss of fine-grained detail, and further enhance the network's representational capability, an enhanced path aggregation network is proposed, which significantly improves the effectiveness of feature fusion. Experiments on the VisDrone2019 dataset show that YOLOv8-FE achieves superior accuracy and detection efficiency: compared with the baseline YOLOv8n, its mAP50 and mAP50-95 increase by 8.3% and 5.3%, respectively, and its inference speed of 77 frames per second meets real-time requirements, validating the advancement and effectiveness of the proposed algorithm. Furthermore, generalization experiments on the DOTA and Caltech Pedestrian datasets show mAP50 gains of 2.7% and 6.8%, respectively, fully validating the generality of the proposed model.
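The key idea behind SPD-style downsampling modules such as the RFAC-SPD described above is a space-to-depth rearrangement: instead of a stride-2 convolution that discards three of every four pixels, the feature map is split into sub-maps so that all spatial detail is preserved in extra channels. The following is a minimal sketch of that rearrangement only, not the paper's implementation; the `space_to_depth` helper, the nested-list representation, and the 2x scale are assumptions made for illustration.

```python
def space_to_depth(feature_map, scale=2):
    """Rearrange one H x W map (nested lists) into scale**2 sub-maps of size
    (H//scale) x (W//scale). Unlike a stride-2 convolution, no pixel is
    discarded: spatial resolution is traded for channel depth."""
    h = len(feature_map)
    w = len(feature_map[0])
    assert h % scale == 0 and w % scale == 0, "dimensions must divide evenly"
    out = []
    for dy in range(scale):        # one sub-map per (dy, dx) pixel offset
        for dx in range(scale):
            out.append([[feature_map[y * scale + dy][x * scale + dx]
                         for x in range(w // scale)]
                        for y in range(h // scale)])
    return out


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
    subs = space_to_depth(fmap)
    print(len(subs))    # 4: one channel becomes four half-resolution channels
    print(subs[0])      # [[1, 3], [9, 11]]
```

In a full module, the resulting stacked channels would then be processed by a non-strided convolution (and, in RFAC-SPD, combined with a receptive-field attention convolution) so the network downsamples without losing small-object detail.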
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.