YOLO-CF: An Object Detection Model that Improves Feature Expression at Both Coarse-Grained and Fine-Grained Levels for Industrial Surface Image Defect Detection

Zhaowei Sun; Jintao Chen; Cong Lin; Xuebin Yue; Kuozhan Wang; Lin Meng

doi:10.20965/jaciii.2026.p0825


JACIII Vol.30 No.3 pp. 825-838 (2026)


Research Paper:



Zhaowei Sun^*1,†, Jintao Chen^*2, Cong Lin^*3, Xuebin Yue^*4, Kuozhan Wang^*4, and Lin Meng^*5

^*1Henan Huadong Industrial Control Technology Company Ltd.
No.5 Wutong West Street, High-New Technology Development Zone, Zhengzhou, Henan 450000, China

^†Corresponding author

^*2The First Affiliated Hospital of Henan University of Chinese Medicine
No.19 Renmin Road, Jinshui, Zhengzhou, Henan 450046, China

^*3Hanwei Electronics Group Corporation
No.169 Xuesong Road, National High-Tech Zone, Zhengzhou, Henan 450001, China

^*4School of Automation and Electrical Engineering, Zhongyuan University of Technology
No.41 Zhongyuan Middle Road, Zhongyuan, Zhengzhou, Henan 450007, China

^*5College of Science and Engineering, Ritsumeikan University
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

Received: July 9, 2025

Accepted: January 15, 2026

Published: May 20, 2026

Keywords: object detection, pooling operation, dimension reduction, multi-scale feature fusion

Abstract

In industrial surface defect detection, enhancing a model's ability to express features at both coarse-grained and fine-grained levels is crucial. Accordingly, this paper proposes YOLO Coarse and Fine (YOLO-CF), an object detection model that enhances feature expression by combining new feature fusion strategies with an improved network architecture. YOLO-CF incorporates Res2Net and Res2Net block modules, which improve the expression of fine-grained features without increasing computational complexity. The model also introduces a multi-scale feature fusion module that integrates coarse- and fine-grained information by combining top-down and bottom-up pathways. This design expands the model's perceptual range and improves its generalization capability, making YOLO-CF well suited to detecting diverse defects in complex industrial images. In addition, a hybrid downsampling module combines max pooling, average pooling, and stride-2 convolution to provide richer feature representations. On the GC10-DET dataset, YOLO-CF achieved a mean average precision (mAP) of 59.09%, surpassing the second-ranked RetinaNet by 3.41 percentage points. On the PCB, crack, NEU-DET, and Dish-20 public datasets, YOLO-CF achieved mAPs of 97.38%, 84.72%, 73.35%, and 99.54%, respectively, at IoU = 0.5. The experimental results indicate that by integrating feature extraction at both coarse-grained and fine-grained levels, YOLO-CF improves the detection of objects of various sizes in complex scenes and yields notable performance gains. The code is available at http://www.ihpc.se.ritsumei.ac.jp/obidataset.html.
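The hybrid downsampling module is described only at the level of its three branches. The NumPy sketch below illustrates the general idea under assumptions that the abstract does not confirm: 2×2 max and average pooling with stride 2, a 3×3 convolution with stride 2 and zero padding of 1, and fusion of the branches by channel-wise concatenation. The function names (`max_pool2x2`, `avg_pool2x2`, `conv3x3_stride2`, `hybrid_downsample`) are hypothetical, and any normalization, activation, or post-fusion layer that YOLO-CF may use is omitted.

```python
import numpy as np


def _check_even(x):
    # The 2x2/stride-2 pooling below assumes even spatial sizes.
    c, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even H and W")
    return c, h, w


def max_pool2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array."""
    c, h, w = _check_even(x)
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def avg_pool2x2(x):
    """2x2 average pooling with stride 2 on a (C, H, W) array."""
    c, h, w = _check_even(x)
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def conv3x3_stride2(x, weight, bias):
    """3x3 convolution, stride 2, zero padding 1.

    weight: (C_out, C_in, 3, 3); bias: (C_out,).
    """
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W by 1
    h_out, w_out = (h - 1) // 2 + 1, (w - 1) // 2 + 1
    out = np.empty((weight.shape[0], h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3]  # (C_in, 3, 3)
            # Sum over input channels and the 3x3 window for every output channel.
            out[:, i, j] = np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out


def hybrid_downsample(x, weight, bias):
    """Assumed fusion: concatenate the three half-resolution branches along channels."""
    branches = [max_pool2x2(x), avg_pool2x2(x), conv3x3_stride2(x, weight, bias)]
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))
    y = hybrid_downsample(x, rng.standard_normal((6, 4, 3, 3)), np.zeros(6))
    print(y.shape)  # (14, 4, 4): 4 max-pooled + 4 average-pooled + 6 conv channels
```

For an input of shape (C, H, W) with even H and W, the sketch returns an array of shape (2C + C_out, H/2, W/2). The pooling branches pass the input channels through without learnable parameters, while only the convolution branch is trained. A practical version would more likely be written in a deep learning framework, and the actual module may merge the branches differently, for example by summation or a 1×1 convolution.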

YOLO-CF dual-level feature framework


Cite this article as:

Z. Sun, J. Chen, C. Lin, X. Yue, K. Wang, and L. Meng, “YOLO-CF: An Object Detection Model that Improves Feature Expression at Both Coarse-Grained and Fine-Grained Levels for Industrial Surface Image Defect Detection,” J. Adv. Comput. Intell. Intell. Inform., Vol.30 No.3, pp. 825-838, 2026.



This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.

