Research Paper:
RCT-YOLOv8: A Tuna Detection Model for Distant-Water Fisheries Based on Improved YOLOv8
Qingyi Zhou and Yuqing Liu
College of Engineering Science and Technology, Shanghai Ocean University
No.999 Hucheng Ring Road, Pudong New Area, Shanghai 201306, China
Corresponding author
With the development of distant-water fisheries, detecting fishing activity and catches aboard ships has become vital to modern fishing. Existing manual detection methods are prone to missed and false detections. Deep learning makes it possible to deploy detection models on shipboard devices, which offers a new solution. However, many existing models have large parameter counts and high computational complexity, which makes them unsuitable for shipboard use given the limited computing resources and cost constraints onboard. To address these challenges, this paper proposes RCT-YOLOv8, a model for tuna catch detection. Specifically, we adopt YOLOv8 as the base model and replace its backbone with RepVGG, which uses re-parameterized convolutions to improve detection accuracy. We also add coordinate attention at the end of the backbone to better aggregate channel-wise information. In the neck, we introduce contextual transformer (CoT) attention and propose the C2F-CoT module, which combines a convolutional neural network with a Transformer to capture global features, improving both detection accuracy and the effectiveness of feature propagation. We tested multiple loss functions and selected efficient intersection over union (EIoU), which proved more suitable for our algorithm. Furthermore, to adapt the model to devices with limited computational resources, we compress it with a dependency-graph-based pruning method. Compared with the base network, the pruned model achieves a 9.8% increase in detection accuracy while reducing parameters and computational complexity by 40% and 35.8%, respectively. Compared with various other algorithms, the pruned model achieves the highest detection accuracy, the lowest parameter count, and the lowest computational complexity, giving the best results on all three fronts.
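The abstract credits RepVGG's re-parameterized convolutions for part of the accuracy gain. The idea, from RepVGG (Ding et al., 2021), is that a block trains with three parallel branches (3x3 conv + BN, 1x1 conv + BN, identity + BN) and, because every branch is linear before the final ReLU, can be folded into a single 3x3 convolution for inference. The sketch below is a minimal single-channel illustration in plain Python, assuming inference-mode BN with running statistics; the function names, toy weights, and BN values are hypothetical and this is not the authors' implementation. It checks numerically that the fused kernel reproduces the three-branch output.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
BNParams = Tuple[float, float, float, float]  # (gamma, beta, running_mean, running_var)
EPS = 1e-5

def conv3x3_same(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """Single-channel 3x3 cross-correlation, stride 1, zero padding 1."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = bias
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di][dj] * x[ii][jj]
            out[i][j] = s
    return out

def batch_norm(v: float, p: BNParams) -> float:
    """Inference-mode BN applied to one value, using running statistics."""
    gamma, beta, mean, var = p
    return gamma * (v - mean) / math.sqrt(var + EPS) + beta

def fuse_bn(k: Matrix, p: BNParams) -> Tuple[Matrix, float]:
    """Fold BN into a bias-free conv: k' = s * k, b' = beta - mean * s."""
    gamma, beta, mean, var = p
    s = gamma / math.sqrt(var + EPS)
    return [[s * w for w in row] for row in k], beta - mean * s

def train_time_block(x: Matrix, k3: Matrix, k1: float,
                     bn3: BNParams, bn1: BNParams, bn0: BNParams) -> Matrix:
    """Sum of the three branches; the ReLU after the sum is omitted here."""
    y3 = conv3x3_same(x, k3)
    return [[batch_norm(y3[i][j], bn3) + batch_norm(k1 * v, bn1) + batch_norm(v, bn0)
             for j, v in enumerate(row)] for i, row in enumerate(x)]

def reparameterize(k3: Matrix, k1: float,
                   bn3: BNParams, bn1: BNParams, bn0: BNParams) -> Tuple[Matrix, float]:
    """Merge all three branches into one 3x3 kernel and one bias."""
    k1_padded = [[0.0] * 3 for _ in range(3)]
    k1_padded[1][1] = k1          # a 1x1 kernel is a 3x3 kernel with only a centre tap
    identity = [[0.0] * 3 for _ in range(3)]
    identity[1][1] = 1.0          # the identity branch is a 3x3 kernel with centre 1
    kernel, bias = [[0.0] * 3 for _ in range(3)], 0.0
    for k, p in ((k3, bn3), (k1_padded, bn1), (identity, bn0)):
        fk, fb = fuse_bn(k, p)
        kernel = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(kernel, fk)]
        bias += fb
    return kernel, bias

# Usage with made-up values: both paths should agree up to rounding error.
x = [[1.0, 2.0, 0.0, -1.0], [0.0, 1.0, 3.0, 2.0], [2.0, -1.0, 1.0, 0.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
bn3, bn1, bn0 = (1.2, 0.1, 0.05, 0.9), (0.8, -0.2, 0.1, 1.1), (1.0, 0.3, -0.05, 0.5)
y_train = train_time_block(x, k3, 0.7, bn3, bn1, bn0)
fused_k, fused_b = reparameterize(k3, 0.7, bn3, bn1, bn0)
y_infer = conv3x3_same(x, fused_k, fused_b)
print(max(abs(a - b) for ra, rb in zip(y_train, y_infer) for a, b in zip(ra, rb)))
```

The multi-branch form only costs extra compute during training; at inference the block is a plain 3x3 convolution, which is why this kind of backbone suits resource-limited shipboard hardware. A real multi-channel version applies the same fusion per output channel, with the identity kernel placed at input channel equal to output channel.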
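For reference, the EIoU loss of Zhang et al. (2022) adds three penalties to 1 - IoU: the squared distance between box centres divided by the squared diagonal of the smallest enclosing box, plus the squared width and height differences divided by the squared enclosing-box width and height. The sketch below computes it for a single pair of boxes, assuming corner format (x1, y1, x2, y2). The `Box` class and `eiou_loss` function are hypothetical names, and the sketch omits focal weighting, batching, and the gradient handling a training framework would provide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Axis-aligned box in corner format (x1, y1, x2, y2); an assumed convention."""
    x1: float
    y1: float
    x2: float
    y2: float

def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """EIoU = 1 - IoU + centre term + width term + height term."""
    # Intersection and union areas
    iw = max(0.0, min(pred.x2, gt.x2) - max(pred.x1, gt.x1))
    ih = max(0.0, min(pred.y2, gt.y2) - max(pred.y1, gt.y1))
    inter = iw * ih
    wp, hp = pred.x2 - pred.x1, pred.y2 - pred.y1
    wg, hg = gt.x2 - gt.x1, gt.y2 - gt.y1
    iou = inter / (wp * hp + wg * hg - inter + eps)
    # Width and height of the smallest box enclosing both
    cw = max(pred.x2, gt.x2) - min(pred.x1, gt.x1)
    ch = max(pred.y2, gt.y2) - min(pred.y1, gt.y1)
    # Offset between box centres
    dx = (pred.x1 + pred.x2 - gt.x1 - gt.x2) / 2
    dy = (pred.y1 + pred.y2 - gt.y1 - gt.y2) / 2
    centre_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (wp - wg) ** 2 / (cw * cw + eps)
    height_term = (hp - hg) ** 2 / (ch * ch + eps)
    return 1.0 - iou + centre_term + width_term + height_term

# A prediction twice as wide as the ground truth, sharing its left edge:
# IoU = 0.5, centre term = 1/20, width term = 4/16, so the loss is 0.8.
print(eiou_loss(Box(0, 0, 4, 2), Box(0, 0, 2, 2)))
```

Because width and height are penalized separately rather than through an aspect-ratio term as in CIoU, the loss still pushes a box toward the right size when its aspect ratio already matches the target's.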
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
