Research Paper:
Combining Instance Segmentation and Background Subtraction Models for Small Object Detection
Wisan Dhammatorn*1, Seiya Ito*2, Naoshi Kaneko*3, and Kazuhiko Sumi*4

*1Graduate School of Science and Engineering, Aoyama Gakuin University
5-10-1 Fuchinobe, Chuo-ku, Sagamihara, Kanagawa 252-5258, Japan
Corresponding author
*2Advanced Reality Technology Laboratory, Universal Communication Research Institute, National Institute of Information and Communications Technology (NICT)
Tokyo, Japan
*3Department of Information Systems and Multimedia Design, School of Science and Technology for Future Life, Tokyo Denki University
Tokyo, Japan
*4Department of Integrated Information Technology, College of Science and Engineering, Aoyama Gakuin University
Sagamihara, Japan
Object detection is a fundamental problem in computer vision that has been extensively studied over the past decades. Although deep neural networks (DNNs) have improved object detection, they still struggle to recognize small objects. Detecting small objects remains challenging owing to factors such as low resolution and scale variance. These challenges are particularly evident in surveillance settings, where small objects typically appear in cluttered environments and at varying distances. Moreover, the static background of a fixed surveillance camera, although an important cue for detection, has not been fully exploited by DNN-based methods. In this study, we propose a simple yet effective method that enhances detection in regions containing small objects missed by state-of-the-art DNN-based instance segmentation methods. The proposed method extracts foreground regions using a background subtraction model and classifies them, thereby identifying small objects. In our experiments, we evaluate two real-world scenarios: detecting a person walking on a campus and identifying vehicles in road-surveillance footage. The results show that our method improves small-object detection and outperforms baseline methods.
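The foreground-extraction stage described in the abstract can be illustrated with a minimal sketch: a running-average background model, frame differencing against that model, and connected-component grouping of foreground pixels into candidate boxes that would then be passed to a classifier. All names, thresholds, and the specific background model here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a background-subtraction pipeline for candidate
# small-object regions. Thresholds and the learning rate are assumed
# values for illustration only.
ALPHA = 0.05   # background learning rate (assumed)
THRESH = 30    # foreground difference threshold (assumed)

def update_background(bg, frame, alpha=ALPHA):
    """Exponential running average: B <- (1 - a) * B + a * F per pixel."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(br, fr)]
            for br, fr in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=THRESH):
    """Mark pixels whose |F - B| exceeds the threshold as foreground."""
    return [[abs(f - b) > thresh for b, f in zip(br, fr)]
            for br, fr in zip(bg, frame)]

def connected_components(mask):
    """4-connected flood fill; returns bounding boxes (x0, y0, x1, y1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, x0, y0, x1, y1 = [(y, x)], x, y, x, y
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Toy example: a static dark background with one small bright "object".
bg = [[0.0] * 8 for _ in range(8)]
frame = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (4, 5):
        frame[y][x] = 200  # 2x2 small object

mask = foreground_mask(bg, frame)
print(connected_components(mask))  # -> [(4, 2, 5, 3)]
bg = update_background(bg, frame)  # adapt the model toward the new frame
```

In a real system, each returned box would be cropped from the frame and fed to a classifier to decide whether it is a small object of interest; production pipelines would also use a more robust background model (e.g., a Gaussian mixture) rather than a single running average.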
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.