
Int. J. Automation Technol. (IJAT), Vol.19, No.3, pp. 237-247, 2025
doi: 10.20965/ijat.2025.p0237

Research Paper:

Combining Instance Segmentation and Background Subtraction Models for Small Object Detection

Wisan Dhammatorn*1,†, Seiya Ito*2, Naoshi Kaneko*3, and Kazuhiko Sumi*4

*1Graduate School of Science and Engineering, Aoyama Gakuin University
5-10-1 Fuchinobe, Chuo-ku, Sagamihara, Kanagawa 252-5258, Japan

†Corresponding author

*2Advanced Reality Technology Laboratory, Universal Communication Research Institute, National Institute of Information and Communications Technology (NICT)
Tokyo, Japan

*3Department of Information Systems and Multimedia Design, School of Science and Technology for Future Life, Tokyo Denki University
Tokyo, Japan

*4Department of Integrated Information Technology, College of Science and Engineering, Aoyama Gakuin University
Sagamihara, Japan

Received: November 29, 2024
Accepted: January 15, 2025
Published: May 5, 2025

Keywords: object detection, image segmentation, vehicle detection, pedestrian detection, background subtraction
Abstract

Object detection is a fundamental problem in computer vision that has been investigated extensively over the past decades. Although deep neural networks (DNNs) have improved object detection, they struggle to recognize small objects, which remain challenging to detect owing to factors such as low resolution and scale variance. These challenges are particularly evident in surveillance settings, where small objects typically appear in cluttered environments and at varying distances. Moreover, although the static background of a fixed surveillance camera can be an important cue for detection, it has not been fully exploited by DNN-based methods. In this study, we propose a simple yet effective method that recovers small objects missed by state-of-the-art DNN-based instance segmentation. The proposed method extracts foreground regions using a background subtraction model and classifies them, thereby enabling the identification of small objects. In our experiments, we evaluate two real-world scenarios: detecting a person walking on campus and identifying vehicles in road-surveillance footage. The results show that our method improves the detection of small objects and outperforms baseline methods.
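As a rough illustration of the pipeline the abstract describes, the sketch below combines a DNN detector with an OpenCV MOG2 background subtractor: foreground regions that no DNN detection already covers are cropped and passed to a classifier. This is a minimal sketch under stated assumptions, not the authors' implementation; segment_instances, classify_region, min_area, and the IoU threshold are hypothetical placeholders, and MOG2 merely stands in for whatever background subtraction model the paper actually uses.

    import cv2


    def iou(a, b):
        """Intersection over union of two (x, y, w, h) boxes."""
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        ih = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = iw * ih
        return inter / float(aw * ah + bw * bh - inter) if inter else 0.0


    def detect(frames, segment_instances, classify_region,
               min_area=16, iou_thresh=0.5):
        """Augment DNN detections with classified foreground proposals.

        segment_instances and classify_region are hypothetical stand-ins
        for the instance segmentation and region classification models.
        """
        # MOG2 stands in for the background subtraction model; it marks
        # shadows as 127 in its output mask, thresholded away below.
        bg_model = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
        for frame in frames:
            detections = segment_instances(frame)  # [{"bbox": (x, y, w, h), ...}]
            fg_mask = bg_model.apply(frame)
            _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
            # Candidate foreground regions via connected-component labeling.
            n, _, stats, _ = cv2.connectedComponentsWithStats(fg_mask, connectivity=8)
            for i in range(1, n):  # label 0 is the background
                x, y, w, h, area = stats[i]
                if area < min_area:
                    continue  # reject tiny noise blobs
                if any(iou((x, y, w, h), d["bbox"]) > iou_thresh for d in detections):
                    continue  # region already covered by a DNN detection
                label, score = classify_region(frame[y:y + h, x:x + w])
                if label is not None:  # keep regions classified as objects
                    detections.append({"bbox": (x, y, w, h),
                                       "label": label, "score": score})
            yield frame, detections

In this sketch the background model only proposes regions; the classifier decides whether a proposal is an object of interest, which is what lets the combination pick up small objects the segmentation network misses.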

Cite this article as:
W. Dhammatorn, S. Ito, N. Kaneko, and K. Sumi, “Combining Instance Segmentation and Background Subtraction Models for Small Object Detection,” Int. J. Automation Technol., Vol.19 No.3, pp. 237-247, 2025.
