Anno-Mate: Human–Machine Collaboration Features for Fast Annotation

John Anthony C. Jose; Meygen D. Cruz; Jefferson James U. Keh; Maverick Rivera; Edwin Sybingco; Elmer P. Dadios

doi:10.20965/jaciii.2021.p0404


JACIII Vol.25 No.4 pp. 404-409 (2021)


Paper:


John Anthony C. Jose†, Meygen D. Cruz, Jefferson James U. Keh, Maverick Rivera, Edwin Sybingco, and Elmer P. Dadios

De La Salle University
2401 Taft Avenue, Manila 1004, Philippines

†Corresponding author

Received: February 10, 2021

Accepted: April 15, 2021

Published: July 20, 2021

Keywords: video annotations, object detection, object tracking, annotations, human–machine collaboration

Abstract

Large annotated datasets are crucial for training deep machine learning models, but they are expensive and time-consuming to create. Numerous public datasets already exist, but a vast amount of unlabeled data, especially video data, could still be annotated and used to further improve the performance and accuracy of machine learning models. It is therefore essential to reduce the time and effort required to annotate a dataset so that annotation does not become a bottleneck in the development of this field. In this study, we propose Anno-Mate, a pair of features integrated into the Computer Vision Annotation Tool (CVAT) that facilitates human–machine collaboration and reduces the required human effort. Anno-Mate comprises Auto-Fit, which uses an EfficientDet-D0 backbone to tighten an existing bounding box around an object, and AutoTrack, which uses a channel and spatial reliability tracking (CSRT) tracker to draw a bounding box on the target object as it moves through the video frames. Both features exhibit a good trade-off between speed and accuracy. Auto-Fit achieved an overall accuracy of 87% with an average processing time of 0.47 s, whereas AutoTrack achieved an overall accuracy of 74.29% and processed 18.54 frames per second. When combined, these features reduced the time required to annotate one minute of video by 26.56%.
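To make the idea of tightening an existing bounding box concrete, the following is a minimal sketch and not the authors' implementation. It assumes a detector has already produced candidate boxes for the frame, and it snaps the annotator's rough box to the candidate with the highest intersection over union (IoU). The names `iou` and `fit_box`, the `(x_min, y_min, x_max, y_max)` box layout, and the `min_iou` threshold of 0.3 are all hypothetical choices made for illustration; the abstract does not state how Auto-Fit selects its output box.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fit_box(rough: Box, detections: List[Box], min_iou: float = 0.3) -> Box:
    """Return the detection that best overlaps the rough box.

    Falls back to the rough box when there are no detections or when the
    best overlap is below min_iou (a hypothetical threshold).
    """
    best = max(detections, key=lambda d: iou(rough, d), default=None)
    if best is None or iou(rough, best) < min_iou:
        return rough
    return best


if __name__ == "__main__":
    rough_box = (10.0, 10.0, 110.0, 110.0)        # loose box drawn by a person
    candidates = [(30.0, 30.0, 100.0, 100.0),     # detector output (made up)
                  (200.0, 200.0, 250.0, 250.0)]
    print(fit_box(rough_box, candidates))
```

Falling back to the rough box keeps the human annotation in control whenever the detector finds nothing plausible, which matches the collaborative intent described in the abstract.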

Cite this article as:

J. Jose, M. Cruz, J. Keh, M. Rivera, E. Sybingco, and E. Dadios, “Anno-Mate: Human–Machine Collaboration Features for Fast Annotation,” J. Adv. Comput. Intell. Intell. Inform., Vol.25 No.4, pp. 404-409, 2021.



This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
