Technical Paper:
A Method of Constructing a Food Classification Image Dataset by Cleansing Web-Crawling Data
Kazuki Kiryu, Masaki Miyamoto, and Akio Nakamura
Tokyo Denki University
5 Senju-Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
We propose a method of constructing image datasets via data cleansing for food recognition with a convolutional neural network (CNN). A dataset was first built by collecting food images and their class labels from websites that post cooking recipes via web crawling. The collected images included ones that a CNN cannot learn effectively, such as images of foods that closely resemble other foods and images whose food does not match its class label; we term these “content and description discrepancy images.” Such images were removed using two criteria based on food recognition results obtained with CNNs: the first is a threshold on the difference between the estimated probabilities, and the second is whether the estimated class matches the labeled food class. The criteria were applied using multiple classifiers, and the images that failed them were discarded to construct a new, smaller image dataset. A CNN was then trained on the constructed dataset, and its food recognition accuracy was evaluated on a test dataset. The accuracy with the dataset constructed by the proposed method was 7.4% higher than that with the raw web-crawled dataset, demonstrating that the proposed method can efficiently construct a food image dataset and confirming the data-cleansing effect of the two selection criteria.
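The two selection criteria described in the abstract can be sketched as a simple filter over per-classifier probability vectors. The following is a minimal illustration, not the paper's implementation: the function names, the margin value of 0.2, and the majority-style vote across classifiers are all assumptions made for the example.

```python
import numpy as np

def passes_criteria(probs, label_idx, margin=0.2):
    """Apply the two selection criteria to one classifier's output.

    probs     : 1-D array of softmax probabilities over food classes
                (hypothetical output of one trained CNN).
    label_idx : index of the class the image was crawled under.
    margin    : threshold on the gap between the top two probabilities
                (0.2 is an illustrative value, not the paper's setting).
    """
    top2 = np.sort(probs)[-2:]                       # two largest probabilities
    criterion1 = (top2[1] - top2[0]) >= margin       # prediction is confident
    criterion2 = int(np.argmax(probs)) == label_idx  # estimated class matches label
    return criterion1 and criterion2

def keep_image(prob_sets, label_idx, min_votes=2):
    """Keep an image only if enough classifiers accept it.

    prob_sets : list of probability vectors, one per classifier
                (the paper applies the criteria with multiple
                classifiers; this vote-count fusion is an assumption).
    """
    votes = sum(passes_criteria(p, label_idx) for p in prob_sets)
    return votes >= min_votes
```

Images for which `keep_image` returns `False` would be removed, yielding the cleansed dataset on which the final CNN is trained.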
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.