Storage-Optimization Method for Massive Small Files of Agricultural Resources Based on Hadoop

Jun Liu

doi:10.20965/jaciii.2019.p0634

JACIII Vol.23 No.4 pp. 634-640 (2019)

Paper:

Storage-Optimization Method for Massive Small Files of Agricultural Resources Based on Hadoop

Jun Liu

Facility Horticulture Laboratory of Universities in Shandong, Weifang University of Science and Technology
Weifang, Shandong 262700, China

Received: July 16, 2018

Accepted: January 4, 2019

Published: July 20, 2019

Keywords: Hadoop, massive small files, merged small files, prefetching and cache

Abstract

Hadoop is designed for the storage and processing of big data, especially large datasets. In practice, however, many workloads consist of numerous small files, and Hadoop handles such files poorly. This paper proposes a Hadoop-based storage-optimization method for massive small files of agricultural resources. The method merges small files according to the predecessor and successor relationships between them. Small files are then accessed through an index mechanism with metadata caching, and associated small files are prefetched, which improves read efficiency. Experimental results show that the method reduces the memory consumption of the Hadoop name node and improves system performance.
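The three ideas named in the abstract (merging, index-based access, and prefetching of associated files) can be illustrated with a small in-memory sketch. This is not the paper's implementation and does not touch HDFS: the names `merge_small_files` and `MergedFileReader`, the index layout, and the LRU cache size are all hypothetical, and the "successor" of a file is simply assumed to be the next file in the merge order.

```python
import io
from collections import OrderedDict


def merge_small_files(files):
    """Concatenate (name, bytes) pairs in the given order.

    Returns (blob, index), where index maps each name to
    (offset, length, successor_name_or_None). The successor is
    assumed here to be the next file in the merge order.
    """
    blob = io.BytesIO()
    index = {}
    names = [name for name, _ in files]
    for i, (name, data) in enumerate(files):
        successor = names[i + 1] if i + 1 < len(names) else None
        index[name] = (blob.tell(), len(data), successor)
        blob.write(data)
    return blob.getvalue(), index


class MergedFileReader:
    """Reads small files out of a merged blob (hypothetical helper).

    The index is held in memory, standing in for cached metadata.
    File contents are kept in a small LRU cache, and each read also
    prefetches the successor of the requested file.
    """

    def __init__(self, blob, index, cache_size=2):
        self.blob = blob
        self.index = index
        self.cache = OrderedDict()  # name -> bytes, oldest entry first
        self.cache_size = cache_size
        self.blob_reads = 0  # counts slices taken from the merged blob

    def _load(self, name):
        offset, length, _ = self.index[name]
        self.blob_reads += 1
        return self.blob[offset:offset + length]

    def _put(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)
            data = self.cache[name]
        else:
            data = self._load(name)
            self._put(name, data)
        # Prefetch the associated (successor) file if it is not cached.
        successor = self.index[name][2]
        if successor is not None and successor not in self.cache:
            self._put(successor, self._load(successor))
        return data
```

Reading files in merge order with this sketch touches the merged blob once per file at most, because each read after the first is served from the prefetched entry. In a real deployment the blob would be a single large HDFS file, so the name node would track one file instead of many.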

Cite this article as:

J. Liu, “Storage-Optimization Method for Massive Small Files of Agricultural Resources Based on Hadoop,” J. Adv. Comput. Intell. Intell. Inform., Vol.23 No.4, pp. 634-640, 2019.

References

[1] J. H. Lu, “Hadoop in Action,” Mechanical Industry Press, 2012.
[2] D. Borthakur, “The Hadoop distributed file system: architecture and design,” https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.10.0/docs/hdfs_design.pdf [accessed May 6, 2018]
[3] T. White, “The small files problem,” http://www.cloudera.com/blog/2009/02/the-small-files-problem [accessed May 6, 2018]
[4] F. Yang, H. Wu, H. Zhu et al., “Hadoop-based massive agricultural data resource management platform,” Computer Engineering, Vol.37, No.12, pp. 242-244, 2011.
[5] L. J. Li, “Research and Optimization of Small Files Processing Techniques in Hadoop,” Hebei University, 2011.
[6] S. L. Zhang, D. J. Yang, and Y. B. Han, “Optimization of Reception and Storage for Massive Small Files,” J. of Chinese Computer Systems, Vol.8, pp. 1747-1751, 2015.
[7] J. Ding, F. Zheng, Y. Li, Y. Luo, and W. Cao, “Method of distributed multi-level storage of massive small files of air logistics based on NoSQL,” Computer Application Research, Vol.5, pp. 1-7, 2017.
[8] L. Guo, W. Wang, H. Wang, X. Li, Y. Yang, and Z. Sun, “Innovation of China’s Grass-Root Agricultural Extension Team With ICTs,” Cross-Cultural Communication, Vol.10, No.5, pp. 44-48, 2014.
[9] M. Li, S. Cao, and Z. Qin, “Storage Optimization Method of Small Files Based on Hadoop,” J. of University of Electronic Science and Technology of China, Vol.45, No.1, pp. 141-145, 2016.
[10] S. Chandrasekar, R. Dakshinamurthy, P. G. Seshakumar et al., “A Novel Indexing Scheme for Efficient Handling of Small Files in Hadoop Distributed File System,” Int. Conf. on Computer Communication and Informatics (ICCCI), pp. 1-8, 2013.
[11] N-W. Qian, W. B. Guo, and G-S. Fan, “Approach of Distributed Small File Storage Based on Association Rule Mining,” J. of East China University of Science and Technology, Vol.5, pp. 708-714, 2016.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
