how to combine small files in hadoop


Merging Small Files Into Avro File

This post is a continuation for previous post on working with small files issue. In previous we have merged huge number of small files on HDFS directory into sequencefile and in this post we will merge huge number of small files on local file system into avro file on HDFS output directory. We will store the file names as keys and file contents as values in Avro file. We will […]


Merging Small Files into SequenceFile 7

In this post, we will discuss one of the famous use case of SequenceFiles, where we will merge large number of small files into SequenceFile. We will get to this requirement mainly due to the lack efficient processing of large number of small files in hadoop or mapreduce. Need For Merging Small Files: As hadoop stores all the HDFS files metadata in namenode’s main memory(which is a limited value) for […]


Review Comments
default image

I am a plsql developer. Intrested to move into bigdata.

Neetika Singh ITA Hadoop in Dec/2016 December 22, 2016

.