Daily Archives: April 13, 2014

HDFS Rebalance

Whenever a data node is added to or removed from an existing HDFS cluster, some data nodes end up holding more or fewer blocks than others. In such an unbalanced cluster, read/write requests concentrate on some data nodes while others are underutilized. In such cases, to make all the data nodes space […]
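As a rough illustration of the rebalancing step this excerpt leads into, the sketch below invokes the HDFS balancer. It assumes a release that ships the `hdfs` launcher script (older releases ran the same subcommand through `hadoop`). The `run` wrapper and the threshold value of 10 are illustrative choices, not taken from the post.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Assumed threshold: each data node's disk usage should end up within
# 10 percentage points of the cluster-wide average usage.
THRESHOLD=10

run hdfs balancer -threshold "$THRESHOLD"
```

Running the script with `RUN=1` on a cluster node would start the balancer for real; without it, the script only prints the command line.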

HAR Files – Hadoop Archive Files

Hadoop Archive Files: Hadoop archive files, or HAR files, are a facility for packing HDFS files into archives. This is a good option for storing a large number of small files, because storing many small files directly in HDFS is not very efficient. The advantage of HAR files is that they can be used directly as input files in MapReduce jobs. HAR Files Creation: Hadoop […]
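A minimal sketch of creating and listing an archive with the `hadoop archive` tool follows. All paths, the archive name, and the `run` wrapper are hypothetical placeholders chosen for illustration.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical locations; substitute real HDFS paths.
PARENT=/user/hypothetical/logs
ARCHIVE=logs.har
DEST=/user/hypothetical/archives

# Pack dir1 and dir2 (paths relative to PARENT) into a single archive under DEST.
run hadoop archive -archiveName "$ARCHIVE" -p "$PARENT" dir1 dir2 "$DEST"

# List the archived files through the har:// filesystem scheme.
run hadoop fs -ls "har://$DEST/$ARCHIVE"
```

The same `har://` URI form can be passed as an input path to a MapReduce job, which is the advantage the excerpt mentions.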

dfsadmin – HDFS Administration Command

The syntax for Hadoop commands is $ hadoop [--config confdir]  [Command]  [Generic_Options]  [Command_Options], where the --config parameter overrides the default configuration directory. Commands can be either user commands or administrator commands. Below are the details of the useful administrator command dfsadmin. dfsadmin: The dfsadmin (distributed file system administration) command is used for file system administration activities such as getting the file system report, entering/leaving safemode, refreshing nodes in the cluster and […]
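The three activities named in this excerpt map onto dfsadmin options roughly as sketched below. The sketch assumes a release that provides the `hdfs` launcher script (older releases invoked the same subcommand as `hadoop dfsadmin`). The `run` wrapper is a hypothetical dry-run helper.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# File system report: capacity, usage, and per-data-node status.
run hdfs dfsadmin -report

# Safemode: query the current state, then enter and leave it explicitly.
run hdfs dfsadmin -safemode get
run hdfs dfsadmin -safemode enter
run hdfs dfsadmin -safemode leave

# Ask the NameNode to re-read its include/exclude host lists.
run hdfs dfsadmin -refreshNodes
```

Setting `RUN=1` executes the commands against the configured cluster, which typically requires HDFS superuser privileges.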