Daily Archives: April 13, 2014

HDFS Rebalance

Whenever a new data node is added to an existing HDFS cluster, or a data node is removed from it, some data nodes end up holding more or fewer blocks than others. In such an unbalanced cluster, read/write requests concentrate on some data nodes while others are underutilized. In such cases, to make all the data nodes space […]
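To make the imbalance described above concrete, here is a minimal Python sketch that flags data nodes whose disk usage deviates from the cluster mean by more than a chosen margin. It is only an illustration, not how HDFS itself decides anything: the function name `classify_nodes`, the node names, the usage figures and the `threshold` parameter are all hypothetical, and the sketch assumes every data node has the same capacity.

```python
def classify_nodes(used_pct, threshold=10.0):
    """Split nodes into over- and under-utilized groups.

    used_pct  : dict mapping a (hypothetical) node name to the percentage
                of its capacity in use; all nodes are assumed equal in size.
    threshold : allowed deviation from the mean, in percentage points.
    """
    # With equal-capacity nodes, the cluster average is the plain mean.
    avg = sum(used_pct.values()) / len(used_pct)
    over = sorted(n for n, u in used_pct.items() if u > avg + threshold)
    under = sorted(n for n, u in used_pct.items() if u < avg - threshold)
    return avg, over, under


# Made-up figures: dn3 was just added and is still empty.
usage = {"dn1": 80.0, "dn2": 70.0, "dn3": 0.0}
avg, over, under = classify_nodes(usage, threshold=10.0)
print(avg, over, under)
```

In this made-up cluster the two older nodes sit well above the mean and the new node well below it, which is the situation where reads and writes pile up on a few nodes.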

HAR Files – Hadoop Archive Files

Hadoop Archive Files: Hadoop archive files, or HAR files, are a facility for packing HDFS files into archives. This is the best option for storing a large number of small files in HDFS, because storing many small files directly in HDFS is not very efficient. An advantage of HAR files is that they can be used directly as input files in MapReduce jobs. HAR Files Creation: Hadoop […]

dfsadmin – HDFS Administration Command

The syntax for Hadoop commands is $ hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]. Here the --config parameter overrides the default configuration directory. Commands can be either user commands or administrator commands. Below are the details of the useful administrator command dfsadmin. dfsadmin: The dfsadmin (distributed file system administration) command is used for file system administration activities such as getting a file system report, entering/leaving safemode, refreshing nodes in the cluster and […]
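The ordering of the pieces in that syntax can be sketched with a small Python helper that assembles the argument list. This is an assumption-laden illustration, not part of Hadoop: `build_hadoop_command` is a hypothetical name, and the configuration path used below is a made-up example. The helper only builds the list and does not run anything.

```python
def build_hadoop_command(command, generic_options=(), command_options=(),
                         config_dir=None):
    """Assemble: hadoop [--config confdir] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]."""
    argv = ["hadoop"]
    if config_dir is not None:
        # --config comes before the command and overrides the default conf dir.
        argv += ["--config", config_dir]
    argv.append(command)
    argv += list(generic_options)   # generic options precede command options
    argv += list(command_options)
    return argv


# File system report, using the default configuration directory.
print(" ".join(build_hadoop_command("dfsadmin", command_options=["-report"])))

# Safemode status, with a made-up alternate configuration directory.
print(" ".join(build_hadoop_command("dfsadmin",
                                    command_options=["-safemode", "get"],
                                    config_dir="/path/to/conf")))
```

The resulting list could be handed to `subprocess.run` on a machine where the `hadoop` launcher is on the PATH; that step is left out here so the sketch runs anywhere.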
