Monthly Archives: April 2015

Hadoop Performance Tuning

There are many ways to improve the performance of Hadoop jobs. In this post, we list a few MapReduce properties that can be tuned at various MapReduce phases to improve performance. There is no one-size-fits-all technique for tuning Hadoop jobs: because of Hadoop's architecture, achieving balance among resources is often more effective than addressing a single problem. Depending on the type of job […]

Hadoop Best Practices

Avoid small files (smaller than one HDFS block, typically 128 MB), where each map task ends up processing a single small file. Maintain an optimal HDFS block size, generally >= 128 MB, to avoid tens of thousands of map tasks when processing large data sets. Use combiners wherever applicable to reduce the network traffic from mapper nodes to reducer nodes. Applications processing large data sets with optimal number of reducers and […]
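To make the small-files and combiner points concrete, here is a minimal sketch in plain Python rather than Hadoop code. It assumes roughly one map task per block-sized split and at least one per non-empty file, which simplifies how input splits are actually computed. The names `estimate_map_tasks`, `map_words` and `combine` are hypothetical helpers used only for illustration.

```python
import math
from collections import Counter

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # the typical HDFS block size mentioned above


def estimate_map_tasks(file_sizes, block_size=BLOCK_SIZE):
    """Rough estimate: one map task per block-sized split,
    and at least one per non-empty file (a simplifying assumption)."""
    return sum(math.ceil(size / block_size) for size in file_sizes if size > 0)


def map_words(line):
    """Word-count style mapper: emit a (word, 1) pair for every word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate counts on the mapper side so that
    fewer pairs have to travel to the reducers."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


if __name__ == "__main__":
    # The same 10 GB of data stored as many 1 MB files vs. one large file.
    many_small = [1 * MB] * 10240
    one_large = [10240 * MB]
    print("map tasks, small files:", estimate_map_tasks(many_small))
    print("map tasks, one file:   ", estimate_map_tasks(one_large))

    # Pairs leaving the mapper without and with a combiner.
    emitted = map_words("to be or not to be")
    print("pairs without combiner:", len(emitted))
    print("pairs with combiner:   ", len(combine(emitted)))
```

Under these assumptions, the same volume of data needs far fewer map tasks when stored in block-sized or larger files, and the combiner reduces the number of key/value pairs sent to the reducers whenever keys repeat within a mapper's output.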

Oozie Share Lib does not exist error

In this post we discuss a common error message that many Oozie beginners run into because of an incorrect or incomplete Oozie setup. The "Oozie Share Lib does not exist" error occurs when the Oozie sharelib is not installed properly. Error scenario: File /user/user/share/lib does not exist. We get this error message when we submit any Oozie job with the below property set to true in file

and if you […]

Review Comments

I attended Siva's Spark and Scala training. He has good presentation skills and explains technical concepts in a way that everyone in the group can follow. He has excellent real-time experience and provided enough use cases to understand each concept. The course duration and time management were excellent. I am happy that I found the right person at the right time to learn Spark. Thanks, Siva!

Dharmeswaran, ETL / Hadoop Developer (Spark, Nov 2016), September 21, 2017