Hadoop Performance Tuning

There are many ways to improve the performance of Hadoop jobs. In this post, we provide a few MapReduce properties that can be used at various MapReduce phases to improve performance. There is no one-size-fits-all technique for tuning Hadoop jobs; because of Hadoop's architecture, achieving balance among resources is often more effective than addressing a single problem. Depending on the type of job […]

Hadoop Best Practices

Avoid small files (smaller than one HDFS block, typically 128 MB), where each map task processes a single small file. Maintain an optimal HDFS block size, generally >= 128 MB, to avoid tens of thousands of map tasks when processing large data sets. Use combiners wherever applicable to reduce network traffic from mapper nodes to reducer nodes. Applications processing large data-sets with optimal number of reducers and […]
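To make the small-files and block-size advice concrete, here is a rough back-of-the-envelope sketch in Python. It assumes one map task per block-sized split of each input file, which is only an approximation of how file-based input formats usually behave; the function name `estimate_map_tasks` and the 10 GB data set are made up for illustration.

```python
import math

MB = 1024 * 1024


def estimate_map_tasks(file_sizes, block_size=128 * MB):
    """Rough estimate only: assumes one map task per block-sized split of each file."""
    return sum(math.ceil(size / block_size) for size in file_sizes if size > 0)


# Hypothetical 10 GB data set stored as a single large file.
one_large_file = [10 * 1024 * MB]

# The same 10 GB stored as 1 MB files: every file needs its own map task.
many_small_files = [1 * MB] * (10 * 1024)

print(estimate_map_tasks(one_large_file))            # 80
print(estimate_map_tasks(many_small_files))          # 10240
print(estimate_map_tasks(one_large_file, 64 * MB))   # 160
```

Under these assumptions, the same amount of data needs over a hundred times more map tasks when it is split into 1 MB files, and halving the block size doubles the task count for the large file.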

Oozie Share Lib does not exist error

In this post we discuss a common error message faced by many Oozie beginners due to an incorrect or incomplete Oozie setup. The "Oozie Share Lib does not exist" error is received if the Oozie sharelib is not installed properly. Error Scenario: File /user/user/share/lib does not exist. We get this error message when we submit any Oozie job with the below property set to true in file

and if you […]

