Apache Oozie Installation on Ubuntu-14.04


In this post we will discuss the installation of Apache Oozie on an Ubuntu machine and run some sample MapReduce jobs on the Oozie scheduler.

Apache Oozie Installation on Ubuntu

We will build the Oozie distribution tarball by downloading the source code from Apache and building it with Maven.

Prerequisite

  • If we plan to install Oozie-4.0.1 or an earlier version, JDK 1.6 is required on our machine. If the JDK on Ubuntu is 1.7 or later, we need to make some changes in the pom.xml file (see the sketch after this list).
  • If we are going to install Oozie-4.1.0 or a later version, JDK 1.7 on Ubuntu will not cause any issues.
  • Hadoop 2 should be installed on our machine.
  • The MapReduce Job History Server should be configured and started successfully, and the remaining Hadoop and YARN daemons should be running fine.
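
The pom.xml change for newer JDKs is release-specific; as a rough sketch only (verify the actual property name in your Oozie release), the idea is to raise the Java version declared in the top-level pom.xml:

    <!-- illustrative only: raise the Java version Oozie is compiled for -->
    <targetJavaVersion>1.7</targetJavaVersion>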

Procedure

  • Create an Oozie installation directory under a preferred location, usually /usr/lib/oozie, give it full permissions, download the Oozie source code into that folder, and build the distribution, as shown below.
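
A minimal sketch of these steps, assuming Oozie 4.1.0 and the Apache archive as the download location (adjust the version and URL for your setup):

    $ sudo mkdir -p /usr/lib/oozie
    $ sudo chmod -R 777 /usr/lib/oozie    # full permissions, as noted above
    $ cd /usr/lib/oozie
    $ wget https://archive.apache.org/dist/oozie/4.1.0/oozie-4.1.0.tar.gz
    $ tar -xzf oozie-4.1.0.tar.gz
    $ cd oozie-4.1.0
    $ bin/mkdistro.sh -DskipTests         # builds the binary distribution with Maven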

  • Once the build is successful, copy the binary distribution from /usr/lib/oozie/oozie-4.1.0/distro/target/oozie-4.1.0-distro/oozie-4.1.0 to a convenient location, say /usr/lib/oozie/oozie-4.1/, so that we do not need to traverse a long directory structure to reach Oozie.
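
For example:

    $ cp -r /usr/lib/oozie/oozie-4.1.0/distro/target/oozie-4.1.0-distro/oozie-4.1.0 /usr/lib/oozie/oozie-4.1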

  • Add the bin directory of the Oozie binary distribution to the PATH in the .bashrc file, as shown below. $ gedit ~/.bashrc
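
Append lines like the following (the path assumes the location chosen above), then reload the file with source ~/.bashrc:

    # Oozie environment
    export OOZIE_HOME=/usr/lib/oozie/oozie-4.1
    export PATH=$PATH:$OOZIE_HOME/bin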

Oozie Web Console:
  • To enable the Web Console for Oozie we need the ext-*.*.zip library in the Oozie distribution. By default it does not ship with Oozie; we have to download it separately from the ExtJS site.
  • Now let's create a library directory (libext) under the Oozie binary distribution, add the required jars to it, and then download the required ExtJS zip file into it, as shown below.
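
A sketch of these steps, assuming the libext directory that oozie-setup.sh reads from and the historically used ExtJS 2.2 download URL (mirrors change, so adjust as needed):

    $ cd /usr/lib/oozie/oozie-4.1
    $ mkdir libext
    # copy the Hadoop library jars shipped with the build (the exact path varies by version)
    $ cp /usr/lib/oozie/oozie-4.1.0/hadooplibs/hadooplib-*/*.jar libext/
    $ cd libext
    $ wget http://extjs.com/deploy/ext-2.2.zip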

Now we are almost ready to start the Oozie web console, but first we need to create a user and group specific to Oozie, as shown below.
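
A sketch, assuming we name both the user and the group oozie:

    $ sudo addgroup oozie
    $ sudo adduser --ingroup oozie oozie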

  • Add these two properties to core-site.xml under $HADOOP_CONF_DIR (i.e. $HADOOP_HOME/etc/hadoop). Here, USERNAME should be replaced with the appropriate value; in our case it is user (user@ubuntu-1:).
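
The two proxyuser properties look like this (the * values allow all hosts and groups; they can be restricted in production):

    <property>
      <name>hadoop.proxyuser.USERNAME.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.USERNAME.groups</name>
      <value>*</value>
    </property>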

  • Now we can prepare the Oozie WAR file and run setup as shown below, and then start Oozie from the terminal (see Start Oozie Service below).
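
The setup step bundles everything from libext into the Oozie WAR:

    $ cd /usr/lib/oozie/oozie-4.1
    $ bin/oozie-setup.sh prepare-war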

  • Prepare the sharelib in HDFS, as shown below.
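
For example, assuming the NameNode runs at hdfs://localhost:9000:

    $ bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000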

Why Use the ShareLib?

Suppose we have an Oozie workflow that runs a MapReduce action. We want to specify our own Mapper and Reducer classes, but how does Oozie know where to find those two classes?

There are two ways to let Oozie know about the Mapper and Reducer classes, or about any other additional JARs required by our workflow. The first approach relies on the fact that a workflow typically consists of a job.properties file, a workflow.xml file, and an optional lib folder (and perhaps other files such as Pig scripts). Oozie will take any JARs that we put in that lib folder and automatically add them to our workflow's classpath when it is executed. This is the simplest approach.

Alternatively, we can use the oozie.libpath property in our job.properties file to specify additional HDFS directories (multiple directories can be separated by commas) that contain JARs. The advantage of this property over the lib folder discussed above shows in cases where many workflows all use the same set of JARs.
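
For example, in job.properties (the HDFS paths here are purely illustrative):

    # comma-separated HDFS directories holding shared JARs
    oozie.libpath=/user/user/jars/common,/user/user/jars/lookup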

The ShareLib behaves very similarly to oozie.libpath, except that it is specific to Oozie's built-in actions (such as Pig, Hive, Sqoop, and streaming) and their required JARs.

Install and Use the ShareLib

By default, the ShareLib should be placed in the HDFS home folder of the user who started the Oozie web server; this is not necessarily the same user as the one submitting a job. The property in oozie-site.xml for setting the location of the ShareLib is called oozie.service.WorkflowAppService.system.libpath and its default value is /user/${user.name}/share/lib, where ${user.name} resolves to the user who started the Oozie server. Hence, the default location to install the ShareLib is /user/${user.name}/share/lib.

To enable a workflow to use the ShareLib, we simply specify oozie.use.system.libpath=true in the job.properties file, and Oozie will know to include the ShareLib JARs for the necessary actions in our job.

We need to update two properties in oozie-site.xml under OOZIE_CONF_DIR to set up the sharelib correctly. Here, provide the HADOOP_CONF_DIR value to the first property, as in the sketch below.
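
A sketch of the two properties, assuming the properties meant here are the Hadoop-configuration mapping and the system libpath, and that HADOOP_CONF_DIR is /usr/lib/hadoop/etc/hadoop (substitute your own path):

    <property>
      <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
      <value>*=/usr/lib/hadoop/etc/hadoop</value>
    </property>
    <property>
      <name>oozie.service.WorkflowAppService.system.libpath</name>
      <value>/user/${user.name}/share/lib</value>
    </property>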

  • Create the Oozie DB for our Oozie version, as shown below.
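
For example:

    $ bin/ooziedb.sh create -sqlfile oozie.sql -run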

Start Oozie Service:

We can start the Oozie service with the command below. As shown, Oozie automatically builds/determines all the required environment variables (OOZIE_*).
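
    $ bin/oozied.sh start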

Now we can verify the status of the Oozie service with the command below.
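
    $ bin/oozie admin -oozie http://localhost:11000/oozie -status
    System mode: NORMAL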


If we get the status message NORMAL, then we can see the Oozie web console in the browser, as shown below.

oozie web console

Run Examples in Oozie

By default, the Oozie distribution ships with a few example workflows for different engines (shell, MapReduce, Hive, etc.) inside the oozie-examples.tar.gz file. Let's extract this tarball and change the job.properties files appropriately.
Extract the Examples Tarball
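
    $ cd $OOZIE_HOME
    $ tar -xzf oozie-examples.tar.gz    # creates an examples/ directory
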
Edit the job.properties Files
Each examples/apps/*/job.properties file contains lines like the ones below. For example, for the map-reduce app: since we used port 9000 in hdfs://localhost:9000 in the $HADOOP_CONF_DIR/core-site.xml file, we have to specify the same value in the nameNode property. And by default jobTracker=localhost:8021, but in the YARN architecture the JobTracker's role is served by the ResourceManager, which listens on port 8032, so that value is updated accordingly.
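
After the edits, the relevant part of examples/apps/map-reduce/job.properties looks like:

    nameNode=hdfs://localhost:9000
    jobTracker=localhost:8032
    queueName=default
    examplesRoot=examples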

Copy these examples into HDFS
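
Assuming our HDFS home directory exists (e.g. /user/user), the examples can be copied with:

    $ hdfs dfs -put examples examples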

We have to submit the Oozie job from the path relative to the examples directory on the local file system (LFS).
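
For example, for the map-reduce example:

    $ oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run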


Now we can see the status of the running MapReduce job in the web console at http://localhost:11000/oozie, or via the command below.
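
Here <job-id> is the id printed by the -run command above:

    $ oozie job -oozie http://localhost:11000/oozie -info <job-id>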


13 thoughts on “Apache Oozie Installation on Ubuntu-14.04”

  • jpbordi

    Hello,

    I installed hadoop 2.6.0, pig 0.14, hue 0.37, and oozie 4.0.1.

    The oozie pig job task was working fine on the master, but failed on the slave node, because job.xml did not evaluate resourcemanager.hostname for resourcemanager.scheduler.address and always took the default value 0.0.0.0:8030 instead of myIpAdressMasterServer:8030.

    I tried to solve this issue by installing 4.1.0, but got another problem: the pig version is 0.12, and when I launch the oozie job it fails with a NoSuchFieldException (reflection).

    2015-04-10 14:15:37,827 [main] WARN  org.apache.pig.backend.hadoop20.PigJobControl  – falling back to default JobControl (not using hadoop 0.20 ?)
    java.lang.NoSuchFieldException: runnerState
    at java.lang.Class.getDeclaredField(Class.java:1953)

    oozie 4.1.0 seems not to be compatible with hadoop 2.6.0, because the oozie sharelib contains pig 0.12 and not 0.14, but I don't get this kind of error with oozie 4.0.1.

    I have found no quick solution; I am trying to solve it, and if nothing works I will have to roll back to oozie 4.0.1.

    If somebody has an idea how I can solve this, I will be very happy.

    Thanks for the support.
    KR
    JP

    Logs of the job task:
    Apache Pig version 0.12.1 (r1585011)
    compiled Apr 07 2014, 12:19:58

    Run pig script using PigRunner.run() for Pig version 0.8+
    2015-04-10 14:15:36,601 [main] INFO  org.apache.pig.Main  – Apache Pig version 0.12.1 (r1585011) compiled Apr 07 2014, 12:19:58
    2015-04-10 14:15:36,601 [main] INFO  org.apache.pig.Main  – Logging error messages to: /tmp/hadoop-hduser-hue/nm-local-dir/usercache/hduser/appcache/application_1428663948609_0008/container_1428663948609_0008_01_000002/pig-job_1428663948609_0008.log
    2015-04-10 14:15:36,649 [main] INFO  org.apache.pig.impl.util.Utils  – Default bootup file /home/hduser/.pigbootup not found
    2015-04-10 14:15:36,744 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  – mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-04-10 14:15:36,744 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  – fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-04-10 14:15:36,744 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  – Connecting to hadoop file system at: hdfs://stargate:9000
    2015-04-10 14:15:36,750 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  – Connecting to map-reduce job tracker at: stargate:8032
    2015-04-10 14:15:37,316 [main] INFO  org.apache.pig.tools.pigstats.ScriptState  – Pig features used in the script: GROUP_BY,ORDER_BY,FILTER
    2015-04-10 14:15:37,351 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer  – {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
    2015-04-10 14:15:37,378 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  – mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
    2015-04-10 14:15:37,474 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler  – File concatenation threshold: 100 optimistic? false
    2015-04-10 14:15:37,546 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer  – Choosing to move algebraic foreach to combiner
    2015-04-10 14:15:37,572 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  – MR plan size before optimization: 3
    2015-04-10 14:15:37,572 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  – MR plan size after optimization: 3
    2015-04-10 14:15:37,636 [main] INFO  org.apache.hadoop.yarn.client.RMProxy  – Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-04-10 14:15:37,827 [main] WARN  org.apache.pig.backend.hadoop20.PigJobControl  – falling back to default JobControl (not using hadoop 0.20 ?)
    java.lang.NoSuchFieldException: runnerState
    at java.lang.Class.getDeclaredField(Class.java:1953)
    at org.apache.pig.backend.hadoop20.PigJobControl.<init>(PigJobControl.java:51)
    at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:98)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:289)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:191)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1324)

  • jpbordi

    Following up on my previous message:

    I can confirm it is working with hadoop 2.6.0,

    but it uses pig 0.10, and with hadoop 2.6.0 there is no trouble; however, the sharelib was not installed, so maybe I will try to force pig 0.10 in 4.1.0 in oozie-site.xml.

    thx

    KR

    JP

  • jpbordi

    hello,

    It is me again, for conclusion

    i got it

    But when I used your procedure, it didn't work for me; I tried three times.

    It seems it compiled for hadoop 1.1.1, I don't know why.

    I searched for another solution and found this link:

    http://mockus.in/forum/viewtopic.php?t=48

    From what I can understand, the compilation depends on the hadoop version, and the latest oozie version, 4.1.0, knows at most the hadoop 2.3.0 version.

    I noted two things different from your procedure. First, I had to modify pom.xml to set the hadoop version from 1.1.1 to 2.3.0:

    -- look for
    <hadoop.version>1.1.1</hadoop.version>
    -- replace it with
    <hadoop.version>2.3.0</hadoop.version>

    Second, I had to install and use maven 3.2.1 instead of maven 2.2.1, because otherwise I got a maven failure:

    ‘org.apache.oozie:oozie-hadoop’ is duplicated in the reactor

    mvn clean package assembly:single -P hadoop-2 -DskipTests

    After that, oozie 4.1.0 works with hadoop 2.6.0 (still using pig 0.12), and I no longer get the NoSuchFieldException on JobControl.

    I can continue learning. Thanks for your article, it was very helpful.

    KR

    JP

  • Abdullah Khan

    java.library.path=/home/user/bigdata/hadoop-2.6.0/lib/native. Which jar files should I include in the ‘native’ folder? While starting oozie, java.library.path= is empty. Apparently ${JAVA_LIBRARY_PATH} in the oozied.sh file could not be resolved. Help me, please.

  • rajesh

    Hi,

    I followed the steps you mentioned in the above post, but I am unable to start Oozie; it gives a connection refused error.

    Can you please provide the appropriate steps to run Oozie.

    Thanks in advance.

  • M Shafi

    Hi, good article.

    I installed it as suggested here, but I am not able to connect to the oozie-server:

    Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1
    Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2
    Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3
    Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4
    Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection ref

    pls help

  • Dhivya

    Hi Siva,

    I'm totally new to Hadoop and Oozie. I have installed Oozie based on your tutorial. Though it started fine without any error, the job always goes to the FAILED state with error code JA017. Please help me solve this issue; I have been stuck for two weeks with the exception below.

    Error starting action [mr-node]. ErrorType [FAILED], ErrorCode [JA017], Message [JA017: Unknown hadoop job [job_local1495137452_0001] associated with action [0000000-160728110800709-oozie-user-W@mr-node]. Failing this action!]
    org.apache.oozie.action.ActionExecutorException: JA017: Unknown hadoop job [job_local1495137452_0001] associated with action [0000000-160728110800709-oozie-user-W@mr-node]. Failing this action!
    at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1199)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1136)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
    at org.apache.oozie.command.XCommand.call(XCommand.java:281)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

  • valli n

    I am getting the following error when trying to install oozie. Can you please help me debug this error?

    cannot stat ‘/home/wcbdd/apache/oozie-3.3.1/distro/src/main/oozie.war’: No such file or directory

    Regards
    Valli N

  • astro

    I am still getting “Error: HTTP error code: 500 : Internal Server Error” while running an ssh job.

    The sharelib update gives:

    oozie admin -sharelibupdate -oozie http://localhost:11000/oozie
    Error: HTTP error code: 500 : Internal Server Error

    The sharelib list gives:

    oozie admin -shareliblist -oozie http://localhost:11000/oozie

    [Available ShareLib]

    The following error persists in the logs:

    ERROR ShareLibService:517 – SERVER[thinkpad] org.apache.oozie.service.ServiceException: E0104: Could not fully initialize service [org.apache.oozie.service.ShareLibService], Not able to cache sharelib. An Admin needs to install the sharelib with oozie-setup.sh and issue the ‘oozie admin’ CLI command to update the sharelib

