Eclipse Configuration for Hadoop


Eclipse is a powerful IDE for Java development. Since Hadoop and MapReduce programming is done in Java, it is best done in a full-featured Integrated Development Environment (IDE). In this post, we will learn how to install Eclipse on an Ubuntu machine and configure it for Hadoop and MapReduce programming. Let’s start by downloading and installing Eclipse.

1. Install Eclipse:

  1. Download the latest version of Eclipse IDE for Java EE Developers from the Eclipse downloads page, http://www.eclipse.org/downloads/. This post describes the installation of Eclipse Kepler, which is the latest version at the time of writing.
  2. Extract the *.tar.gz file into your preferred installation directory, usually /opt/eclipse.
  3. Set the ECLIPSE_HOME environment variable in the .bashrc file to the installation directory, and add that directory to the existing list of directories in the PATH environment variable.

The steps above can be carried out from the terminal in the same sequence. Step 3 amounts to adding two entries to the .bashrc file: one that sets ECLIPSE_HOME to the installation directory, and one that appends that directory to PATH.
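A minimal sketch of the two .bashrc entries, assuming Eclipse was extracted to /opt/eclipse as in step 2 (for instance with sudo tar -xzf <downloaded archive>.tar.gz -C /opt, where the archive name depends on the package you downloaded):

```shell
# Entries for ~/.bashrc. /opt/eclipse is the install location assumed in step 2;
# change it if Eclipse was extracted somewhere else.
export ECLIPSE_HOME=/opt/eclipse
# Append the Eclipse directory so the eclipse launcher is found on PATH.
export PATH="$PATH:$ECLIPSE_HOME"
```

After saving the file, run source ~/.bashrc or open a new terminal so that the new values take effect.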

Now we can start Eclipse from the terminal with the $ eclipse command.

2. Eclipse Configuration for Hadoop/Mapreduce:

Eclipse can be configured for Hadoop in two ways. The first is to build an Eclipse plugin for the Hadoop version in use and copy it into the Eclipse plugins folder. The second is to install the Maven plugin to integrate Eclipse with Hadoop and perform the necessary setup.

Creation of Hadoop Eclipse Plugin:

Here we build a customized Hadoop Eclipse plugin for the Hadoop version currently in use. In this post, we have created the plugin for the hadoop-2.3.0 release.

Prerequisites:

    1. ant – The ant build tool must be installed on our machine to create the plugin jar file. On Ubuntu, it can be installed with sudo apt-get install ant.

    2. git – git must be installed on our machine to clone the source code required to build the jar file from GitHub. It can be installed with sudo apt-get install git.
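Before building, it can help to confirm that both prerequisites are actually on the PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper name, and it assumes the Ubuntu package names match the command names (true for ant and git).

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Check the two prerequisites for building the plugin.
missing=$(missing_tools ant git)
if [ -n "$missing" ]; then
  # $missing is left unquoted so several names print on one line.
  echo "Missing tools:" $missing
  echo "Install with: sudo apt-get install" $missing
else
  echo "ant and git are installed."
fi
```

The script only reports what is missing and never installs anything itself, so it is safe to run repeatedly.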

 Plugin creation:

    1. Download the required source code from GitHub into our preferred location.
    2. The following path has customized source files for building the plugin for the hadoop-2.3.0 release, which is the latest version at the time of writing this post. https://github.com/siva535/hadoop-eclipse-plugin-2.3.0/releases/download/1.0/hadoop-eclipse-plugin.zip
    3. Extract the source files from the above zip file and change into the plugin directory with $ cd Downloads/hadoop-eclipse-plugin/src/contrib/eclipse-plugin.
    4. Compile the source code and build the jar file with the below command.

ant jar -Dversion=2.3.0 -Dhadoop.home=/usr/lib/hadoop/hadoop-2.3.0/

Note:

In the ant jar command above, the -Dversion=2.3.0 property specifies the version number of the Hadoop release, here hadoop-2.3.0. The same source files can be used for other releases by changing the version number in this property and providing the appropriate Hadoop home directory.

In this example, Hadoop’s home directory is given with the -Dhadoop.home=/usr/lib/hadoop/hadoop-2.3.0/ property. Change it to match your Hadoop installation directory.

We have also changed the libraries.properties file in the hadoop-eclipse-plugin/ivy/ directory to avoid version mismatch errors (the required version files are not present in the Hadoop home directory).

For building the Eclipse plugin for the hadoop-2.3.0 release, the above source code and commands work as they are; no changes are needed. Changes are needed only when generating the plugin for other versions.
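To make the two properties from the note easier to adjust, the build command can be assembled from variables. This is a sketch under stated assumptions: HADOOP_VERSION, HADOOP_HOME_DIR and ANT_CMD are hypothetical variable names, and only the two properties named in the note are included. For releases other than 2.3.0, the version entries in ivy/libraries.properties may also need changing, as mentioned above.

```shell
# Assumed values for illustration; set both to match your own installation.
HADOOP_VERSION=2.3.0
HADOOP_HOME_DIR=/usr/lib/hadoop/hadoop-2.3.0/

# Assemble the build command from the two properties described in the note.
ANT_CMD="ant jar -Dversion=$HADOOP_VERSION -Dhadoop.home=$HADOOP_HOME_DIR"

# Print the command for review; this sketch does not run ant itself.
echo "$ANT_CMD"
```

The printed command is meant to be run from the src/contrib/eclipse-plugin directory of the extracted sources.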

5.   Now copy the plugin jar file from hadoop-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.3.0.jar to the /opt/eclipse/plugins directory.

6.   After Eclipse is restarted, the Map/Reduce perspective will be available.
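The copy in step 5 can be sketched as a small shell function that fails early if the build did not produce the jar. install_plugin and its two arguments are hypothetical names used only for illustration.

```shell
# Copy a built plugin jar into an Eclipse plugins directory.
# Both paths are arguments so the function works for any install location.
install_plugin() {
  jar="$1"
  plugins_dir="$2"
  if [ ! -f "$jar" ]; then
    echo "plugin jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$plugins_dir" ]; then
    echo "plugins directory not found: $plugins_dir" >&2
    return 1
  fi
  cp "$jar" "$plugins_dir"/
}
```

With the paths from this post, the call would be install_plugin hadoop-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.3.0.jar /opt/eclipse/plugins. If /opt/eclipse is owned by root, run the cp command directly with sudo instead.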

Maven plugin for Integration of Eclipse with Hadoop:

Prerequisites:

  1. For this option, Maven must be installed on our machine. If it is not already installed, this can be done with sudo apt-get install maven.

Setup:

  1. We need to set up classpath variables for the ant and Maven installations. Start Eclipse and go to Window –> Preferences, then open Java –> Build Path –> Classpath Variables. Add entries for ANT_HOME as /usr/share/ant (our ant installation path) and M2_REPO as the local Maven repository directory (typically ~/.m2/repository).

[Screenshot: classpath variables]

2.  Install the m2e plugin through Help –> Install New Software. As shown in the screen below, enter http://download.eclipse.org/technology/m2e/releases into the “Work with” box, select the plugin, click Next, and complete the installation.

[Screenshot: installing the m2e plugin]

3.  To configure Hadoop, Eclipse needs an external jar from the JAVA_HOME/lib directory, where JAVA_HOME is our Java installation directory. From JAVA_HOME/lib, we need to add the tools.jar file as an external jar.

    1. Go to Window –> Preferences –> Java –> Installed JREs.
    2. Select the default JRE, click Edit –> Add External JARs, and select the tools.jar file from the JAVA_HOME/lib directory.

[Screenshots: editing the default JRE, browsing to JAVA_HOME/lib, and selecting tools.jar]

4.  Download the Hadoop source code from svn or git. With git, the latest version of Hadoop can be fetched with git clone from the Apache Hadoop repository.

5.  Change directory (cd) to the hadoop-common folder and run the Maven build from the terminal, for example mvn install -DskipTests to build the project without running the tests.

6.  Import the above project into Eclipse:

    1. Go to File -> Import.
    2. Select Maven -> Existing Maven Projects.
    3. Navigate to the top directory of the downloaded source, which is the hadoop-common directory in this example.

[Screenshot: imported Hadoop projects]

7.  The imported projects may show some errors caused by the Java files that are generated by protoc. To fix them, right-click each project –> Build Path –> Configure Build Path.

[Screenshot: Configure Build Path]

Link the source folders target/generated-sources and target/generated-test-sources. For the inclusion pattern, select “**/*.java”.

[Screenshot: link sources]

Conclusion:

As discussed above, option 1, building the Hadoop Eclipse plugin, is easier than resolving the errors in option 2. So we chose option 1, built hadoop-eclipse-plugin-2.3.0, and copied it into the /opt/eclipse/plugins folder.

For developing the example MapReduce program WordCount in the Eclipse IDE, please refer to the next post –> Sample Mapreduce Program In Eclipse.



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, involved in several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.



11 thoughts on “Eclipse Configuration for Hadoop”

  • BSR

    Hi,
    I have compiled this for hadoop 2.4.1 and loaded the plugin into eclipse kepler. However, not able to add any locations to it. The UI that takes location name, MRV2 master host/port details does not appear at all. The behaviour is same with eclipse Luna.

    Thanks and Regards,
    BSR

    • Siva, post author

      Can you please let me know how you got your hadoop-eclipse plugin: did you build it as shown in Option 1 of the post, or did you get it from somewhere on the net? There will be version inconsistencies if you use different versions of Hadoop and its Eclipse plugin.

    • Siva, post author

      Yes, it behaves erratically when adding DFS locations, and adding DFS locations to Eclipse is not always preferred anyway. Eclipse is better used for coding and building jars in one place, and then copying the jar file to the datanode from which we plan to submit the job.

    • Siva, post author

      Hi Kunal,

      There is some issue with the integration of DFS locations into Eclipse, but this plugin works well for the Mapreduce perspective. You can use it for writing mapreduce programs and building jars. Once the jars are created, they can be copied to any datanode from which we want to submit our job (with the $ hadoop jar my.jar Mainclass i/p o/p command, where i/p and o/p are the input and output paths).

  • teena

    Hi sir,

    I tried to configure eclipse for hadoop-2.4.1 with eclipse mars and I got stuck at the 4th step, ant jar -Dversion…; I changed the version number for -Dversion but still got an error like

    BUILD FAILED

    /home/hadoopuser/Downloads/hadoop-eclipse-plugin/src/contrib/eclipse-plugin/build.xml:123: Warning: Could not find file  /usr/local/hadoop/share/hadoop/commom/lib/hadoop-auth-2.3.0.jar to copy

    but the hadoop-auth-2.4.1.jar file is present in lib folder

    please help with this..

    thank u in advance

     

  • santosh

    Hi, I have a mapreduce hadoop project on dynamic job ordering and slot configuration. I know how to run it on Windows using eclipse, cygwin and VMware, but I want to run the same project on Ubuntu. How can I do this? Do I need to install eclipse and VMware for Ubuntu, or something else?

