Eclipse is a powerful IDE for Java development. Since Hadoop and MapReduce programming is done in Java, it is better to do this programming in a well-featured Integrated Development Environment (IDE). In this post, we will learn how to install Eclipse on an Ubuntu machine and configure it for Hadoop and MapReduce programming. Let’s start with downloading and installing Eclipse.
1. Install Eclipse:
- Download the latest version of Eclipse IDE for Java EE Developers from the Eclipse downloads page http://www.eclipse.org/downloads/. In this post, we describe the installation of Eclipse Kepler, the latest version at the time of writing this post.
- Extract the *.tar.gz file into your preferred installation directory, usually /opt/eclipse.
- Set the ECLIPSE_HOME environment variable in the .bashrc file to the installation directory, and add the installation directory to the existing list of directories in the PATH environment variable.
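For example, with the assumed install location /opt/eclipse (adjust to your directory), the .bashrc additions would look like this:

```shell
# Assumed install location: /opt/eclipse -- change to your directory
export ECLIPSE_HOME=/opt/eclipse
export PATH="$PATH:$ECLIPSE_HOME"
```

After adding these lines, run source ~/.bashrc (or open a new terminal) for them to take effect.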
Now we can start Eclipse from the terminal with the $ eclipse command.
2. Eclipse Configuration for Hadoop/Mapreduce:
Eclipse can be configured for Hadoop in two ways: one is to build an Eclipse plugin for the Hadoop version currently in use and copy it into Eclipse’s plugins folder; the other is to install the Maven plugin for integration of Eclipse with Hadoop and perform the necessary setup.
Creation of Hadoop Eclipse Plugin:
This method builds a customized Hadoop Eclipse plugin for the Hadoop version currently in use. In this post, we create the plugin for the hadoop-2.3.0 release.
1. ant – We need the Ant build tool installed on our machine to create the plugin jar file. To install Ant on an Ubuntu machine, use the command below.
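On Ubuntu, Ant is available from the standard package repositories:

```shell
sudo apt-get install ant
```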
2. git – Git needs to be installed on our machine to clone from GitHub the source code required to build the jar file. Git can be installed with the command below.
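Like Ant, Git can be installed from the standard package repositories:

```shell
sudo apt-get install git
```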
3. Download the required source code from GitHub into our preferred location.
- The following archive contains customized source files for building the plugin against the hadoop-2.3.0 release, the latest version at the time of writing this post: https://github.com/siva535/hadoop-eclipse-plugin-2.3.0/releases/download/1.0/hadoop-eclipse-plugin.zip
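The archive can be fetched from the terminal with wget, saving it into Downloads to match the paths used below:

```shell
cd ~/Downloads
wget https://github.com/siva535/hadoop-eclipse-plugin-2.3.0/releases/download/1.0/hadoop-eclipse-plugin.zip
```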
- Extract the source files from the above zip file and change into the plugin source directory: $ cd Downloads/hadoop-eclipse-plugin/src/contrib/eclipse-plugin.
4. Compile the source code and build the jar file with the command below.
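Based on the properties described below, the build command would look like this (the hadoop.home path is this post's install location; adjust it to your own):

```shell
# Run from hadoop-eclipse-plugin/src/contrib/eclipse-plugin
ant jar -Dversion=2.3.0 -Dhadoop.home=/usr/lib/hadoop/hadoop-2.3.0/
```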
In the above ant jar command, the -Dversion=2.3.0 property specifies the version number of the Hadoop release; it is specific to hadoop-2.3.0. The same source files can be used for other releases by changing the version number in this property and providing the appropriate Hadoop home directory.
In this example, Hadoop’s home directory is given with the -Dhadoop.home=/usr/lib/hadoop/hadoop-2.3.0/ property. Change this to match your Hadoop installation directory.
We have also changed the libraries.properties file in the hadoop-eclipse-plugin/ivy/ directory to avoid version mismatch errors (the required version files are not present in the Hadoop home directory).
For building the Eclipse plugin for the hadoop-2.3.0 release, the above source code and commands work as-is; no changes are needed. Changes are required only when generating the plugin for other versions.
5. Now copy the plugin jar file from hadoop-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.3.0.jar to the /opt/eclipse/plugins directory.
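Assuming Eclipse is installed in /opt/eclipse, the copy can be done from the directory where the source was extracted:

```shell
sudo cp hadoop-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.3.0.jar /opt/eclipse/plugins/
```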
6. After restarting Eclipse, the Map/Reduce perspective will be available.
Maven plugin for Integration of Eclipse with Hadoop:
1. For this option, Maven needs to be installed on our machine; if it is not installed already, this can be done with the command below.
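On Ubuntu, Maven is installed from the standard package repositories:

```shell
sudo apt-get install maven
```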
- We need to set up classpath variables for the Ant and Maven installations. Start Eclipse and go to Window –> Preferences, then open Java –> Build Path –> Classpath Variables. Add an entry for ANT_HOME as /usr/share/ant (our Ant installation path) and one for M2_REPO pointing to the local Maven repository (typically ~/.m2/repository).
2. Install the m2e plugin by navigating to Help –> Install New Software. Enter http://download.eclipse.org/technology/m2e/releases into the “Work with” box, select the plugin, click the Next button, and complete the installation.
3. For the Hadoop configuration, Eclipse needs an external jar from the JAVA_HOME/lib directory, where JAVA_HOME is our Java installation directory. From JAVA_HOME/lib, we need to add the tools.jar file as an external jar.
- Go to Window –> Preferences –> Java –> Installed JREs.
- Select the default JRE, click Edit –> Add External JARs, and select the tools.jar file from the JAVA_HOME/lib directory.
4. Download the Hadoop source code from SVN or Git. Using Git, the latest version of Hadoop can be cloned with the command below.
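At the time of writing, the Hadoop source lived in a repository named hadoop-common; assuming the Apache Git mirror of that era, the clone command would be:

```shell
git clone git://git.apache.org/hadoop-common.git
```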
5. Change directory (cd) into the hadoop-common folder and run the command below from the terminal to build the Hadoop Maven project.
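A common way to build the Hadoop source before importing it into Eclipse (skipping the tests to save time, per Hadoop's build instructions) is:

```shell
cd hadoop-common
mvn install -DskipTests
```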
6. Import the above project into Eclipse:
- Go to File -> Import.
- Select Maven -> Existing Maven Projects.
- Navigate to the top directory of the downloaded source, the hadoop-common directory in this example.
7. The projects generated above may show some errors due to the Java files that are generated by protoc. To fix them, right-click each affected project –> Build Path –> Configure Build Path, link sources from target/generated-sources and target/generated-test-sources, and select “**/*.java” as the inclusion pattern.
As discussed above, option 1 (building the Hadoop Eclipse plugin) is easier than resolving the errors in option 2. So we preferred option 1: we created hadoop-eclipse-plugin-2.3.0.jar and copied it into the /opt/eclipse/plugins folder.
For development of an example MapReduce program (WordCount) in the Eclipse IDE, please refer to the next post –> Sample Mapreduce Program In Eclipse.