In this post, we briefly walk through the steps for installing RHadoop on an Ubuntu 14.04 machine with Hadoop 2.6.0. We also cover the procedure for installing R and RStudio on Ubuntu. All of these installations are done on a single-node Hadoop machine.
RStudio Installation on Hadoop Machine
Before proceeding with the steps below, the Hadoop machine setup should be complete. Please refer to “install-hadoop-on-single-node-cluster” in this blog for Hadoop installation.
Install latest R version
The default Ubuntu repositories may not install the latest R version, which is 3.0.1. To deal with that, we need to add the CRAN repository to the sources list. The commands below install the latest R version.
$ sudo gedit /etc/apt/sources.list
Once the file is open, copy and paste the appropriate line below at the end of the file (choose the line that matches the Ubuntu version on the system).
Ubuntu 14.10
deb http:///bin/linux/ubuntu utopic/
Ubuntu LTS 12.04
deb http://cran.cnr.berkeley.edu/bin/linux/ubuntu precise/
Ubuntu LTS 14.04
deb http:///bin/linux/ubuntu trusty/
Ubuntu LTS 10.04
deb http:///bin/linux/ubuntu lucid/
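The choice of repository line above can also be derived from the release codename. The following is a minimal sketch, assuming the mirror host shown in the 12.04 entry (cran.cnr.berkeley.edu); `cran_deb_line` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: build a CRAN sources.list entry for a given
# mirror host and Ubuntu codename (for example "trusty" for 14.04).
cran_deb_line() {
    mirror="$1"
    codename="$2"
    printf 'deb http://%s/bin/linux/ubuntu %s/\n' "$mirror" "$codename"
}

# Example (assumes lsb_release is available, as on stock Ubuntu):
#   cran_deb_line cran.cnr.berkeley.edu "$(lsb_release -cs)"
```

The printed line can then be pasted at the end of /etc/apt/sources.list as described above.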
After saving and closing the file, run the commands below to authenticate the newly added source.
$ sudo gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
$ sudo gpg -a --export E084DAB9 | sudo apt-key add -
$ exit
To install the latest R version, execute the commands below. R base is required before RStudio can be installed.
$ sudo apt-get update
$ sudo apt-get install r-base-core
Validate the R installation by entering the R command in the terminal.
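Besides starting R interactively, the installed version can be checked from the shell. This is a rough sketch, assuming GNU `sort -V` is available (it is on Ubuntu) and that 3.0.1 is the minimum version wanted; `parse_r_version` and `version_at_least` are hypothetical helper names.

```shell
# Extract "X.Y.Z" from the first line of `R --version`, which looks like:
#   R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
parse_r_version() {
    head -n 1 | awk '{print $3}'
}

# Succeed when version $1 is at least version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   installed="$(R --version | parse_r_version)"
#   version_at_least "$installed" 3.0.1 && echo "R $installed is recent enough"
```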
Install RStudio
To download and install RStudio Server, open a terminal window and execute the commands listed below. Note that the gdebi-core package is installed first so that gdebi can be used to install RStudio and all of its dependencies. Also note that the libapparmor1 dependency is required on Ubuntu.
$ sudo apt-get install gdebi-core
$ sudo apt-get install libapparmor1
For a 32-bit OS, run the commands below.
$ wget http://download2.rstudio.org/rstudio-server-0.97.551-i386.deb
$ sudo gdebi rstudio-server-0.97.551-i386.deb
For a 64-bit OS, run the commands below.
$ wget http://download2.rstudio.org/rstudio-server-0.98.1102-amd64.deb
$ sudo gdebi rstudio-server-0.98.1102-amd64.deb
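The 32-bit versus 64-bit choice above can be scripted from the output of `uname -m`. This is only a sketch: `rstudio_deb_for_arch` is a hypothetical helper, and the file names are simply the two packages named in this post.

```shell
# Hypothetical helper: map a machine architecture (as printed by
# `uname -m`) to the RStudio Server package file named in this post.
rstudio_deb_for_arch() {
    case "$1" in
        x86_64|amd64) echo "rstudio-server-0.98.1102-amd64.deb" ;;
        i386|i486|i586|i686) echo "rstudio-server-0.97.551-i386.deb" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Example:
#   deb="$(rstudio_deb_for_arch "$(uname -m)")"
#   wget "http://download2.rstudio.org/$deb" && sudo gdebi "$deb"
```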
Run the command below to check whether the RStudio Server installation was successful.
$ sudo rstudio-server verify-installation
RStudio can now be accessed through a web interface. The address is the machine's IP address combined with the default port number (8787).
URL: http://<IP-Address>:8787/ (here, http://192.168.1.3:8787/)
Username: <<System Username>>
Password: <<System Password>>
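If it is unclear which address to use, the URL can be assembled from the machine's own IP address. A small sketch, assuming `hostname -I` is available (it is on Ubuntu); `rstudio_url` is a hypothetical helper name.

```shell
# Hypothetical helper: build the RStudio Server URL for an IP address,
# using the default port 8787 unless another port is given.
rstudio_url() {
    ip="$1"
    port="${2:-8787}"
    printf 'http://%s:%s/\n' "$ip" "$port"
}

# Example (hostname -I prints the machine's addresses on Ubuntu):
#   rstudio_url "$(hostname -I | awk '{print $1}')"
# On a single-node setup, localhost can be used instead:
#   rstudio_url localhost
```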
Run the commands below in RStudio.
# Point R at the local Hadoop 2.6.0 installation
Sys.setenv(HADOOP_HOME="/usr/lib/hadoop/hadoop-2.6.0")
Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/hadoop-2.6.0/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop/hadoop-2.6.0/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar")
# Install supporting packages from CRAN
install.packages(c("dplyr","R.methodsS3"))
install.packages(c("Hmisc"))
install.packages(c("caTools"))
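Because HADOOP_CMD must point at the hadoop executable itself (not the Hadoop home directory) and HADOOP_STREAMING at the streaming jar, it can help to confirm both paths exist before setting them in R. The following is a minimal sketch that assumes the directory layout used in this post; `check_hadoop_paths` is a hypothetical helper name.

```shell
# Hypothetical helper: check that the hadoop executable and the streaming
# jar exist under a Hadoop home directory, for a given Hadoop version.
check_hadoop_paths() {
    home="$1"
    version="$2"
    cmd="$home/bin/hadoop"
    jar="$home/share/hadoop/tools/lib/hadoop-streaming-$version.jar"
    status=0
    if [ ! -x "$cmd" ]; then
        echo "missing or not executable: $cmd"
        status=1
    fi
    if [ ! -f "$jar" ]; then
        echo "missing: $jar"
        status=1
    fi
    return "$status"
}

# Example:
#   check_hadoop_paths /usr/lib/hadoop/hadoop-2.6.0 2.6.0 && echo "paths look fine"
```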
Download Required R Packages and Install
Download rmr, rhdfs, plyrmr, rhbase from here
Go to Tools -> Install Packages -> Package Archive File, then browse to and select each downloaded package to install it.
Run the example below in RStudio to validate the installation.
library(rhdfs)
hdfs.init()
library(rmr2)
# Write the integers 1..10 to HDFS, then square each one in a map-only job
sample <- 1:10
small.ints <- to.dfs(sample)
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v, v^2))
from.dfs(out)
df <- as.data.frame(from.dfs(out))
print(df)
Thanks Siva. The tutorial was very helpful to me.
Interestingly, I was setting HADOOP_CMD to $HADOOP_HOME instead of $HADOOP_HOME/bin/hadoop .
Sir, after I set HADOOP_CMD and try to read it with Sys.getenv(HADOOP_CMD), it shows an error saying HADOOP_CMD is not set. What should I do?
Dear siva,
Thank you siva. Great tutorial. It was very helpful to me.
I followed the same instructions as you mentioned above for setting up the RHadoop environment.
But after running sample program as you mentioned I did not get any output.
I got output as below
Please kindly give me some pointers how to solve above issue.
Dear sir,
This is really helpful, but which IP address do I have to give?
http://localhost:8787/
It will open RStudio on the local machine (single-node setup).