Cloudera Manager Installation on Amazon EC2


In this post, we discuss Hadoop installation on cloud infrastructure. Though there are a number of posts available across the internet on this topic, we document the procedure for Cloudera Manager installation on Amazon EC2 instances, along with our practical views on the installation and tips and hints to avoid running into issues. This post also gives a basic introduction to the Amazon AWS cloud services used.

Creation of Amazon EC2 Instances:

First, we need to create the necessary EC2 instances on Amazon AWS with an appropriate AMI (Amazon Machine Image). In this post, we use Ubuntu 14.04 as the AMI for the Cloudera Manager 5 installation along with the CDH 5.2 release. We are going to set up a 4-node cluster with 1 Namenode and 3 Datanodes, which is the minimum requirement for a Cloudera Hadoop cluster setup without any error messages or warnings.

Private and Public IP Addresses:

Creating EC2 instances in Amazon AWS assigns each instance one private IP address (used within AWS to access each machine) and one public IP address (used to access the machine from outside AWS, e.g. over the Internet).

Pricing Mode:

EC2 instances are billed on an hourly basis. Whenever we are not using an instance, we can stop it and start it again later with the same AMI configuration to save cost; in that case, billing is charged only for the hours during which the instance was running.
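As a sketch, stopping and restarting an instance to pause billing can also be done from the AWS CLI (assuming the CLI is installed and configured with credentials; the instance ID below is a placeholder):

```shell
# Stop the instance when it is not needed (hourly billing pauses while stopped),
# then start it again later with the same AMI configuration.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 start-instances --instance-ids i-0123456789abcdef0
```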

The only disadvantage of stopping and starting instances is that every time we start an instance, it is assigned a new, dynamically created private/public IP address pair, and we can no longer access it at the previous addresses.

Hint:

If we install Hadoop on EC2 instances directly, then either we need to keep all the EC2 instances running forever so that their private and public IP addresses do not change after the Hadoop installation, or we need to terminate the instances, re-create them, and re-install Hadoop after every stop/start. Neither option is ideal for maintaining a Hadoop cluster.

So, to keep the cluster cost effective (so that we can stop and start the instances whenever needed), we can make use of the Amazon VPC (Virtual Private Cloud) service and Elastic IP addresses. With these two AWS services, we get static private and public IP addresses for the EC2 instances we create. Keep in mind that these two additional services come at extra cost, but they provide the flexibility to stop the instances whenever we do not need them running.

In this post, we will make use of the Amazon VPC, Elastic IP and EC2 cloud services to set up a private cloud network and maintain static IP addresses.

Creation of VPC, Launching EC2 Instances and Assigning Elastic IP addresses:

After logging into the AWS console, first select the VPC cloud service; it will open the VPC dashboard as shown below.

VPC Dashboard

Click on Start VPC Wizard and select VPC with a Single Public Subnet as shown below. Provide a VPC name and leave the rest of the properties at their default values. Create the VPC as HDP-VPC.
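For reference, roughly the same result can be achieved from the AWS CLI (a sketch; the CIDR blocks and the VPC ID are illustrative placeholders, and the wizard additionally creates an internet gateway and route table for you):

```shell
# Create a VPC and a single public subnet, then tag the VPC with the
# name used in this post.
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-tags --resources vpc-0123456789abcdef0 \
    --tags Key=Name,Value=HDP-VPC
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
    --cidr-block 10.0.0.0/24
```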

VPC Configuration

Now select Services –> EC2 to open the EC2 Dashboard and launch instances into the VPC.

Namenode Instance Configuration:

Now select m3.2xlarge as the instance type for the Namenode.

Launch Instance

Click on Launch Instance, choose Ubuntu 14.04 as the AMI, and follow the steps shown in the screens below, in the same order.

Choose AMI

Select 1 instance for the Namenode, select HDP-VPC (the VPC created above) as the Network, and leave the remaining properties at their default values.

Instance Configuration

Now add storage, at least 80 GB, to install Cloudera Manager.

Storage

In the Tag Instance step, give the instance the name CL_NN, then create a new security group as shown below.

Creation of Security Group:

Creation of Security Group

Add inbound rules as shown above for TCP ports 7180, 7182, 7183 and 7432 and SSH port 22; the other rules shown in the above screen are also better kept. To access this EC2 instance from any machine outside, select Anywhere in the Source column.
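For reference, the same inbound rules can be added with the AWS CLI (a sketch; the security group ID is a placeholder, and 0.0.0.0/0 corresponds to Anywhere in the console):

```shell
# Open SSH plus the Cloudera Manager ports to all sources ("Anywhere").
for PORT in 22 7180 7182 7183 7432; do
  aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol tcp --port "$PORT" --cidr 0.0.0.0/0
done
```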

If this is not set up properly, we cannot access the Cloudera Manager server admin login or the PostgreSQL login.

Now review the configuration and launch the instance:

Review

After this page, click the Launch button; we will be asked to create a key pair and Download Key Pair. This is the only place where we can save the private key; otherwise, we cannot connect to these EC2 instances from outside. Give the key pair the name HDPCluster1.
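Alternatively, the key pair can be created ahead of time from the AWS CLI, which writes the private key straight to a local file (a sketch, assuming the CLI is configured):

```shell
# Create the key pair and save the private key locally in one step.
aws ec2 create-key-pair --key-name HDPCluster1 \
    --query 'KeyMaterial' --output text > HDPCluster1.pem
# Restrict permissions; ssh refuses keys readable by other users.
chmod 400 HDPCluster1.pem
```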

Create Key Pair

Now we can see the instance running under Instances tab.

Creation of Elastic IP Address:

Go to Elastic IPs –> Allocate New Address. After the new IP address is allocated, open Associate Address and select the instance just created.

Allocate Elastic IP

Elastic IP

This will associate a static private/public IP address pair with the Namenode instance.
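The console steps above correspond to two AWS CLI calls (a sketch; the instance and allocation IDs below are placeholders returned by the first command):

```shell
# Allocate a new Elastic IP inside the VPC, then bind it to the instance.
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-0123456789abcdef0 \
    --allocation-id eipalloc-0123456789abcdef0
```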

Create DataNode EC2 Instances:

Similar to the Namenode EC2 instance creation shown above, create 3 instances under HDP-VPC, each with 100 GB storage, all allocated to the same security group created above. This time, choose Ubuntu 14.04 as the AMI and m3.xlarge as the instance type, and select the instance configuration as shown below.

Datanode Instances

DN Storage

Review the configuration and launch the instances, then allocate three new Elastic IP addresses and associate them with the Datanode instances. Below is the list of the four instances:

Running Instances

Install Cloudera Manager on NameNode Instance:

Now connect to the Namenode instance from a terminal on our local Ubuntu machine over SSH (port 22). The commands needed to connect to an EC2 instance are shown in the screen below.

Connect Instance

After changing the permissions on the HDPCluster1.pem file, we can use the ssh command below to connect to the EC2 instance.
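A minimal sketch of those connection commands (replace the address placeholder with the Namenode's Elastic public IP):

```shell
# ssh refuses private keys that are readable by other users.
chmod 400 HDPCluster1.pem
# Ubuntu AMIs use "ubuntu" as the default login user.
ssh -i HDPCluster1.pem ubuntu@<namenode-public-ip>
```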

After connecting to the EC2 instance, perform the commands below in sequence.
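The sequence amounts to downloading and running the Cloudera Manager 5 installer binary (the archive URL below is the one Cloudera published for CM5 at the time of writing; verify it against the current Cloudera documentation before use):

```shell
# Download the Cloudera Manager 5 installer, make it executable, and run it.
wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
```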

SSH Connectivity

This will start the Cloudera manager installer as shown below.

CM Installer

Follow the directions shown by the installer. After successful completion, the screen will instruct us to log in on port 7180 of the Namenode hostname to reach the Cloudera Manager Admin page for the further CDH 5.2 installation.

CM Installer Success

Log in to the admin page with admin as both the username and password.

CM admin

Continue through the steps shown in the screens below, in sequence.

CM Express

CM install services

Here, in the search box, provide the private IP addresses or private hostnames to avoid unnecessary error messages when starting the Cloudera Agent services later.

Provide Hostnames or IP addresses

Even public IP addresses seem to work fine, but sometimes we may receive error messages, as shown further below.

Perform the cluster installation using parcels to install CDH 5.2.

Cluster Installation

In the SSH Login Credentials screen below, we need to select ubuntu as the username rather than root; we should not select root here. Assign HDPCluster1.pem as the private key file, and select the option for all hosts to accept the same private key.

SSH Login Credentials

CL Install1

If we don’t get any error messages, the installation completes successfully, as shown below.

CL Success

Or, if we get error messages as shown below:

CL failed

In this case, provide the private IP addresses and private DNS names in the /etc/hosts file on every node of the cluster being installed.
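As an illustration, with our 4-node cluster, the /etc/hosts file on every node would contain entries like the following (the addresses and EC2-style hostnames here are made up; substitute your own private IPs and private DNS names):

```
172.31.0.10  ip-172-31-0-10.ec2.internal  ip-172-31-0-10   # Namenode
172.31.0.11  ip-172-31-0-11.ec2.internal  ip-172-31-0-11   # Datanode 1
172.31.0.12  ip-172-31-0-12.ec2.internal  ip-172-31-0-12   # Datanode 2
172.31.0.13  ip-172-31-0-13.ec2.internal  ip-172-31-0-13   # Datanode 3
```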


In the next steps, save the PostgreSQL username and password somewhere, so that we can log in to PostgreSQL manually in case of any issues in creating the metastore tables.

PostgreSQL Credentials

Next, select Continue with the default settings for the cluster configuration and follow through to the first run of all the requested services. On successful start of all services, the cluster shows good health for every service, as shown below.

Healthy Cluster

As all the services show green status, the Hadoop cluster is successfully installed and configured, and all services are running without any warning messages.



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions in the Big Data domain, involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.



21 thoughts on “Cloudera Manager Installation on Amazon EC2”

  • Jeff

    Hi,

    Thanks for the tutorial. It is helping me, a lot.

    I have been unsuccessful, so far. I am having a problem with reverse DNS lookup. How did you configure your networking so that it will work?

    I had to specify the internal names during my Cloudera configuration. Any advice you can offer is greatly appreciated.

    Thanks,
    Jeff

    • Siva Post author

      Hi Jeff,

      Sometimes it will succeed without any error messages if we provide public IP addresses when specifying the hosts on the Cloudera installation page. If this fails, try providing your public hostnames (instead of IP addresses).

      Even if that also fails, try giving private IP addresses/hostnames. If you still get errors while installing, then you need to change the /etc/hosts file on each node of the cluster being installed.

      You need to copy the IP addresses of all the nodes into the /etc/hosts file of each node, in the below format:

      private-ip-address-namenode private-hostname-namenode
      private-ip-address-dn1 private-hostname-dn1
      private-ip-address-dn2 private-hostname-dn2

      Copy these lines (of course, with the actual IP addresses and hostnames) into the /etc/hosts file of each node, replacing the 127.0.0.1 localhost lines on the machine.

      If you still get any error message please post your error message details/screen shots in hadoop discussion forum (http://hadooptutorial.info/forums/forum/hadoop-discussion-forum/)…we will definitely help you in resolving the issue.

      • Bharath

        Hi

        I am trying CDH automatic installation on AWs EC2 using cloudera manager bin. I have created one ubuntu Precise 12.04 LTS micro instance,

        I followed the on screen instructions as instructions on per this tutorial.. ” http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-7-1/Cloudera-Manager-I

        1) this is my vi /etc/hosts file
        127.0.0.1 localhost
        172.31.13.46 master
        # The following lines are desirable for IPv6 capable hosts
        ::1 ip6-localhost ip6-loopback
        fe00::0 ip6-localnet
        ff00::0 ip6-mcastprefix
        ff02::1 ip6-allnodes
        ff02::2 ip6-allrouters
        ff02::3 ip6-allhosts

        2) Downloaded and changed the permission for cloudera-manager-installer.bin also.

        3) sudo ./cloudera-manager-installer.bin after this command cloudera manager installed,

        4) But i couldnot access cloudera manager webconsole using ” http://172.31.13.46:7180 ” and i have opened my port 7180 while creating an instance, but still not able to acess through webconsole,

        5) my cloudera manager db and cloudera manager server both are running.

        6) and the port 7180 is also not listning in my ubuntu server and i used the following comand, ” sudo ufs allow 7180″ but no use,,

        7) I checked $ sudo ufw status and the result is inactive

        8) when I check $ sudo service cloudera-scm-agent status on 172.31.13.46 it comes as unrecognized service

        I am struggling in this part, Could you please let me know where I went wrong in installing cloudera in a clustered environment..???

        if yes, it will be helpful for me, please,

        Thanks in advance,

        Regards,
        Bharath

  • ck

    Hi Siva,

    Thank you for this helpful tutorial. I am trying to evaluate whether I should be using EMR + S3 or if I should be using EC2+ Cloudera Enterprise Hub.

    From your experience, will you be willing to provide your thoughts on Pros and Cons of (EMR + S3) Vs (EC2 + Cloudera Enterprise Hub)? Thank you very much for sharing all the good work.

    Best,

    CK

  • dileep

    I’m getting the below error, as mentioned by you:

     

    Installation Failed. Failed to Receive Heartbeat from Agent
    Ensure that the host’s hostname is configured properly.
    Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
    Ensure that ports 9000 and 9001 are free on the host being added.
    Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).

     

    could u pls help?

    in hosts file its looks like below: mine is single node cluster with CDH5 on local system with ubuntu 12.04 lts.

    127.0.0.1 localhost

    127.0.1.1 ubuntu

  • Robin Dong

    this /etc/hosts file is very critical in this installation.
    I have finished all other steps, but always get error on hostname not properly configured.
    Accroding to Cloudera doc, /etc/hosts should look like this
    127.0.0.1 localhost.localdomain localhost
    192.168.1.1 cluster01.example.com cluster01
    ….

    but it’s not working. so I guess we like to have author’s /etc/hosts file content to see how it setup to work.

    Please share your /etc/hosts file.

    thanks,

    Robin

  • Robin Dong

    OK, I noticed in the tutorials, it said, in this case(if failed of cloudera manager installation/hosts hostname confiugre), provide private ip and private DNS in /etc/hosts on all nodes.

    I wonder if the /etc/hosts have 2 steps to configure to make this cloudera manager installed.

    Shall we list all public ip, public dns, private ip and private dns in /etc/hosts file before install cloudera manager?

    thanks,

    Robin


  • Robin Dong

    I have posted my question for few days now, havent get any words from any one yet.

    I simply like to have a sample of /etc/hosts file to fix the host’s hostmane error from cloudera manager installation. any one can help?

    thanks,

    • Siva Post author

      Hi Robin, sorry for the delayed response. I have currently shut down my AWS cluster, since it is chargeable, and am maintaining an offline cluster, but to your question below:

      First, try listing your public IP addresses and DNS names in /etc/hosts and check if this works.
      If this is not working, then try the private IP addresses and DNS names in the /etc/hosts file.
      Make sure that these entries are the same across all the nodes in your cluster.

      For example, in a 4-node cluster, there should be an entry for each of the 4 machines’ private IPs and DNS names in the /etc/hosts file of every one of the 4 machines.

      I hope this will be helpful for you.

      • Robin Dong

        Siva Sr.
        thank you so much to take time to answer my question. Sorry to take your precise time.
        your tutorials is the simplest one for CDH 5 install on AWS/EC2. I have learned so much from it.
        thank you for doing this.

        I always get an error: Enrsure that host’s hostname configured properly…. and
        no matter how I modify my /etc/hosts file, this error stays with me like a cancer cell….

        However my host file is like this now:

        127.0.0.1 localhost.localdomain localhost
        52.88.118.48 ec2-52-88-118-48.us-west-2.compute.amazonaws.com 172.31.0.146 ip-172-31-0-146
        52.26.227.159 ec2-52-26-227-159.us-west-2.compute.amazonaws.com 172.31.0.147 ip-172-31-0-147
        52.27.9.129 ec2-52-27-9-129.us-west-2.compute.amazonaws.com 172.31.0.149 ip-172-31-0-149

        also, the output on my master server is:

        ubuntu@ip-172-31-0-146:~$ hostname
        ip-172-31-0-146
        ubuntu@ip-172-31-0-146:~$ hostname -f
        ip-172-31-0-146
        ubuntu@ip-172-31-0-146:~$ hostname -A
        ip-172-31-0-146
        ubuntu@ip-172-31-0-146:~$ sudo ifconfig
        …. addr:172.31.0.146 ……….

        please help, thank you so much.

        Robin

        so any suggestions?

        • Robin Dong

          yes, I did have the same /etc/hosts files across all my slave nodes. one of them died now. so only 3 for now.

          with this new /etc/hosts file, I still get the same error. it is like a cancer cell to me :(. I wanted to fix it so bad.

          thanks Siva,

          Robin

          • Odie

            Hi Robin,

            When you got the error click on back arrow at the bottom of the page until the home then click continue, you will be able to get the installation going.

            However, I have a problem that once the cluster shutdown, when trying to bring up, HDFS fail to start namenode and all sorts.

            Any suggests, Siva?

          • Siva Post author

            Hi Odie,

            For your problem, it is due to the dynamic IP addresses AWS allocates to your machines. To resolve this issue, go with static private and public IP addresses.

          • Odie

            Hi Siva,
            I use Elastic IP for all nodes still got problem with HDFS, HBase then my namenode went into safe mode.

  • Abhishek Srivastava

    I am pretty new to the whole concept on Big data and Cloudera in general. I have recently registered to Amazon services and got a free 1 year usage. So will it charge me if I try and do this cloudera setup in an instance for learning purposes?