Rack Awareness in Hadoop

Hadoop clusters of more than 30-40 nodes are usually configured across multiple racks. Communication between two data nodes on the same rack is more efficient than communication between two nodes on different racks.

In large Hadoop clusters, to reduce network traffic while reading and writing HDFS files, the NameNode chooses data nodes that are on the same rack as, or on a rack near, the client node issuing the read/write request.

The NameNode obtains this rack information by maintaining the rack id of each data node. This concept of choosing closer data nodes based on rack information is called Rack Awareness in Hadoop.
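
As a rough illustration of how rack ids translate into "closeness", the sketch below treats the network as a tree and counts the hops from each node up to their closest common ancestor. The `/rack/host` location format and the function name `network_distance` are assumptions made for this example, not Hadoop's actual API.

```python
def network_distance(loc_a: str, loc_b: str) -> int:
    """Sum of hops from each location up to the closest common ancestor."""
    a = loc_a.strip("/").split("/")
    b = loc_b.strip("/").split("/")
    # Count how many leading path components the two locations share.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


print(network_distance("/rack1/node1", "/rack1/node1"))  # same node -> 0
print(network_distance("/rack1/node1", "/rack1/node2"))  # same rack -> 2
print(network_distance("/rack1/node1", "/rack2/node3"))  # different racks -> 4
```

A smaller number means a closer node, which is why a data node on the client's own rack is preferred over one on another rack.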

A default Hadoop installation assumes all the nodes belong to the same rack.
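
To move away from that single-rack default, one common approach is to point the `net.topology.script.file.name` property in core-site.xml at a script that maps hosts to racks. Hadoop calls the script with one or more IP addresses or host names as arguments and expects one rack path per argument on standard output. The sketch below is a minimal example of such a script; the IP addresses and rack names in `RACK_BY_HOST` are made up for illustration, and unknown hosts fall back to `/default-rack`.

```python
#!/usr/bin/env python3
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical host-to-rack mapping; replace with your cluster's layout.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
    "10.0.2.22": "/rack2",
}


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print the rack paths separated by spaces, one per argument.
    print(" ".join(resolve(sys.argv[1:])))
```

The script must be executable by the user running the NameNode, and in a real cluster the mapping would typically be read from a file rather than hard-coded.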

Replica placement via Rack Awareness in Hadoop
A simple policy is to place replicas across racks. This prevents data loss when an entire rack fails and lets reads use bandwidth from multiple racks.
On a multi-rack cluster, block replicas are maintained under a policy that no more than one replica is placed on any one node and no more than two replicas are placed in the same rack, with the constraint that the number of racks used for block replication should always be less than the total number of block replicas.
For example, 
  • When a new block is created, the first replica is placed on the local node, the second one is placed on a node in a different rack, and the third one is placed on a different node in the same rack as the second.
  • When re-replicating a block, if the number of existing replicas is one, the second one is placed on a different rack.
  • When the number of existing replicas is two and both replicas are on the same rack, the third one is placed on a different rack.
  • For reading, the NameNode first checks whether the client’s computer is located in the cluster. If it is, block locations are returned to the client ordered by proximity, with the closest data nodes first.
This policy reduces the write cost while maximizing read speed.
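
The placement of a new block can be sketched in a few lines. The example below follows the default policy described in the Apache HDFS documentation for a replication factor of three: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. It is a simplified model, not Hadoop's implementation: the function name `choose_targets` and the `topology` dictionary are assumptions for this example, and it assumes at least two racks with at least two nodes each.

```python
import random


def choose_targets(topology, writer, rng=random):
    """Pick three nodes for a new block.

    topology maps node name -> rack id (hypothetical structure).
    """
    nodes = sorted(topology)
    # 1st replica: the writer's own node if it is a data node, else a random node.
    first = writer if writer in topology else rng.choice(nodes)
    # 2nd replica: any node on a rack other than the first replica's rack.
    remote = [n for n in nodes if topology[n] != topology[first]]
    second = rng.choice(remote)
    # 3rd replica: a different node on the same rack as the second replica.
    same_remote_rack = [
        n for n in remote if topology[n] == topology[second] and n != second
    ]
    third = rng.choice(same_remote_rack)
    return [first, second, third]


topology = {"n1": "/rack1", "n2": "/rack1", "n3": "/rack2", "n4": "/rack2"}
print(choose_targets(topology, "n1"))
```

With this layout the three replicas always land on two racks, so only one replica crosses the rack boundary during the write pipeline while the block still survives the loss of either rack.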

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

6 thoughts on “Rack Awareness in Hadoop”

    • Siva Post author

      Yes, we can, using the setrep command at the individual file level. There are also some data ingestion tools that support setting the replication factor.

  • John Ross

    Hi, we are using HDP and the cluster already contains HDFS data. If we enable rack awareness on HDFS, do we still need to rebalance, or will the replicas automatically be placed on different racks? Are there tools available to check whether the replicas are spread across different racks accordingly?

  • nastaran

    I think you say there are two replicas in the local rack, but

    “For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack.”

    As the Apache Hadoop documentation mentions, there is one replica in the local rack and two replicas in another rack (meaning we use just two racks, one local and one remote).

  • knpcode

    The rack-aware replica placement policy takes the following into consideration:

    Off-rack communication has to go through switches, which means more time is spent.
    Keeping a block replica where the client is means the fastest access.