Backup Node in Hadoop

Backup Node:

The Backup node in Hadoop is an extended Checkpoint node that performs checkpointing and also supports online streaming of file system edits.

Its advantage over the Checkpoint node is that the namespace (metadata) in its main memory is always in sync with the file system namespace of the primary NameNode. It maintains an in-memory, up-to-date copy of the namespace by accepting a real-time stream of file system edits from the NameNode and applying them to its own copy in main memory.

Thus, at any point in time, it holds an up-to-date backup of the current file system namespace.

The Secondary NameNode and the Checkpoint node create checkpoints on their local file systems by downloading the fsimage and edits log files from the active primary NameNode, merging the two, and saving the new fsimage copy locally. The Backup node, by contrast, does not need to download the fsimage and edits files from the active NameNode to create a checkpoint, because it already has an up-to-date state of the namespace in its own main memory. Creating a checkpoint on the Backup node therefore just means saving a copy of the file system metadata (namespace) from main memory to its local file system.

As a result, checkpoint creation on the Backup node is more efficient than on the Secondary NameNode or the Checkpoint node, since the download and merge steps are skipped.

As the Backup node keeps a copy of the namespace in main memory, just like the NameNode, its RAM requirements are the same as the NameNode's.

Unlike Checkpoint nodes, of which several can be registered at once, only one Backup node may be registered with the NameNode at any time. If a Backup node is in use, no Checkpoint nodes may be registered with the NameNode, and none are needed, since the Backup node already performs checkpointing.

The Backup node is started on the dedicated node configured for it in the cluster, using the -backup option of the hdfs namenode command.
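
A minimal sketch of that start-up step, assuming the shell is in the Hadoop installation directory on the Backup node host. The -backup option is the one the HDFS User Guide documents for starting a Backup node; the BACKUP_NODE_CMD variable and the guard around it are illustrative only.

```shell
# Assumption: run from the Hadoop installation directory on the Backup node host.
BACKUP_NODE_CMD="bin/hdfs namenode -backup"

# Start the Backup node only if the hdfs launcher exists here;
# otherwise just print the command that would be run.
if [ -x bin/hdfs ]; then
  $BACKUP_NODE_CMD
else
  echo "Would run: $BACKUP_NODE_CMD"
fi
```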

The following two configuration variables specify the addresses of the Backup node and its web interface:

dfs.namenode.backup.address: The Backup node server address and port. If the port is 0, the server starts on a free port.
dfs.namenode.backup.http-address: The Backup node HTTP server address and port. If the port is 0, the server starts on a free port.
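
As a rough illustration, the two properties could be written out as below and then copied inside the <configuration> element of hdfs-site.xml on the Backup node. The host name backup-host.example.com, the ports 50100 and 50105, and the file name backup-node-properties.xml are assumptions made for this sketch, not values taken from a real setup.

```shell
# Write the two Backup node properties to a scratch file for review.
# Host name and ports are placeholders; replace them with real values.
cat > backup-node-properties.xml <<'EOF'
<property>
  <name>dfs.namenode.backup.address</name>
  <value>backup-host.example.com:50100</value>
</property>
<property>
  <name>dfs.namenode.backup.http-address</name>
  <value>backup-host.example.com:50105</value>
</property>
EOF

# Show the fragment so it can be pasted into hdfs-site.xml.
cat backup-node-properties.xml
```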

Note: One of the main advantages of a Backup node is that it provides the option of running the NameNode with no persistent storage, delegating all responsibility for persisting the namespace to the Backup node.

To do this, the NameNode needs to be started with the -importCheckpoint option and with no edits directory (dfs.namenode.edits.dir) specified in hdfs-site.xml.
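
A minimal sketch of that NameNode start-up, assuming the shell is in the Hadoop installation directory on the NameNode host and that dfs.namenode.edits.dir is absent from hdfs-site.xml. The -importCheckpoint option is the one the HDFS User Guide names for this setup; the NAMENODE_CMD variable and the guard around it are illustrative only.

```shell
# Assumption: run from the Hadoop installation directory on the NameNode host,
# with no dfs.namenode.edits.dir property set in hdfs-site.xml.
NAMENODE_CMD="bin/hdfs namenode -importCheckpoint"

# Start the NameNode only if the hdfs launcher exists here;
# otherwise just print the command that would be run.
if [ -x bin/hdfs ]; then
  $NAMENODE_CMD
else
  echo "Would run: $NAMENODE_CMD"
fi
```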

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.
