Checkpoint Node in Hadoop

Checkpoint Node

The Checkpoint Node in Hadoop is a newer implementation of the Secondary NameNode, introduced to address the Secondary NameNode’s drawbacks.

The main function of the Checkpoint Node is to create periodic checkpoints of the file system metadata by merging the edits file into the fsimage file. The new fsimage produced by this merge is called a checkpoint.

The Checkpoint Node periodically downloads the fsimage and edits log files from the primary NameNode and merges them locally. It stores the result in a directory structure that mirrors the primary NameNode’s, so that the NameNode can read the latest checkpoint directly if it fails and has to be restored.

It usually runs on a different machine from the primary NameNode, since its memory requirements are on the same order as the NameNode’s.

Its advantage over the Secondary NameNode is that it also uploads the fsimage resulting from the merge back to the active NameNode.

The current Hadoop release allows multiple Checkpoint Nodes to register with the NameNode.

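The Checkpoint Node is started by running the namenode command with the -checkpoint option (bin/hdfs namenode -checkpoint) on the machine that will host it.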
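As a minimal sketch, the start command can be wrapped so that it reports a missing installation instead of failing. It assumes the Hadoop bin directory is on PATH; the variable name start_cmd is only illustrative.

```shell
# Run this on the machine that should act as the Checkpoint Node.
# Assumes the Hadoop bin directory is on PATH.
start_cmd="hdfs namenode -checkpoint"

if command -v hdfs >/dev/null 2>&1; then
  $start_cmd
else
  echo "hdfs not found on PATH; would run: $start_cmd"
fi
```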

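As with the Secondary NameNode, two configuration parameters control the checkpoint process on the Checkpoint Node:
  1. dfs.namenode.checkpoint.period: the maximum delay between two consecutive checkpoints, 1 hour (3600 seconds) by default.
  2. dfs.namenode.checkpoint.txns: the number of uncheckpointed transactions on the NameNode that forces a checkpoint even if the checkpoint period has not elapsed, 1 million by default.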
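The way the two parameters interact can be sketched in a few lines of shell: a checkpoint is due as soon as either threshold is reached. This is an illustration of the rule, not Hadoop’s own code. The function checkpoint_due and the environment variable names are hypothetical, and the values are the documented defaults.

```shell
# Documented defaults; override through the environment to experiment.
CHECKPOINT_PERIOD="${CHECKPOINT_PERIOD:-3600}"    # dfs.namenode.checkpoint.period, in seconds
CHECKPOINT_TXNS="${CHECKPOINT_TXNS:-1000000}"     # dfs.namenode.checkpoint.txns

# checkpoint_due SECONDS_SINCE_LAST_CHECKPOINT UNCHECKPOINTED_TXNS
# Succeeds (exit status 0) when either threshold has been reached.
checkpoint_due() {
  [ "$1" -ge "$CHECKPOINT_PERIOD" ] || [ "$2" -ge "$CHECKPOINT_TXNS" ]
}

if checkpoint_due 120 2500000; then
  echo "checkpoint due: transaction threshold reached before the period elapsed"
fi
```

On a live cluster the effective values can be read with hdfs getconf -confKey followed by the parameter name.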

If the NameNode fails, the latest checkpoint created by the Checkpoint Node can be imported into the NameNode’s metadata directory.

Procedure for Importing Checkpoint
  1. Create an empty directory on the NameNode at the location specified by the dfs.namenode.name.dir configuration variable.
  2. Set the dfs.namenode.checkpoint.dir configuration variable to the directory that holds the latest checkpoint from the Checkpoint Node.
  3. Start the NameNode with the -importCheckpoint option.
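The steps above can be sketched as a small script. The two paths are hypothetical placeholders for the values of dfs.namenode.name.dir and dfs.namenode.checkpoint.dir in hdfs-site.xml, and the script only checks the preconditions before calling hdfs namenode -importCheckpoint.

```shell
# Hypothetical paths; replace them with the values of dfs.namenode.name.dir
# and dfs.namenode.checkpoint.dir from your hdfs-site.xml.
NAME_DIR="${NAME_DIR:-/tmp/hadoop-demo/dfs/name}"
CHECKPOINT_DIR="${CHECKPOINT_DIR:-/tmp/hadoop-demo/dfs/namesecondary}"

# Step 1: the NameNode metadata directory must exist and be empty.
mkdir -p "$NAME_DIR"
if [ -n "$(ls -A "$NAME_DIR")" ]; then
  echo "refusing to import: $NAME_DIR is not empty" >&2
  import_ready=no
else
  import_ready=yes
fi

# Step 2 is a configuration change: dfs.namenode.checkpoint.dir must point
# at the checkpoint, which is assumed here to live in $CHECKPOINT_DIR.

# Step 3: start the NameNode in import mode, if Hadoop is installed.
if [ "$import_ready" = yes ] && command -v hdfs >/dev/null 2>&1; then
  hdfs namenode -importCheckpoint
fi
```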

With this option, the NameNode loads the checkpoint from the directory given by dfs.namenode.checkpoint.dir and saves it to its own metadata directory, dfs.namenode.name.dir.

Note: Before the import, the NameNode directory must be empty (it must not contain a valid fsimage file); otherwise the import will fail.
