Checkpoint Node in hadoop 2

Checkpoint Node

The Checkpoint Node in Hadoop is a newer implementation of the Secondary NameNode, designed to address the Secondary NameNode's drawbacks.

The main function of the Checkpoint Node is to create periodic checkpoints of the file system metadata by merging the edits file into the fsimage file. The new fsimage produced by this merge is usually called a checkpoint.
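To make the merge concrete, here is a deliberately simplified Python model. It assumes the fsimage is just a mapping from path to replication factor and that the edit log is an ordered list of operations. The real fsimage and edit log are binary files with many more operation types, and `merge_checkpoint` and the operation names are hypothetical. The point is only that a checkpoint is the old image with every logged edit replayed in order.

```python
from typing import Dict, List, Tuple

# Toy model only (hypothetical, not HDFS's on-disk format):
# the "fsimage" maps a file path to its replication factor, and
# each edit-log record is (operation, path, value).
Edit = Tuple[str, str, int]


def merge_checkpoint(fsimage: Dict[str, int], edits: List[Edit]) -> Dict[str, int]:
    """Replay edits, in log order, on a copy of fsimage and return the new image."""
    image = dict(fsimage)  # leave the old image untouched; the checkpoint is a new image
    for op, path, value in edits:
        if op == "create":
            image[path] = value
        elif op == "set_replication":
            if path not in image:
                raise KeyError(f"set_replication on missing path {path}")
            image[path] = value
        elif op == "delete":
            image.pop(path, None)  # value is ignored for deletes
        else:
            raise ValueError(f"unknown edit operation {op!r}")
    return image


# Usage: one existing file, then three logged edits since the last checkpoint.
old_image = {"/data/a.txt": 3}
edits = [
    ("create", "/data/b.txt", 3),
    ("set_replication", "/data/a.txt", 2),
    ("delete", "/data/b.txt", 0),
]
new_image = merge_checkpoint(old_image, edits)
print(new_image)  # {'/data/a.txt': 2}
```

Replaying onto a copy rather than mutating in place mirrors the idea that the previous fsimage stays valid until the new checkpoint is complete.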

The Checkpoint Node periodically downloads the fsimage and edits log files from the primary NameNode, merges them locally, and stores the result in a directory structured like the primary NameNode's own metadata directory. Because the layout matches, the primary NameNode can read the latest checkpoint directly if it needs one after a NameNode failure.
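Because the checkpoint directory mirrors the NameNode's layout, finding the newest image comes down to reading file names. The sketch below assumes the Hadoop 2 convention of `<dir>/current/fsimage_<txid>`, with the transaction ID zero-padded to 19 digits and an `.md5` sidecar file per image. `latest_checkpoint` is a hypothetical helper, not a Hadoop API, so check the layout on your own cluster before relying on it.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Assumed Hadoop 2 layout: <storage dir>/current/fsimage_<txid>, txid zero-padded
# to 19 digits. Sidecars like fsimage_<txid>.md5 and edits_* files do not match.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")


def latest_checkpoint(storage_dir: Path) -> Optional[Tuple[int, Path]]:
    """Return (txid, path) of the newest fsimage under storage_dir/current, or None."""
    current = storage_dir / "current"
    if not current.is_dir():
        return None
    best: Optional[Tuple[int, Path]] = None
    for entry in current.iterdir():
        match = FSIMAGE_RE.match(entry.name)
        if match and entry.is_file():
            txid = int(match.group(1))
            if best is None or txid > best[0]:
                best = (txid, entry)
    return best


# Usage: build a throwaway directory laid out like a NameNode storage directory.
with tempfile.TemporaryDirectory() as tmp:
    cur = Path(tmp) / "current"
    cur.mkdir()
    for txid in (42, 1337):
        (cur / f"fsimage_{txid:019d}").write_bytes(b"")  # empty placeholder image
        (cur / f"fsimage_{txid:019d}.md5").write_text("placeholder checksum\n")
    found = latest_checkpoint(Path(tmp))
    print(found[0], found[1].name)  # 1337 fsimage_0000000000000001337
```

Choosing by the numeric transaction ID, rather than by modification time, keeps the answer stable even if files are copied between machines.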

It usually runs on a different machine from the primary NameNode, since its memory requirements are on the same order as the primary NameNode's.

After the merge, the Checkpoint Node uploads the resulting fsimage back to the active NameNode.

Hadoop 2 allows multiple Checkpoint Nodes to be registered with the NameNode.

The Checkpoint Node can be started with the following command:

As with the Secondary NameNode configuration, the following two configuration parameters control the checkpoint process on the Checkpoint Node.
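Assuming the two parameters are Hadoop 2's `dfs.namenode.checkpoint.period` (maximum seconds between checkpoints, default 3600) and `dfs.namenode.checkpoint.txns` (number of uncheckpointed transactions that forces a checkpoint, default 1,000,000), the trigger can be modeled as an either-or rule. `CheckpointPolicy` and `should_checkpoint` are hypothetical names; this models the decision only and is not Hadoop's code.

```python
from dataclasses import dataclass

# Defaults assumed from Hadoop 2's hdfs-default.xml; verify against your release.
DEFAULT_PERIOD_SECONDS = 3600       # dfs.namenode.checkpoint.period
DEFAULT_TXNS_THRESHOLD = 1_000_000  # dfs.namenode.checkpoint.txns


@dataclass
class CheckpointPolicy:
    """Hypothetical model of when a checkpoint is triggered."""
    period_seconds: int = DEFAULT_PERIOD_SECONDS
    txns_threshold: int = DEFAULT_TXNS_THRESHOLD

    def should_checkpoint(self, seconds_since_last: float, uncheckpointed_txns: int) -> bool:
        # Either limit on its own is enough to start a checkpoint.
        return (seconds_since_last >= self.period_seconds
                or uncheckpointed_txns >= self.txns_threshold)


policy = CheckpointPolicy()
print(policy.should_checkpoint(600, 5_000))      # False: neither limit reached
print(policy.should_checkpoint(3600, 5_000))     # True: the period has elapsed
print(policy.should_checkpoint(600, 1_000_000))  # True: too many uncheckpointed transactions
```

The transaction limit matters on busy clusters: it keeps the edits log from growing very large between hourly checkpoints.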

If the NameNode fails, the latest checkpoint created by the Checkpoint Node can be imported into the NameNode's metadata directory.

Procedure for Importing Checkpoint
  1. Create a new, empty directory on the NameNode at the location named in the dfs.namenode.name.dir configuration variable.
  2. Set the dfs.namenode.checkpoint.dir configuration variable to the location of the latest checkpoint created by the Checkpoint Node.
  3. Start the NameNode with the checkpoint import option, as shown below.

With this command, the NameNode copies the checkpoint from the directory set in dfs.namenode.checkpoint.dir into its own metadata directory.

Note: Before the checkpoint import, the NameNode directory must be empty (no valid fsimage file on the NameNode); otherwise, the import will fail.
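An operator might script a quick pre-flight check before starting the import. The helper below is hypothetical, not part of Hadoop: it treats any entry in the NameNode directory as a blocker (the stricter reading of "must be empty") and assumes the checkpoint directory follows the `<dir>/current/fsimage_<txid>` layout. The directory names in the usage part are placeholders.

```python
import tempfile
from pathlib import Path
from typing import List


def import_preflight(name_dir: Path, checkpoint_dir: Path) -> List[str]:
    """Hypothetical check: return problems that would block an import (empty list = OK)."""
    problems: List[str] = []
    # The NameNode metadata directory must exist and be empty.
    if not name_dir.is_dir():
        problems.append(f"{name_dir} does not exist or is not a directory")
    elif any(name_dir.iterdir()):
        problems.append(f"{name_dir} is not empty")
    # Assumed layout: <checkpoint_dir>/current/fsimage_<txid> (ignore .md5 sidecars).
    current = checkpoint_dir / "current"
    images = []
    if current.is_dir():
        images = [p for p in current.glob("fsimage_*") if not p.name.endswith(".md5")]
    if not images:
        problems.append(f"no fsimage found under {current}")
    return problems


# Usage with placeholder directories in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    name_dir = Path(tmp) / "name"
    ckpt_dir = Path(tmp) / "namesecondary"
    name_dir.mkdir()
    (ckpt_dir / "current").mkdir(parents=True)
    (ckpt_dir / "current" / "fsimage_0000000000000000042").write_bytes(b"")
    ok_before = import_preflight(name_dir, ckpt_dir)  # [] means safe to import
    (name_dir / "current").mkdir()
    (name_dir / "current" / "fsimage_0000000000000000007").write_bytes(b"")
    problems_after = import_preflight(name_dir, ckpt_dir)
    print(ok_before, len(problems_after))  # [] 1
```

Failing early in a script is cheaper than letting the NameNode start and abort because a valid image is already present.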

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

2 thoughts on “Checkpoint Node in hadoop”

  • Damodara

    When is the fsimage copied from the Secondary NameNode/Checkpoint Node to the primary NameNode?

    Does it happen frequently, or does the primary NameNode request the fsimage from the Secondary NameNode only when it restarts?
