Safe Mode
Safe mode in Hadoop is a maintenance state of the NameNode during which it does not allow any changes to the file system.
During safe mode, the HDFS cluster is read-only, and the NameNode does not replicate or delete blocks. At startup, the NameNode does the following:
- Loads the file system namespace from the last saved fsimage into main memory, along with the edits log file.
- Applies the edits log to the fsimage, producing the current file system namespace.
- Receives block reports, which contain block location information, from the data nodes.
The NameNode is in safe mode while it collects these block reports; that is, it enters safe mode automatically during startup.
To leave safe mode, the NameNode needs to receive reports for at least a configured threshold percentage of blocks, and those blocks must satisfy the minimal replication condition.
Once the threshold is reached, safe mode is extended by a configurable amount of time so that the remaining data nodes can check in before the NameNode starts replicating missing blocks or deleting over-replicated blocks.
After the extension period elapses, the NameNode leaves safe mode automatically.
It does not replicate or delete blocks while in safe mode. Only after leaving does it start block replication adjustment: under-replicated blocks are replicated until they reach their replication factor, and excess replicas of over-replicated blocks are deleted.
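The exit conditions described above can be sketched as a small model. This is a simplified illustration and not the NameNode's actual implementation: the class and function names are hypothetical, the defaults are taken from the configuration table in this article, and rounding the block threshold down to a whole number of blocks is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SafeModeConfig:
    """Hypothetical container for the safe mode settings (defaults from hdfs-default.xml)."""
    threshold_pct: float = 0.999  # dfs.namenode.safemode.threshold-pct
    min_datanodes: int = 0        # dfs.namenode.safemode.min.datanodes
    extension_ms: int = 30000     # dfs.namenode.safemode.extension


def threshold_reached(cfg, total_blocks, safe_blocks, live_datanodes):
    """True once enough blocks meet minimal replication and enough datanodes are alive."""
    # Assumption: the required block count is the threshold fraction rounded down.
    needed_blocks = int(total_blocks * cfg.threshold_pct)
    return safe_blocks >= needed_blocks and live_datanodes >= cfg.min_datanodes


def can_leave_safe_mode(cfg, total_blocks, safe_blocks, live_datanodes, ms_since_threshold):
    """True once the threshold is met and the extension period has also elapsed."""
    if not threshold_reached(cfg, total_blocks, safe_blocks, live_datanodes):
        return False
    return ms_since_threshold >= cfg.extension_ms
```

In this model, a cluster where every block has been reported still stays in safe mode until `ms_since_threshold` reaches the configured extension.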
Safe mode can also be entered manually for administration activities with the dfsadmin command utility.
1. To check the current status of safe mode, use the get option: hdfs dfsadmin -safemode get
2. To enter safe mode manually, use the enter option: hdfs dfsadmin -safemode enter
The NameNode log then records an entry stating that safe mode was entered manually by the hadoop1 user.
Now only an HDFS superuser (here, the hadoop1 user) can take the NameNode out of safe mode.
Note: If safe mode is entered manually, it must also be left manually; the NameNode will not leave it automatically.
3. To leave safe mode manually, use the leave option: hdfs dfsadmin -safemode leave
Thus, safe mode can be entered manually at any time for administration work; once it is left, the NameNode resumes bringing under-replicated blocks up to their replication factor.
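For scripting these administration steps, a thin wrapper around the dfsadmin utility might look like the sketch below. The helper names `safemode_command` and `run_safemode` are hypothetical, and the sketch assumes the `hdfs` command is on the PATH of the machine where it runs. The optional `run_as` argument simply prefixes the call with `sudo -u`, which is one way to run it as an HDFS superuser.

```python
import subprocess

# Actions accepted by `hdfs dfsadmin -safemode` that this sketch supports.
SAFEMODE_ACTIONS = ("get", "enter", "leave", "wait")


def safemode_command(action, run_as=None):
    """Build the argument list for `hdfs dfsadmin -safemode <action>`."""
    if action not in SAFEMODE_ACTIONS:
        raise ValueError(f"unsupported safemode action: {action!r}")
    argv = ["hdfs", "dfsadmin", "-safemode", action]
    if run_as:
        # Run as another OS user (for example the HDFS superuser) through sudo.
        argv = ["sudo", "-u", run_as] + argv
    return argv


def run_safemode(action, run_as=None):
    """Run the command and return its output; needs a working Hadoop client."""
    result = subprocess.run(
        safemode_command(action, run_as),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout.strip()
```

Calling `run_safemode("get")` would return whatever status line the cluster prints, and `run_safemode("wait")` blocks until safe mode is off, which can be handy in startup scripts.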
Configuration Parameters for Safe Mode in Hadoop
Configuration parameter | Default | Description |
dfs.replication | 3 | Default block replication. |
dfs.replication.max | 512 | Maximal block replication. |
dfs.namenode.replication.min | 1 | Minimal block replication. |
dfs.namenode.safemode.threshold-pct | 0.999f | Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. |
dfs.namenode.safemode.min.datanodes | 0 | Specifies the number of datanodes that must be considered alive before the name node exits safemode. |
dfs.namenode.safemode.extension | 30000 | Determines extension of safe mode in milliseconds after the threshold level is reached. |
All of these are defined in hdfs-default.xml. To override a value, set the corresponding entry in the hdfs-site.xml file.
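As an illustration, an override in hdfs-site.xml might look like the fragment below. The two values shown (0.95f and 60000) are arbitrary examples chosen for this sketch, not recommendations; only the property names come from the table above.

```xml
<configuration>
  <!-- Example value only: leave safe mode once 95% of blocks meet minimal replication. -->
  <property>
    <name>dfs.namenode.safemode.threshold-pct</name>
    <value>0.95f</value>
  </property>
  <!-- Example value only: extend safe mode by 60 seconds after the threshold is reached. -->
  <property>
    <name>dfs.namenode.safemode.extension</name>
    <value>60000</value>
  </property>
</configuration>
```

The NameNode reads these files at startup, so a restart is typically needed before changed values take effect.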
Special Cases
- If dfs.namenode.safemode.threshold-pct is less than or equal to 0, the NameNode does not wait for any particular percentage of blocks before exiting safe mode. Values greater than 1 make safe mode permanent.
- If dfs.namenode.safemode.min.datanodes is less than or equal to 0, the number of live datanodes is not taken into account when deciding whether to remain in safe mode during startup. Values greater than the number of datanodes in the cluster make safe mode permanent.
The configuration parameter details above are sourced from the hdfs-default.xml file.
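The special cases can be summarized as a small lookup. This is only a sketch of the rules stated above: the function names and the returned labels are hypothetical and do not correspond to anything in Hadoop itself.

```python
def describe_threshold_pct(value):
    """Classify a dfs.namenode.safemode.threshold-pct setting."""
    if value <= 0:
        return "no wait"      # do not wait for any percentage of blocks
    if value > 1:
        return "permanent"    # the threshold can never be satisfied
    return "wait"             # wait until this fraction of blocks is reported


def describe_min_datanodes(value, cluster_datanodes):
    """Classify a dfs.namenode.safemode.min.datanodes setting for a given cluster size."""
    if value <= 0:
        return "ignored"      # live datanode count is not considered
    if value > cluster_datanodes:
        return "permanent"    # more datanodes required than the cluster has
    return "wait"             # wait until this many datanodes are alive
```

A check like this could run before deploying a configuration change, to catch a value that would keep the NameNode in safe mode indefinitely.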
Great Explanation siva…
Sometimes a normal user cannot release safe mode; in that case, issue the command with sudo, for example:
sudo -u hdfs hdfs dfsadmin -safemode leave
This is for HDP 2.3 on Linux 6.7.
Good explanation shiva.
Not entirely correct, NameNode will not do any block replications/deletions while in safe mode.
https://wiki.apache.org/hadoop/FAQ#Does_the_name-node_stay_in_safe_mode_till_all_under-replicated_files_are_fully_replicated.3F
This is to avoid prematurely replicating blocks that already might have enough replicas when starting the NameNode.
Great explanation……..