HDFS is a distributed file system implemented on Hadoop’s framework designed to store vast amount of data on low cost commodity hardware and ensuring high speed process on data.
Hadoop Distributed File System design is based on the design of Google File System. It’s notion is “Write Once Read Multiple times".
Below are the main objectives of HDFS.
? Able to store vast amount of data probably in Tera bytes or Peta bytes by spreading the data across a number of machines on cluster.
? Storing data reliably, and in fault-tolerant manner by maintaining data replication to cope with loss of individual machines in the cluster
? Able to process the data locally by moving the computation/processing to data nodes instead of bringing data from data nodes to computation server.
As HDFS is designed on the notion of “Write Once, Read multiple times", once a file is written to HDFS, Then it can’t be updated. But delete, append, and read Operations can be performed on HDFS files.
HDFS is not suitable for large number of small sized files but best suits for large sized files. Because file system namespace maintained by Namenode is limited by it’s main memory capacity as namespace is stored in namenode’s main memory and large number of files will result in big fsimage file.
HDFS Design Concepts
Important components in HDFS Architecture are:
HDFS is a block structured file system. Each HDFS file is broken into blocks of fixed size usually 128 MB which are stored across various data nodes on the cluster. Each of these blocks is stored as a separate file on local file system on data nodes (Commodity machines on cluster).
Thus to access a file on HDFS, multiple data nodes need to be referenced and the list of the data nodes which need to be accessed is determined by the file system metadata stored on Name Node.
So, any HDFS client trying to access/read a HDFS file, will get block information from Name Node first, and then based on the block id’s and locations, data will be read from corresponding data nodes/computer machines on cluster.
HDFS’s fsck command is a useful to get the files and blocks details of file system. Below command will list the blocks that make up each file in the file system.
Advantages of Blocks Feature
1. Quick Seek Time:
By default, HDFS Block Size is 128 MB which is much larger than any other file system. In HDFS, large block size is maintained to reduce the seek time for block access.
2. Ability to Store Large Files:
Another benefit of this block structure is that, there is no need to store all blocks of a file on the same disk or node. So, a file’s size can be larger than the size of a disk or node.
3. How Fault Tolerance is achieved with HDFS Blocks:
HDFS blocks feature suits well with the replication for providing fault tolerance and availability.
By default each block is replicated to three separate machines. This feature insures blocks against corrupted blocks or disk or machine failure. If a block becomes unavailable, a copy can be read from another machine. And a block that is no longer available due to corruption or machine failure can be replicated from its alternative machines to other live machines to bring the replication factor back to the normal level (3 by default).
Name Node is the single point of contact for accessing files in HDFS and it determines the block ids and locations for data access. So, Name Node plays a Master role in Master/Slaves Architecture where as Data Nodes acts as slaves. File System metadata is stored on Name Node.
File System Metadata contains majorly, File names, File Permissions and locations of each block of files. Thus, Metadata is relatively small in size and fits into Main Memory of a computer machine. So, it is stored in Main Memory of Name Node to allow fast access.
Important Components of Name Node
FsImage: It is a file on Name Node’s Local File System containing entire HDFS file system namespace (including mapping of blocks to files and file system properties)
EditLog: It is a Transaction Log residing on Name Node’s Local File System and contains a record/entry for every change that occurs to File System Metadata.
Since Name node is the only communication gateway for clients/users/applications to get the actual data residing on data nodes, It is a single point of failure and in fact, if this machine fails by any chance, then all the files on the HDFS file system would be lost since there is no other way of knowing how to reconstruct the files from the blocks on the data nodes.
So it is very important to keep the Name Node resilient to failure. One of the methods for this is maintaining Secondary Name Node.
Note: Only One Active Name Node is allowed on a cluster at any point of time.
Data Nodes are the slaves part of Master/Slaves Architecture and on which actual HDFS files are stored in the form of fixed size chunks of data which are called blocks.
Data Nodes serve read and write requests of clients on HDFS files and also perform block creation, replication and deletions.
Data Nodes Failure Recovery
Each data node on a cluster periodically sends a heartbeat message to the name node which is used by the name node to discover the data node failures based on missing heartbeats.
The name node marks data nodes without recent heartbeats as dead, and does not dispatch any new I/O requests to them. Because data located at a dead data node is no longer available to HDFS, data node death may cause the replication factor of some blocks to fall below their specified values. The name node constantly tracks which blocks must be re-replicated, and initiates replication whenever necessary.
Thus all the blocks on a dead data node are re-replicated on other live data nodes and replication factor remains normal.