HDFS File System Commands

The command line is one of the simplest interfaces to the Hadoop Distributed File System (HDFS). The basic HDFS file system commands below are similar to UNIX file system commands. Once the Hadoop daemons are running, HDFS is ready for file system operations such as creating directories, moving, deleting and reading files, and listing directories. Running hadoop fs with no arguments prints the full list of FS Shell commands.

Some of the most important commands from this list are described below.

1. mkdir: 

Similar to the Unix mkdir command, it is used to create directories in HDFS.

Syntax:

  hadoop fs -mkdir [-p] <path> ...

-p  Do not fail if the directory already exists

Notes:

  • Without -p, a subdirectory such as /user/hadoop can be created only if its parent directory /user already exists; otherwise a ‘No such file or directory’ error is returned. With -p, missing parent directories are created as well.
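The commands in this section are often driven from scripts. As a minimal sketch (not part of Hadoop), the hypothetical Python helper hdfs_mkdir below builds the mkdir command line and runs it with subprocess, assuming the hadoop client is on the PATH. The runner parameter is an assumption added so the command can be inspected without a cluster.

```python
import subprocess
from typing import Callable, Sequence

# Defaults to subprocess.run; a fake can be passed in to inspect the command.
Runner = Callable[..., subprocess.CompletedProcess]


def hdfs_mkdir(paths: Sequence[str], parents: bool = False,
               runner: Runner = subprocess.run) -> subprocess.CompletedProcess:
    """Create HDFS directories by shelling out to hadoop fs -mkdir."""
    cmd = ["hadoop", "fs", "-mkdir"]
    if parents:
        # -p: create missing parents and do not fail if the directory exists
        cmd.append("-p")
    cmd.extend(paths)
    # check=True raises CalledProcessError if hadoop exits with a non-zero status
    return runner(cmd, check=True)


# Usage on a machine with the Hadoop client installed (hypothetical path):
# hdfs_mkdir(["/user/hadoop/dir1"], parents=True)
```

Calling hdfs_mkdir(["/user/hadoop/dir1"], parents=True) runs hadoop fs -mkdir -p /user/hadoop/dir1, which succeeds even if /user does not exist yet.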

2. ls:

Similar to the Unix ls command, it is used to list directories in HDFS. Use -ls -R for a recursive listing; the older -lsr command does the same but is deprecated.

Syntax:

  hadoop fs -ls [-d] [-h] [-R] [<path> ...]

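List the contents that match the specified file pattern. If path is not specified, the contents of /user/<currentUser> will be listed. Directory entries are of the form:

  permissions - userId groupId sizeOfDirectory(in bytes) modificationDate(yyyy-MM-dd HH:mm) directoryName

and file entries are of the form:

  permissions numberOfReplicas userId groupId sizeOfFile(in bytes) modificationDate(yyyy-MM-dd HH:mm) fileName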

-d Directories are listed as plain files.
-h Formats the sizes of files in a human-readable fashion rather than a number of bytes.
-R Recursively list the contents of directories.
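Scripts sometimes need to pick fields out of a listing. A file line from -ls holds the permissions, replication factor, owner, group, size in bytes, modification date and time, and path. A directory line shows - in the replication column. The sketch below parses one such line in Python. The LsEntry type and the sample lines are hypothetical. It assumes plain -ls output without -h, with the leading "Found N items" line already skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LsEntry:
    permissions: str
    replication: Optional[int]  # None for directories, which show "-"
    owner: str
    group: str
    size: int                   # bytes
    modified: str               # "yyyy-MM-dd HH:mm"
    path: str

    @property
    def is_dir(self) -> bool:
        return self.permissions.startswith("d")


def parse_ls_line(line: str) -> LsEntry:
    """Split one -ls output line into its eight columns."""
    # maxsplit=7 keeps any spaces inside the final path column intact
    perms, repl, owner, group, size, date, time, path = line.rstrip("\r\n").split(None, 7)
    return LsEntry(perms, None if repl == "-" else int(repl), owner, group,
                   int(size), f"{date} {time}", path)
```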

3. put:

Copies files from the local file system to HDFS. This is similar to the -copyFromLocal command.

Syntax:

  hadoop fs -put [-f] [-p] <localsrc> ... <dst>

Copying fails if the file already exists, unless the -f flag is given. Passing -p preserves access and modification times, ownership and the mode. Passing -f overwrites the destination if it already exists.

4. get:

Copies files from HDFS to the local file system. This is similar to the -copyToLocal command.

5. cat:

Similar to the Unix cat command, it is used to display the contents of a file.

6. cp:

Similar to the Unix cp command, it is used to copy files from one directory to another within HDFS.

7. mv:

Similar to the Unix mv command, it is used to move a file from one directory to another within HDFS.

8. rm:

Similar to the Unix rm command, it is used to remove files from HDFS. Use -rm -r for a recursive delete; the older -rmr command does the same but is deprecated.

Syntax:

  hadoop fs -rm [-f] [-r|-R] [-skipTrash] <src> ...

-skipTrash Bypasses the trash, if enabled, and immediately deletes <src>.
-f If the file does not exist, do not display a diagnostic message or modify the exit status to reflect an error.
-[rR] Recursively deletes directories.

Note:

  • Without -r, the -rm command deletes only files; it cannot delete directories. Use -rm -r (recursive remove) to delete a directory together with the files inside it.

The hadoop fs -rmdir command deletes empty directories.
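Here is a minimal Python sketch of calling -rm from a script. It assumes the hadoop client is on the PATH. hdfs_rm and its runner parameter are hypothetical, and each keyword maps to one of the options listed above.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[..., subprocess.CompletedProcess]


def hdfs_rm(paths: Sequence[str], recursive: bool = False, force: bool = False,
            skip_trash: bool = False,
            runner: Runner = subprocess.run) -> subprocess.CompletedProcess:
    """Delete HDFS paths with hadoop fs -rm."""
    cmd = ["hadoop", "fs", "-rm"]
    if force:
        cmd.append("-f")          # no error if a path does not exist
    if recursive:
        cmd.append("-r")          # needed to delete directories and their contents
    if skip_trash:
        cmd.append("-skipTrash")  # delete immediately instead of moving to trash
    cmd.extend(paths)
    return runner(cmd, check=True)
```

For example, hdfs_rm(["/user/hadoop/old_output"], recursive=True) runs hadoop fs -rm -r /user/hadoop/old_output (a hypothetical path). When trash is enabled, this moves the directory to the trash rather than deleting it outright.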

9. getmerge:

It is one of the most useful commands for reading the output of a MapReduce or Pig job, which is usually written as several part files. It merges all the files in one HDFS directory into a single file on the local file system.
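On HDFS the merge is run as hadoop fs -getmerge <src> <localdst>. The sketch below models the same idea on the local file system so it can run without a cluster: it concatenates every file in a directory into one output file. The function name is hypothetical. Processing files in sorted name order is an assumption of this sketch; it keeps part-00000, part-00001, ... in sequence.

```python
from pathlib import Path
from typing import Union


def merge_part_files(src_dir: Union[str, Path], dst_file: Union[str, Path]) -> int:
    """Concatenate every regular file in src_dir, in sorted name order, into dst_file.

    Local-filesystem model of getmerge; returns the number of files merged.
    """
    # The file list is taken before dst_file is opened, so dst_file is never merged into itself
    parts = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    with open(dst_file, "wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    return len(parts)
```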

10. setrep:

This command changes the replication factor of a file to a specific value, instead of the default replication factor that applies to the rest of HDFS.

If <path> is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at <path>.

Syntax:

  hadoop fs -setrep [-R] [-w] <rep> <path> ...

-w Requests that the command wait for the replication to complete. This can potentially take a very long time.
-R Accepted for backwards compatibility; it has no effect.
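Because every replica is a full copy of a file's blocks, the replication factor directly scales the raw space a file occupies. The worked sketch below is in Python. It assumes plain replication (no erasure coding) and a hypothetical 10 GiB file whose replication is lowered from 3 to 2.

```python
GIB = 1024 ** 3


def raw_bytes(file_size: int, replication: int) -> int:
    """Raw cluster storage used by a file: each replica is a full copy of its blocks."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return file_size * replication


# Hypothetical 10 GiB file, e.g. after hadoop fs -setrep -w 2 <path> lowers it from 3 to 2
before = raw_bytes(10 * GIB, 3)  # 30 GiB of raw storage
after = raw_bytes(10 * GIB, 2)   # 20 GiB of raw storage
freed = before - after           # 10 GiB returned to the cluster
```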

11. touchz:

This command can be used to create a file of zero length in HDFS.

12. test:

This command tests whether an HDFS path exists, is a directory or a file, or has zero length. Its syntax is:

  hadoop fs -test -[defsz] <path>

Options:

-d return 0 if <path> is a directory.
-e return 0 if <path> exists.
-f return 0 if <path> is a file.
-s return 0 if file <path> is greater than zero bytes in size.
-z return 0 if file <path> is zero bytes in size, else return 1.
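Because -test reports its answer only through the exit status, a script has to check the return code rather than the output. Here is a minimal Python sketch, assuming the hadoop client is on the PATH. hdfs_test and its runner parameter are hypothetical.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]
TEST_FLAGS = {"-d", "-e", "-f", "-s", "-z"}


def hdfs_test(flag: str, path: str, runner: Runner = subprocess.run) -> bool:
    """Run hadoop fs -test and map exit status 0 to True, anything else to False."""
    if flag not in TEST_FLAGS:
        raise ValueError(f"unsupported -test flag: {flag}")
    # No check=True: a non-zero exit status is an answer here, not an error
    return runner(["hadoop", "fs", "-test", flag, path]).returncode == 0
```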

13. expunge:

This command is used to empty the trash in the Hadoop file system.

Its syntax is:

  hadoop fs -expunge

14. appendToFile:

Appends the contents of all the given local files to the given destination file on HDFS. The destination file will be created if it does not exist. If <localSrc> is -, then the input is read from stdin.
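Here is a minimal Python sketch of both forms: appending local files, and appending stdin by passing - as the source. It assumes the hadoop client is on the PATH. hdfs_append and its runner parameter are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[..., subprocess.CompletedProcess]


def hdfs_append(dst: str, local_srcs: Sequence[str] = (),
                stdin_data: Optional[bytes] = None,
                runner: Runner = subprocess.run) -> subprocess.CompletedProcess:
    """Append local files, or bytes piped through stdin, to an HDFS file."""
    if bool(local_srcs) == (stdin_data is not None):
        raise ValueError("pass either local_srcs or stdin_data, not both or neither")
    if stdin_data is not None:
        # "-" as the source makes appendToFile read from stdin
        return runner(["hadoop", "fs", "-appendToFile", "-", dst],
                      input=stdin_data, check=True)
    return runner(["hadoop", "fs", "-appendToFile", *local_srcs, dst], check=True)
```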

Syntax is:

  hadoop fs -appendToFile <localsrc> ... <dst>

15. tail:

Shows the last 1KB of the file. Syntax is:

  hadoop fs -tail [-f] <file>

-f Shows appended data as the file grows.

16. stat:

This command prints statistics about the file or directory at <path> in the specified format.

Syntax:

  hadoop fs -stat [format] <path> ...

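Format accepts file size in bytes (%b), group name of owner (%g), file name (%n), block size (%o), replication (%r), user name of owner (%u), and modification date (%y, %Y). %y shows the UTC date as yyyy-MM-dd HH:mm:ss, and %Y shows milliseconds since January 1, 1970 UTC. If no format is given, %y is used.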
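To make the format specifiers concrete, the sketch below expands them in Python for a hypothetical FileStat record instead of querying HDFS. It treats %b as the file length in bytes, %y as the modification time in UTC as yyyy-MM-dd HH:mm:ss, and %Y as milliseconds since the Unix epoch. Any other %-sequence is left untouched; this is an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileStat:
    """Hypothetical stand-in for the metadata HDFS keeps about a path."""
    length: int       # file size in bytes
    group: str
    name: str
    block_size: int
    replication: int
    owner: str
    mtime_ms: int     # modification time, milliseconds since 1970-01-01 UTC


def format_stat(fmt: str, st: FileStat) -> str:
    """Expand the %b %g %n %o %r %u %y %Y specifiers in fmt for one FileStat."""
    mtime = datetime.fromtimestamp(st.mtime_ms / 1000, tz=timezone.utc)
    fields = {
        "b": str(st.length),
        "g": st.group,
        "n": st.name,
        "o": str(st.block_size),
        "r": str(st.replication),
        "u": st.owner,
        "y": mtime.strftime("%Y-%m-%d %H:%M:%S"),
        "Y": str(st.mtime_ms),
    }
    out, i = [], 0
    while i < len(fmt):
        spec = fmt[i + 1] if fmt[i] == "%" and i + 1 < len(fmt) else None
        if spec in fields:
            out.append(fields[spec])  # known specifier: substitute its value
            i += 2
        else:
            out.append(fmt[i])        # ordinary character or unknown specifier
            i += 1
    return "".join(out)
```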

17. setfattr:

Sets an extended attribute name and value for a file or directory.

Syntax:

  hadoop fs -setfattr {-n name [-v value] | -x name} <path>

-n name The extended attribute name.
-v value The extended attribute value. There are three different encoding methods for the value. If the argument is enclosed in double quotes, then the value is the string inside the quotes. If the argument is prefixed with 0x or 0X, then it is taken as a hexadecimal number. If the argument begins with 0s or 0S, then it is taken as a base64 encoding.
-x name Remove the extended attribute.
<path> The file or directory.
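The three value encodings can be illustrated by decoding a -v argument into raw bytes. Here is a minimal Python sketch. decode_xattr_value is a hypothetical helper, and treating any other argument as a plain UTF-8 string is an assumption of this sketch.

```python
import base64
import binascii


def decode_xattr_value(arg: str) -> bytes:
    """Decode a -v argument using the quoted, 0x (hex) and 0s (base64) encodings."""
    if len(arg) >= 2 and arg.startswith('"') and arg.endswith('"'):
        return arg[1:-1].encode("utf-8")  # the string inside the quotes
    prefix, body = arg[:2], arg[2:]
    if prefix in ("0x", "0X"):
        return binascii.unhexlify(body)   # hexadecimal digits
    if prefix in ("0s", "0S"):
        return base64.b64decode(body)     # base64 text
    # Assumption of this sketch: anything else is taken as a plain string
    return arg.encode("utf-8")
```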

18. df:

Shows the capacity, free and used space of the filesystem. If the filesystem has
multiple partitions, and no path to a particular partition is specified, then
the status of the root partitions will be shown.

Syntax:

  hadoop fs -df [-h] [<path> ...]

-h Formats the sizes of files in a human-readable fashion rather than a number
of bytes.

19. du:

Show the amount of space, in bytes, used by the files that match the specified
file pattern.

Syntax:

  hadoop fs -du [-s] [-h] <path> ...

The following flags are optional:

-s Rather than showing the size of each individual file that matches the pattern, shows the total (summary) size.
-h Formats the sizes of files in a human-readable fashion rather than a number
of bytes.

Note that, even without the -s option, this only shows size summaries one level
deep into a directory.

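Each output line shows the size followed by the full path name. Later Hadoop releases also print the disk space consumed by all replicas as a second column.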

20. count:

Count the number of directories, files and bytes under the paths
that match the specified file pattern.

Syntax:

  hadoop fs -count [-q] <path> ...

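The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. With -q, the quota columns QUOTA, REMAINING_QUOTA, SPACE_QUOTA and REMAINING_SPACE_QUOTA are printed first.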
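Here is a minimal Python sketch that splits one line of default -count output (without -q) into the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. parse_count_line and the sample line are hypothetical.

```python
from typing import Dict, Union


def parse_count_line(line: str) -> Dict[str, Union[int, str]]:
    """Split one line of default hadoop fs -count output (without -q) into its columns."""
    # maxsplit=3 keeps any spaces inside the path in the last column
    dirs, files, size, path = line.strip("\n").split(None, 3)
    return {
        "DIR_COUNT": int(dirs),
        "FILE_COUNT": int(files),
        "CONTENT_SIZE": int(size),
        "PATHNAME": path,
    }
```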

21. chgrp:

Changes the group of a file or directory.

Syntax:

  hadoop fs -chgrp [-R] GROUP PATH...

22. chmod:

Changes permissions of a file. This works similar to the Linux shell’s chmod command
with a few exceptions.

Syntax:

  hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...

-R modifies the files recursively. This is the only option currently supported.
<MODE> Mode is the same as mode used for the shell’s command. The only
letters recognized are ‘rwxXt’, e.g. +t,a+r,g-w,+rwx,o=r.
<OCTALMODE> Mode specified in 3 or 4 digits. If 4 digits, the first may be 1 or 0 to turn the sticky bit on or off, respectively. Unlike the shell command, it is not possible to specify only part of the mode, e.g. 754 is the same as u=rwx,g=rx,o=r.

If none of ‘augo’ is specified, ‘a’ is assumed and unlike the shell command, no
umask is applied.
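Since an octal mode always sets the whole permission, it can be rewritten as one symbolic clause per class, as in the 754 example above. Here is a minimal Python sketch. octal_to_symbolic is hypothetical, and writing the optional leading sticky digit as +t or -t is an assumption of this sketch.

```python
def octal_to_symbolic(mode: str) -> str:
    """Translate a 3- or 4-digit OCTALMODE into an equivalent list of symbolic clauses."""
    if len(mode) not in (3, 4) or any(c not in "01234567" for c in mode):
        raise ValueError(f"expected 3 or 4 octal digits, got {mode!r}")
    sticky = None
    if len(mode) == 4:
        if mode[0] not in "01":
            raise ValueError("the leading digit must be 0 (sticky off) or 1 (sticky on)")
        sticky = "+t" if mode[0] == "1" else "-t"
        mode = mode[1:]
    clauses = []
    for who, digit in zip("ugo", mode):
        bits = int(digit)
        # 4 = read, 2 = write, 1 = execute
        perms = ("r" if bits & 4 else "") + ("w" if bits & 2 else "") + ("x" if bits & 1 else "")
        clauses.append(f"{who}={perms}")
    if sticky:
        clauses.append(sticky)
    return ",".join(clauses)
```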

23. chown:

Changes owner and group of a file. This is similar to the shell’s chown command
with a few exceptions.

Syntax:

  hadoop fs -chown [-R] [OWNER][:[GROUP]] PATH...

-R modifies the files recursively. This is the only option currently supported.

If only the owner or group is specified, then only the owner or group is
modified. The owner and group names may only consist of digits, alphabet, and
any of [-_./@a-zA-Z0-9]. The names are case sensitive.

WARNING: Avoid using ‘.’ to separate user name and group though Linux allows it.
If user names have dots in them and you are using local file system, you might
see surprising results since the shell command ‘chown’ is used for local files.
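Here is a minimal Python sketch that splits an [OWNER][:[GROUP]] argument and checks each name against the allowed characters before calling -chown. Following the warning above, it splits only on ':' and never on '.'. parse_owner_group is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Allowed characters for owner and group names, as listed above
NAME_RE = re.compile(r"[-_./@a-zA-Z0-9]+")


def parse_owner_group(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Split OWNER, OWNER:GROUP or :GROUP and validate each name that is present."""
    owner, _, group = spec.partition(":")
    result = (owner or None, group or None)
    if result == (None, None):
        raise ValueError("specify an owner, a group, or both")
    for name in result:
        if name is not None and not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid user or group name: {name!r}")
    return result
```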