This post describes the Java interface for reading and writing HDFS files. It is a continuation of the previous post, Java Interface for HDFS I/O.
Reading HDFS Files Through FileSystem API:
In order to read any file in HDFS,
- We first need to get an instance of FileSystem for the underlying cluster.
- Then we need to get an input stream to read the file's data; calling open() on the FileSystem instance returns an FSDataInputStream.
- We can use the static IOUtils.copyBytes() method to copy from the input stream to any other stream, such as standard output, or we can use the read() or readFully() methods on the input stream to read the data into byte buffers.
Below is a sample program for reading data from an HDFS file to standard output.
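The steps above can be sketched as the following HdfsRead class. This is a minimal sketch, assuming the Hadoop client jars are on the classpath and the full HDFS URI of the file is passed as the first command-line argument; it requires a running HDFS cluster to actually execute.

```java
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. an hdfs:// URI of the file to read
        Configuration conf = new Configuration();
        // Get the FileSystem instance backing the given URI.
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            // open() returns an FSDataInputStream for the file.
            in = fs.open(new Path(uri));
            // Copy the file's bytes to standard output in 4 KB chunks;
            // the final argument 'false' leaves the streams open for us to close.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```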
Compile the above program and add the directory containing HdfsRead.class to HADOOP_CLASSPATH. For example, $ export HADOOP_CLASSPATH='/opt/eclipse/workspace/HDFS IO/bin', then run the class with the $ hadoop CLASSNAME args command. We can also close the streams directly by invoking the close() method on the FSDataInputStream or FSDataOutputStream objects instead of calling the IOUtils.closeStream() method; i.e., in.close() is equivalent to IOUtils.closeStream(in).
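The readFully() alternative mentioned earlier can be sketched as follows. This is a hypothetical snippet, assuming a Hadoop cluster and a small text file at the given (made-up) URI; FSDataInputStream's positioned readFully(position, buffer) fills the buffer from the given offset without moving the stream's current position.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadFully {
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // hypothetical HDFS file URI
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            // Read exactly 16 bytes starting at offset 0 into the buffer.
            byte[] buffer = new byte[16];
            in.readFully(0, buffer);
            System.out.println(new String(buffer, "UTF-8"));
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```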
Writing HDFS Files Through FileSystem API:
To write a file in HDFS,
- First we need to get an instance of FileSystem.
- Create a file with the create() method on the FileSystem instance, which returns an FSDataOutputStream.
- We can copy bytes from any other stream to the output stream using IOUtils.copyBytes(), or write directly with the write() method (or any of its overloads) on the FSDataOutputStream object.
Below is a sample program for copying data into an HDFS file from another HDFS file.
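A minimal sketch of such a program follows. The class name HdfsWrite and the two-argument usage (source URI, then destination URI) are assumptions; as above, it needs the Hadoop client jars on the classpath and a running cluster.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsWrite {
    public static void main(String[] args) throws Exception {
        String src = args[0]; // source HDFS file URI
        String dst = args[1]; // destination HDFS file URI
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(src), conf);
        FSDataInputStream in = null;
        FSDataOutputStream out = null;
        try {
            in = fs.open(new Path(src));
            // create() returns an FSDataOutputStream, overwriting any
            // existing file at the destination path by default.
            out = fs.create(new Path(dst));
            // Copy in 4 KB chunks; close the streams ourselves in finally.
            IOUtils.copyBytes(in, out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}
```

Here closing the streams in a finally block (or equivalently calling in.close() and out.close()) ensures they are released even if the copy fails partway.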