

Java Interface to HDFS File Read Write

This post describes the Java interface for HDFS file read and write operations, and is a continuation of the previous post, Java Interface for HDFS I/O.

Reading HDFS Files Through FileSystem API:

In order to read any File in HDFS,

  • We first need to get an instance of the FileSystem underlying the cluster.
  • Then we need to get an InputStream to read the file's data.
  • We can use the IOUtils class's static copyBytes() method to copy from the input stream to any other stream (such as standard output), or we can call read() or readFully() on the InputStream to read the data into byte buffers.

Below is a sample program for reading data from HDFS file to Standard output.
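A minimal sketch of such a program is shown here (the class name HdfsRead is taken from the text below; the exact code of the original post is not available, so this reconstructs the steps described above):

```java
// HdfsRead.java - reads the HDFS file given as args[0] and copies it to stdout.
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        String uri = args[0];  // e.g. hdfs://localhost/user/training/test.txt
        Configuration conf = new Configuration();
        // Get the FileSystem instance underlying the cluster
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            // open() returns an FSDataInputStream for the file
            in = fs.open(new Path(uri));
            // Copy bytes from the input stream to standard output (4 KB buffer);
            // the last argument 'false' tells copyBytes not to close the streams
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```

Note that this program requires the Hadoop client jars on the classpath and a reachable HDFS cluster to run.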

Compile the above program and add the directory containing HdfsRead.class to HADOOP_CLASSPATH. For example:

$ export HADOOP_CLASSPATH='/opt/eclipse/workspace/HDFS IO/bin'

Then run the class with the $ hadoop CLASSNAME args command, i.e. $ hadoop HdfsRead args. We can also close the streams directly by invoking the close() method on the FSDataInputStream or FSDataOutputStream objects instead of calling IOUtils.closeStream(); i.e. in.close() is equivalent to IOUtils.closeStream(in).

Writing HDFS Files Through FileSystem API:

To write a file in HDFS,

  • First, we need to get an instance of FileSystem.
  • Create a file with the create() method on the FileSystem instance, which returns an FSDataOutputStream.
  • We can copy bytes from any other stream to the output stream using IOUtils.copyBytes(), or write directly with the write() method (or any of its overloads) on the FSDataOutputStream object.

Below is a sample program for copying data into an HDFS file from another HDFS file.
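A minimal sketch follows (the class name HdfsWrite is taken from the text below; the original listing is not available, so this reconstructs the steps described above, with the source and destination paths passed as arguments):

```java
// HdfsWrite.java - copies the contents of one HDFS file (args[0])
// into a newly created HDFS file (args[1]).
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsWrite {
    public static void main(String[] args) throws Exception {
        String src = args[0];  // source HDFS file
        String dst = args[1];  // destination HDFS file
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(src), conf);
        InputStream in = null;
        FSDataOutputStream out = null;
        try {
            // Open the source file for reading
            in = fs.open(new Path(src));
            // create() returns an FSDataOutputStream for the new file
            out = fs.create(new Path(dst));
            // Copy all bytes from the source stream to the destination stream
            IOUtils.copyBytes(in, out, 4096, false);
        } finally {
            IOUtils.closeStream(out);
            IOUtils.closeStream(in);
        }
    }
}
```

As with the read example, this assumes the Hadoop client jars are on the classpath and HDFS is running.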

Compile it the same way and run the HdfsWrite class with the hadoop command.



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.



One thought on “Java Interface to HDFS File Read Write”

  • Divya

    Hi Siva,

    I am trying to run a simple hadoop program which prints the contents of the file on the console.

    When I try to execute the program I am getting “malformed URL: no protocol”.

    My code is from Hadoop the definitive guide : URLCat

    Command :

    hadoop fs -cat hdfs://localhost/user/training/test.txt

    Output

    This is the context of the test.txt file

    Command:

    hadoop jar Hadoop-Examples.jar URLCat hdfs://localhost/user/training/test.txt

    Output:

    java.net.MalformedURLException: no protocol

    I hope I have provided enough input. Please let me know if I need to provide more. I am waiting for your kind response.


