Monthly Archives: December 2014


Most Popular Hadoop Distributions

Currently there are a lot of Hadoop distributions available in the big data market, but the major free, open-source distribution is the one from the Apache Software Foundation. The other Hadoop vendors also provide free versions of Hadoop, as well as customized Hadoop distributions tailored to their client organizations' needs. Using Apache Hadoop as the core framework, these companies build their own customized Hadoop cluster setups and services […]


Big Data Challenges

In the previous post we gave a brief introduction to Big Data; now we will discuss Big Data challenges along with its characteristics. Before going into the challenges, we will briefly go through the characteristics of Big Data. Big Data Characteristics: Big Data characteristics are often described with the help of the Five Vs (Volume, Velocity, Variety, Veracity and Value). They are as follows. Volume: How […]


Big Data Introduction

So far, across all categories of this site, we have been discussing the technical details of Hadoop and its ecosystem tools. To be successful, a Hadoop developer must focus on the data itself, in addition to the technical details of the Hadoop architecture and its sub-components. In any industry, at the end of the day, the business usage and business benefits of a tool or product will rule […]


Sqoop Import Command Arguments

In this post we will discuss one of the most important commands in Apache Sqoop, the Sqoop import command, and its arguments, with examples. This documentation applies to Sqoop 1.4.5 or later, because earlier versions don't support some of the import command arguments mentioned below. As of Sqoop 1.4.5, the import command supports a variety of arguments for importing relational database tables into the following tools or services: HDFS, Hive […]
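As a rough illustration of how the arguments fit together, the sketch below only assembles a `sqoop import` command line; it does not run Sqoop. It uses the real Sqoop options `--connect`, `--username`, `--table`, `--num-mappers`, `--target-dir` and `--hive-import`, while the helper name `build_sqoop_import`, the JDBC URL, database, table, user and HDFS path are hypothetical values chosen for the example.

```python
import shlex


def build_sqoop_import(jdbc_url, table, username, target_dir=None,
                       hive_import=False, num_mappers=4):
    """Hypothetical helper: assemble (not execute) a `sqoop import` command."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string of the source RDBMS
        "--username", username,         # database user (password handling omitted)
        "--table", table,               # relational table to import
        "--num-mappers", str(num_mappers),  # degree of parallelism for the import
    ]
    if target_dir is not None:
        # HDFS directory that will receive the imported files
        args += ["--target-dir", target_dir]
    if hive_import:
        # Additionally create/load a Hive table from the imported data
        args.append("--hive-import")
    return args


# Hypothetical connection details, for illustration only
cmd = build_sqoop_import("jdbc:mysql://dbhost/sales", "orders", "etl_user",
                         target_dir="/user/etl/orders")
print(shlex.join(cmd))
```

Building the command as a list rather than one string keeps each argument intact, so it could later be passed to `subprocess.run` without shell quoting issues.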


Bucketing In Hive

In our previous post we discussed partitioning in Hive; now we will focus on bucketing in Hive, which is another way of giving a more fine-grained structure to Hive tables. Bucketing in Hive: Partitioning in Hive usually offers a way of segregating Hive table data into multiple files/directories. But partitioning gives effective results only when there is a limited number of partitions and the partitions are of comparatively equal size. But this may not […]
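The idea behind bucketing can be sketched as hashing the bucketing column and taking the remainder modulo a fixed bucket count, so every row with the same key lands in the same bucket file. This is a simplified model, not Hive's code: it assumes an integer bucketing column whose hash is the value itself (with the sign bit cleared, as Java's `hash & Integer.MAX_VALUE` would do); Hive's real hash depends on the column type and Hive version. The row data and the names `bucket_for` and `assign_buckets` are hypothetical.

```python
from collections import defaultdict


def bucket_for(key: int, num_buckets: int) -> int:
    # Simplified model: treat a 32-bit integer key as its own hash,
    # clear the sign bit, then take the remainder modulo the bucket count.
    return (key & 0x7FFFFFFF) % num_buckets


def assign_buckets(rows, key, num_buckets):
    # Group rows by bucket number; each group stands in for one bucket file.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key], num_buckets)].append(row)
    return dict(buckets)


# Hypothetical sample rows, bucketed on user_id into 4 buckets
rows = [{"user_id": uid, "country": c}
        for uid, c in [(101, "US"), (102, "IN"), (103, "US"), (104, "UK"), (105, "IN")]]
buckets = assign_buckets(rows, "user_id", 4)
for b, members in sorted(buckets.items()):
    print(b, [r["user_id"] for r in members])
```

Unlike partitioning, the number of buckets is fixed up front, so the number of files stays bounded no matter how many distinct key values the column has.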


Partitioning in Hive

In this post, we will discuss one of the most critical and important concepts in Hive: partitioning of Hive tables. Partitioning in Hive: Table partitioning means dividing table data into parts based on the values of particular columns, such as date or country, so that input records are segregated into different files/directories by date or country. Partitioning can be done on more than one column, which imposes a multi-dimensional structure on the directory […]
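The multi-dimensional directory structure can be pictured with a small sketch that maps each record to a nested `column=value` path, which is the naming convention Hive uses for partition directories. This only simulates the layout in memory; the table directory, sample rows, and the helper names `partition_path` and `partition_rows` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_path(table_dir, row, partition_cols):
    # Each partition column adds one nested `column=value` directory level,
    # so two partition columns give a two-level directory tree.
    path = PurePosixPath(table_dir)
    for col in partition_cols:
        path /= f"{col}={row[col]}"
    return str(path)


def partition_rows(table_dir, rows, partition_cols):
    layout = defaultdict(list)
    for row in rows:
        # Partition values are encoded in the directory path,
        # so they are dropped from the data written into the files.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        layout[partition_path(table_dir, row, partition_cols)].append(data)
    return dict(layout)


# Hypothetical table partitioned by date and then by country
rows = [
    {"id": 1, "dt": "2014-12-01", "country": "US"},
    {"id": 2, "dt": "2014-12-01", "country": "IN"},
    {"id": 3, "dt": "2014-12-02", "country": "US"},
    {"id": 4, "dt": "2014-12-01", "country": "US"},
]
layout = partition_rows("/user/hive/warehouse/sales", rows, ["dt", "country"])
for path, data in sorted(layout.items()):
    print(path, data)
```

A query filtering on `dt` and `country` only needs to read the matching directories, which is where the benefit of partitioning comes from.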


Hive Data Types With Examples

In this post, we will discuss all Hive data types, with examples for each data type. Hive supports most of the primitive data types supported by many relational databases, and those still missing are being added to Hive in each release. Hive Data Types With Examples: Hive data types are used to specify the column/field types in Hive tables. Hive data types can be classified into two […]
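One practical use of the integral types is choosing the narrowest one that can hold a column's values. The sketch below relies on Hive's documented sizes for its signed integer types (TINYINT 1 byte, SMALLINT 2 bytes, INT 4 bytes, BIGINT 8 bytes); the helper `smallest_int_type` is hypothetical, and the fallback suggestion of DECIMAL is only one possible choice.

```python
# Hive's signed integral types and their sizes in bytes
HIVE_INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_int_type(value: int) -> str:
    """Hypothetical helper: return the narrowest Hive integer type holding `value`."""
    for name, nbytes in HIVE_INT_TYPES:
        bits = nbytes * 8
        # Two's-complement range for a signed integer of this width
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if lo <= value <= hi:
            return name
    raise ValueError(f"{value} does not fit in BIGINT; consider DECIMAL")


for v in (100, 40000, 3_000_000_000):
    print(v, smallest_int_type(v))
```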


Hive Table Creation Commands

In this post, we will discuss Hive table commands with examples. This post can be treated as a sequel to the previous post, Hive Database Commands. Hive Table Creation Commands. Introduction to Hive Tables: In Hive, tables are simply collections of homogeneous data records that share the same schema. Hive Table = Data Stored in HDFS + Metadata (Schema of the table) stored […]
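The "table = data + metadata" idea can be modeled with a toy class that keeps the schema and data location as metadata and checks whether records are homogeneous. This is purely illustrative: `HiveTableModel` is a hypothetical class, not a Hive API, and the table name and path are made up. Note that Hive itself applies the schema on read (fields that fail to parse typically come back as NULL) rather than rejecting records on load, so the check here only illustrates what "same schema for every record" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiveTableModel:
    """Toy model: metadata (name, schema, location) describing data files in HDFS.

    Hypothetical class for illustration only; not a Hive API.
    """
    name: str
    columns: tuple  # (column_name, python_type) pairs, i.e. the schema
    location: str   # HDFS directory that would hold the table's data files

    def conforms(self, record: tuple) -> bool:
        # A homogeneous record has one field per column, each of the column's type.
        if len(record) != len(self.columns):
            return False
        return all(isinstance(v, t) for v, (_, t) in zip(record, self.columns))

    def bad_records(self, records):
        return [r for r in records if not self.conforms(r)]


employees = HiveTableModel(
    name="employees",
    columns=(("name", str), ("age", int)),
    location="/user/hive/warehouse/employees",
)
print(employees.bad_records([("alice", 30), ("bob", "n/a"), ("carol",)]))
```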


Hive Database Commands

In this post, we will discuss Hive database commands (Create/Alter/Use/Drop Database) with some examples for each statement. All these commands and their options are taken from the hive-0.14.0 release documentation, so to use these commands with all the options described below, we need at least the hive-0.14.0 release. Hive Database Commands. Note: From the hive-0.14.0 release onwards, a Hive DATABASE is also called a SCHEMA. So, SCHEMA and DATABASE are the same in […]