Yearly Archives: 2014

Most Popular Hadoop Distributions

Many Hadoop distributions are currently available in the big data market, but the major free, open source distribution comes from the Apache Software Foundation. The other Hadoop distribution companies also provide free versions of Hadoop, along with customized distributions suited to the needs of client organizations. Using Apache Hadoop as the core framework, these companies build their own customized Hadoop cluster setup and services […]

Big Data Challenges

In the previous post we gave a brief introduction to Big Data; in this post we discuss Big Data challenges. Before going into the challenges, we will briefly go through the characteristics of Big Data. Big Data Characteristics: Big Data characteristics are often described with the help of the Five Vs (Big Data Volume, Velocity, Variety, and Veracity). They are as follows. Volume – How […]

Big Data Introduction

Until now, every category of this site has covered the technical details of Hadoop and its ecosystem tools. To be successful, a Hadoop developer must focus on the data as well as on the technical details of the Hadoop architecture and its sub-components. In any industry, at the end of the day, the business usage and business benefits of a tool or product will rule […]

Sqoop Import Command Arguments

In this post we will discuss one of the important commands in Apache Sqoop, the Sqoop import command, and its arguments, with examples. This documentation applies to Sqoop 1.4.5 or later, because earlier versions do not support some of the import arguments mentioned below. As of Sqoop 1.4.5, the import command supports a number of arguments for importing relational database tables into the tools or services below: HDFS, Hive […]

Bucketing In Hive

In our previous post we discussed partitioning in Hive; now we will focus on bucketing in Hive, which is another way of giving a more fine-grained structure to Hive tables. Bucketing in Hive: Partitioning in Hive offers a way of segregating Hive table data into multiple files/directories. But partitioning gives effective results only when there is a limited number of partitions and the partitions are of comparatively equal size. But this may not […]

Partitioning in Hive

In this post, we will discuss one of the most critical and important concepts in Hive: partitioning of Hive tables. Partitioning in Hive: Table partitioning means dividing table data into parts based on the values of particular columns, such as date or country, and segregating the input records into different files/directories based on those values. Partitioning can be done on more than one column, which will impose a multi-dimensional structure on the directory […]

Hive Data Types With Examples

In this post, we will discuss all Hive data types, with examples for each. Hive supports most of the primitive data types supported by many relational databases, and any that are missing are being added to Hive with each release. Hive Data Types With Examples: Hive data types are used for specifying the column/field type in Hive tables. Hive data types can be classified into two […]

Hive Table Creation Commands

In this post, we will discuss Hive table commands with examples. This post can be treated as a sequel to the previous post, Hive Database Commands. Hive Table Creation Commands: Introduction to Hive Tables. In Hive, a table is a collection of homogeneous data records that all share the same schema. Hive Table = Data Stored in HDFS + Metadata (Schema of the table) stored […]

Hive Database Commands

In this post, we will discuss Hive database commands (Create/Alter/Use/Drop Database) with some examples for each statement. All these commands and their options are taken from the hive-0.14.0 release documentation, so using them with all the options described below requires at least the hive-0.14.0 release. Hive Database Commands: Note: From the hive-0.14.0 release onwards, a Hive DATABASE is also called a SCHEMA. So, both SCHEMA and DATABASE are the same in […]

QlikView Integration with Hadoop

In this post we will give a basic introduction to the QlikView BI tool and to QlikView integration with Hadoop Hive. We will use Cloudera Hive and its JDBC drivers/connectors to connect with QlikView, and we will retrieve a sample table from a Cloudera Hadoop Hive database. QlikView Overview: What is QlikView? QlikView is a well-known business intelligence and visualization tool built by Qlik (previously known as QlikTech) for […]

Run Remote Commands over SSH

In this post, we will discuss the details of communication between two nodes in a network via SSH and of running remote commands over SSH on a remote machine. For easy understanding, these two nodes in the cluster can be treated as server and client machines. To allow secure communication between the server and client machines, on the server side we will need a public key and an authorization file, and on the […]

Brief Notes on Unix Shell Scripting Concepts

This post provides very brief notes on Unix shell scripting. As this topic is well described in many textbooks, we will not go deep into the details of each point. This post is for quick review/revision/reference of common Unix commands and Unix shell scripting. Unix Shell Scripting: Kernel. The kernel is the heart of the UNIX system. It provides utilities with a means of accessing a machine’s […]