Hive


Hive Table Creation Commands

In this post, we will discuss Hive table commands with examples. This post can be treated as a sequel to the previous post, Hive Database Commands. Introduction to Hive Tables: in Hive, tables are collections of homogeneous data records that share the same schema. Hive Table = Data stored in HDFS + Metadata (schema of the table) stored […]


QlikView Integration with Hadoop

In this post, we give a basic introduction to the QlikView BI tool and discuss QlikView integration with Hadoop Hive. We use Cloudera Hive and its JDBC drivers/connectors to connect QlikView to Hive, and we retrieve a sample table from a Cloudera Hadoop Hive database. QlikView Overview: What is QlikView? QlikView is a well-known business intelligence and visualization tool built by Qlik (previously known as QlikTech) for […]


Creating Custom UDF in Hive – Auto Increment Column in Hive

In this post, we describe the process of creating a custom UDF in Hive. Although Hive provides many generic UDFs (user-defined functions), we sometimes need to write our own custom UDFs to meet our requirements. In this post, we discuss one common requirement: clients migrating from a traditional RDBMS to Hive expect an Auto Increment Column in […]
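The excerpt does not show the post's own code, so here is a minimal sketch of the idea, assuming the common stateful-counter approach. In older Hive releases such a function typically extends `org.apache.hadoop.hive.ql.exec.UDF` and is annotated `@UDFType(deterministic = false, stateful = true)` so Hive does not cache or reorder its results. Those Hive dependencies are left out below so the sketch compiles on its own; the class name `RowSequenceSketch` is hypothetical.

```java
// Plain-Java sketch of the per-row counter logic behind an auto-increment UDF.
// ASSUMPTION: a real Hive UDF would extend org.apache.hadoop.hive.ql.exec.UDF and
// carry @UDFType(deterministic = false, stateful = true); omitted here so the file
// compiles without Hive jars. RowSequenceSketch is a hypothetical name.
public class RowSequenceSketch {

    // The counter lives in the instance, so every task that creates its own
    // instance starts counting again from 1.
    private long current = 0L;

    // Hive would call evaluate() once per row; each call returns the next value.
    public long evaluate() {
        current += 1;
        return current;
    }

    public static void main(String[] args) {
        RowSequenceSketch seq = new RowSequenceSketch();
        for (int row = 0; row < 3; row++) {
            System.out.println(seq.evaluate()); // prints 1, then 2, then 3
        }
    }
}
```

Because the state is per instance, the numbers are only unique within a single task (for example, one mapper); a table scanned by several mappers would see repeated values. Once packaged as a real UDF, the function would be registered with `ADD JAR` and `CREATE TEMPORARY FUNCTION` before use in a query.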


Hive Connectivity With Hunk (Splunk)

In this post, we discuss the configuration required to connect Hive with Hunk, the Hadoop flavor of the well-known visualization tool Splunk. Splunk Overview: Splunk captures, indexes, and correlates real-time data in a searchable repository from which it can generate graphs, reports, dashboards, and visualizations. Splunk released a product called Hunk: Splunk Analytics for Hadoop, which supports accessing, searching, and reporting on external data sets located in Hadoop from […]


Hive on Tez – Hive Integration with Tez

In this post, we discuss Hive integration with the Tez framework, that is, enabling Tez for Hive queries. We also run sample Hive queries on both the MapReduce and Tez frameworks and evaluate the performance difference between them. Tez Advantages: Tez offers a customizable execution architecture that allows us to express complex computations as dataflow graphs and allows for dynamic performance optimizations based […]


Apache Tez – Successor of Mapreduce Framework

Apache Tez Overview: What is Apache Tez? Apache Tez is another execution framework project from the Apache Software Foundation, built on top of Hadoop YARN. It is considered a more flexible and powerful successor to the MapReduce framework. Apache Tez Features: Tez provides a performance gain over MapReduce; backward compatibility with the MapReduce framework; optimal resource management; plan reconfiguration at run time; and dynamic physical data flow decisions. Tez is […]


Hive CLI Commands

In our previous posts, we covered the Hive Overview and Hive Architecture; now we discuss the default service in Hive, the Hive Command Line Interface, and Hive CLI commands. Ways to interact with Hive: the CLI (command-line interface); Karmasphere (http://karmasphere.com), a commercial product; Cloudera’s open source Hue (https://github.com/cloudera/hue); a new “Hive-as-a-service” offering from Qubole (http://qubole.com); a simple web interface called the Hive Web Interface (HWI); and programmatic […]


Java vs Hive

In this post, we discuss the differences between Java and Hive with the help of a word count example. We examine the Word Count algorithm first using the Java MapReduce API and then using Hive. The following Java implementation is included in the Apache Hadoop distribution.

To implement the Word Count algorithm, we need to write 63 lines of Java code, compile it, and build a JAR […]
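To make the algorithm itself concrete, here is a minimal plain-Java sketch of Word Count's data flow: a map step that emits a (word, 1) pair per token, and a reduce step that groups by word and sums. It deliberately does not use the Hadoop MapReduce API (no `Mapper`, `Reducer`, or job setup), so it is not the Hadoop distribution's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the Word Count map/reduce data flow.
// NOT the Hadoop MapReduce API; WordCountSketch and its methods are hypothetical names.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // split can yield "" for leading whitespace
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce step: group pairs by word and sum their counts.
    // TreeMap keeps keys sorted, loosely mirroring MapReduce's sorted reducer input.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run the map step over every line, then reduce all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hive is fun", "hive is sql on hadoop");
        System.out.println(wordCount(lines));
        // prints {fun=1, hadoop=1, hive=2, is=2, on=1, sql=1}
    }
}
```

In Hive, the same computation is usually a single query along the lines of `SELECT word, count(1) FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w GROUP BY word`, where `docs` and `line` are hypothetical table and column names.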