Hive


Hadoop and Hive Interview Cheat Sheet

Hive: A SQL-based data warehouse application built on top of Hadoop (select, join, group by, ...). It is a platform used to develop SQL-type scripts that run as MapReduce operations. PARTITIONING: Partitioning a table changes how Hive structures the data storage. It is used for distributing load horizontally, e.g. PARTITIONED BY (country STRING, state STRING); A partition is a subset of a table’s data set where one column has the same value for all records in the subset. In Hive, as in most databases […]
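To make the storage effect of PARTITIONED BY concrete: Hive keeps each partition in its own subdirectory named column=value under the table's directory. The Python sketch below only mimics that naming scheme. The warehouse path, the table name, the sample rows, and the helper partition_dir are all hypothetical and are not taken from the original post.

```python
from collections import defaultdict


def partition_dir(table_dir, partition_values):
    """Build one partition's directory, Hive-style: <table>/col=value/col=value."""
    parts = [f"{col}={val}" for col, val in partition_values]
    return "/".join([table_dir.rstrip("/")] + parts)


# Hypothetical rows of a table PARTITIONED BY (country STRING, state STRING).
rows = [
    {"name": "alice", "country": "US", "state": "CA"},
    {"name": "bob", "country": "US", "state": "NY"},
    {"name": "carol", "country": "US", "state": "CA"},
]

# Group rows by partition directory; the partition column values live in the path.
layout = defaultdict(list)
for row in rows:
    path = partition_dir(
        "/warehouse/employees",  # assumed table location, for illustration only
        [("country", row["country"]), ("state", row["state"])],
    )
    layout[path].append(row["name"])

for path, names in sorted(layout.items()):
    print(path, names)
```

A query that filters on country and state only needs to read the matching directory, which is why partitioning helps spread and prune the load.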


String Functions in Hive

This post is about basic String Functions in Hive with syntax and examples. Creating Table in HIVE:

String Functions and Normal Queries:

ASCII: The ASCII function converts the first character of the string into its numeric ASCII value.

CONCAT: The CONCAT function concatenates all the strings/columns.

CONCAT_WS: Syntax: “CONCAT_WS(string delimiter, string str1, str2, ...)”. The CONCAT_WS function concatenates the given values with the delimiter between them; it accepts only strings and columns with datatype string.

[…]
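The behaviour of the three functions described above can be sketched in plain Python. This is an emulation for illustration, not Hive code: the helper names are hypothetical, None stands in for SQL NULL, and the NULL handling shown is an assumption about Hive's behaviour that should be checked against the Hive documentation.

```python
def hive_ascii(s):
    """Numeric code of the first character of s (0 for an empty string)."""
    if s is None:  # assumption: NULL in, NULL out
        return None
    return ord(s[0]) if s else 0


def hive_concat(*args):
    """Join all arguments end to end."""
    if any(a is None for a in args):  # assumption: any NULL makes the result NULL
        return None
    return "".join(args)


def hive_concat_ws(sep, *args):
    """Join the arguments with sep between them."""
    if sep is None:  # assumption: a NULL separator gives NULL
        return None
    # assumption: NULL arguments are skipped rather than nulling the result
    return sep.join(a for a in args if a is not None)


print(hive_ascii("hadoop"))                    # code of 'h'
print(hive_concat("hive", "-", "sql"))         # strings joined directly
print(hive_concat_ws(",", "a", "b", "c"))      # strings joined with a delimiter
```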


Hive Aggregate Functions

Creating Table in HIVE :

Aggregate Functions and Normal Queries:

SUM: Returns the sum of the elements in the group, or the sum of the distinct values of the column in the group.

COUNT: count(*) – Returns the total number of retrieved rows, including rows containing NULL values; count(expr) – Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr]) – Returns the […]
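The difference between these forms is easiest to see on a column that contains a NULL and a duplicate. The sketch below runs the queries in SQLite through Python's sqlite3 module, not in Hive. It assumes the standard SQL semantics described above carry over, and the sales table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption: the
# aggregate semantics shown here match the descriptions above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 10), ("west", 30), ("west", None)],
)

total_rows, non_null, distinct_vals, total, distinct_total = conn.execute(
    """
    SELECT count(*),               -- every row, NULLs included
           count(amount),          -- rows where amount is non-NULL
           count(DISTINCT amount), -- distinct non-NULL amounts
           sum(amount),            -- sum of all non-NULL amounts
           sum(DISTINCT amount)    -- sum of the distinct amounts
    FROM sales
    """
).fetchone()

print(total_rows, non_null, distinct_vals, total, distinct_total)
# prints: 4 3 2 50 40
```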


Hive Date Functions

Hive Date Functions. from_unixtime: This function converts the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) to a STRING that represents the TIMESTAMP of that moment in the current system time zone, in the format “1970-01-01 00:00:00”. The following example returns the current date including the time.

from_utc_timestamp: This function assumes that the string in the first expression is UTC and then converts that string to the […]
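The seconds-to-string conversion that from_unixtime performs can be sketched in Python. This is an emulation under stated assumptions, not Hive itself: the helper takes the time zone as an explicit argument and defaults to UTC, whereas Hive uses the system time zone, and the fixed +05:30 offset is only an example.

```python
import time
from datetime import datetime, timedelta, timezone


def from_unixtime(seconds, tz=timezone.utc):
    """Format seconds since the Unix epoch as 'YYYY-MM-DD HH:MM:SS' in tz."""
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


# The epoch itself, rendered in UTC.
print(from_unixtime(0))  # prints: 1970-01-01 00:00:00

# The same instant rendered in a zone five and a half hours ahead of UTC
# (an assumed example offset).
plus_0530 = timezone(timedelta(hours=5, minutes=30))
print(from_unixtime(0, plus_0530))  # prints: 1970-01-01 05:30:00

# Current date including the time, in UTC.
print(from_unixtime(int(time.time())))
```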


Hive Functions Examples

Hive Functions Examples: SET, SHOW, USE, CREATE DATABASE, CREATE MANAGED TABLE, CREATE EXTERNAL TABLE, CREATING TABLE FROM EXISTING TABLE, CREATING EXTERNAL TABLES FROM MANAGED TABLES, LOAD, COPY DATA FROM ONE TABLE TO ANOTHER, DROP, QUIT, SELECT, DESCRIBE, DESCRIBE SPECIFIC FIELD, DESCRIBE EXTENDED, ALTER, CLONE SCHEMA (DATA IS NOT COPIED), CLONE SCHEMA TO ANOTHER DB, USING REGULAR EXPRESSIONS, MATHEMATICAL FUNCTIONS, AGGREGATE FUNCTIONS, LIMIT, NESTED SELECT STATEMENT, CASE..WHEN..THEN, LIKE & RLIKE, JOINS […]


Hive Performance Tuning

In our previous post we discussed Hadoop job optimization, or Hadoop performance tuning for MapReduce jobs. In this post we will briefly discuss a few points on how to optimize Hive queries (Hive performance tuning). If we do not fine-tune Hive properly, then even select queries on smaller tables in Hive may sometimes take minutes to emit results. So, because of this reason Hive […]