Hive Aggregate Functions 1


Creating Table in HIVE :

Aggregated Functions and Normal Queries:

SUM

Returns the sum of the elements in the group or sum of the distinct values of the column in the group.

Count

count(*) – Returns the total number of retrieved rows, including rows containing NULL values;

count(expr) – Returns the number of rows for which the supplied expression is non-NULL;

count(DISTINCT expr[, expr]) – Returns the number of rows for which the supplied expression(s) are unique and non- NULL;

Average

Returns the average of the elements in the group or the average of the distinct values of the column in the group.

Minimum

Returns the minimum of the column in the group.

Maximum

Returns the maximum of the column in the group.

Variance

Returns the variance of a numeric column in the group.

Standard Deviation

Returns the Standard Deviation of a numeric column in the group.

Covariance

Returns the population covariance of a pair of numeric columns in the group.

Correlation

Returns the Pearson coefficient of correlation of a pair of a numeric columns in the group.

Percentile

Returns the exact pth percentile of a column in the group(does not work with floating point types).P must be between 0 and 1. NOTE: A true percentile “ Percentile(BIGINT col,P)”can only be computed for INTEGER values. Use PERCENTILE_APPROX if you are input is non-integral.

Histogram

Computes a histogram of a numeric column in the group using b non-uniformly spaced bins.The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and heights.    “histogram_numeric(col, b)”

Collections

Returns a set of objects with duplicate elements eliminated.

NTILE

This function divides an ordered partition into x groups called buckets and assigns a bucket number to each row in the partition. This allows easy calculation of tertiles, quartiles, deciles, percentiles and other common summary statistics. (As of Hive 0.11.0.).

 

 


Profile photo of Siva

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

One thought on “Hive Aggregate Functions


Review Comments
default gravatar

I am a plsql developer. Intrested to move into bigdata.

Neetika Singh ITA

.