**Creating Table in HIVE :**

1 2 3 4 5 6 |
hive> create external table Tri100(id int,name string,location varchar(30),sal int,Hike int) > row format delimited > fields terminated by ',' > lines terminated by '\n' > stored as textfile location '/Jayanth/Jay/'; |

**Aggregated Functions and Normal Queries:**

1 2 3 4 5 6 7 8 9 10 |
hive> select * from Tri100; OK 1 rahul Hyderabad 3000 40000 2 Mohit Banglore 22000 25000 3 Rohan Banglore 33000 40000 4 Ajay Bangladesh 40000 45000 5 srujay Srilanka 25000 30000 Time taken: 0.184 seconds, Fetched: 5 row(s) |

**SUM**

**Returns the sum of the elements in the group or sum of the distinct values of the column in the group.**

1 2 3 4 5 6 7 8 9 10 11 |
hive> select sum(sal) from Tri100; OK 150000 Time taken: 17.909 seconds, Fetched: 1 row(s) hive> select Sum(sal) from Tri100 where loccation='Banglore'; OK 55000 Time taken: 18.324 seconds, Fetched: 1 row(s) |

**Count**

**count(*) – Returns the total number of retrieved rows, including rows containing NULL values;**

**count(expr) – Returns the number of rows for which the supplied expression is non-NULL;**

**count(DISTINCT expr[, expr]) – Returns the number of rows for which the supplied expression(s) are unique and non- NULL;**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
hive> select count(*) from Tri100; OK 5 Time taken: 16.307 seconds, Fetched: 1 row(s) hive> select count(distinct location) from Tri100; OK 4 Time taken: 17.37 seconds, Fetched: 1 row(s) hive> select count(*) from Tri100 where sal>30000; OK 2 Time taken: 18.36 seconds, Fetched: 1 row(s) hive> select count(location) from Tri100; OK 5 Time taken: 17.338 seconds, Fetched: 1 row(s) |

**Average**

**Returns the average of the elements in the group or the average of the distinct values of the column in the group.**

1 2 3 4 5 6 7 8 9 10 11 12 |
hive> select avg(sal) from Tri100 where location='Banglore'; OK 27500.0 Time taken: 17.276 seconds, Fetched: 1 row(s) hive> select avg(distinct sal) from Tri100; OK 30000.0 Time taken: 17.276 seconds, Fetched: 1 row(s) |

**Minimum**

**Returns the minimum of the column in the group.**

1 2 3 4 5 6 |
hive> select min(sal) from Tri100; OK 22000 Time taken: 17.368 seconds, Fetched: 1 row(s) |

**Maximum**

**Returns the maximum of the column in the group.**

1 2 3 4 5 6 |
hive> select max(sal) from Tri100; OK 40000 Time taken: 17.267 seconds, Fetched: 1 row(s) |

**Variance**

**Returns the variance of a numeric column in the group.**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
hive> select variance(sal) from Tri100; OK 3.96E7 Time taken: 17.223 seconds, Fetched: 1 row(s) hive> select var_pop(sal) from Tri100; OK 3.96E7 Time taken: 17.195 seconds, Fetched: 1 row(s) Returns the unbiased sample variance of a numeric column in the group. hive> select var_samp(sal) from Tri100; OK 4.95E7 Time taken: 17.245 seconds, Fetched: 1 row(s) |

**Standard Deviation**

**Returns the Standard Deviation of a numeric column in the group.**

1 2 3 4 5 6 7 8 9 10 11 12 13 |
hive> select stddev_pop(sal) from Tri100; OK 6292.8530890209095 Time taken: 18.63 seconds, Fetched: 1 row(s) Returns the unbiased sample Standard Deviation of a numeric column in the group. hive> select stddev_samp(sal) from Tri100; OK 7035.623639735144 Time taken: 17.299 seconds, Fetched: 1 row(s) |

**Covariance**

**Returns the population covariance of a pair of numeric columns in the group.**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
hive> select covar_pop(sal,Hike) from Tri100; OK 4.4E7 Time taken: 18.888 seconds, Fetched: 1 row(s) Returns the sample covariance of a pair of numeric columns in the group. hive> select covar_samp(sal,Hike) from Tri100; OK 5.5E7 Time taken: 18.302 seconds, Fetched: 1 row(s) |

**Correlation**

**Returns the Pearson coefficient of correlation of a pair of a numeric columns in the group.**

1 2 3 4 5 6 |
hive> select corr(sal,Hike) from Tri100; OK 0.9514987095307504 Time taken: 17.514 seconds, Fetched: 1 row(s) |

**Percentile**

**Returns the exact pth percentile of a column in the group(does not work with floating point types).****P must be between 0 and 1. NOTE: A true percentile “ Percentile(BIGINT col,P)”can only be computed for INTEGER values. Use PERCENTILE_APPROX if you are input is non-integral.**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
hive> select percentile(sal,0) from Tri100;------------------------Output Gives Lower Value of table as P is 0.It takes lower value as 0%. OK 22000.0 Time taken: 17.321 seconds, Fetched: 1 row(s) hive> select percentile(sal,1) from Tri100; -----------------------Output Gives Higher Value of table as P is 1.It takes Higher value as 100%. OK 40000.0 Time taken: 17.966 seconds, Fetched: 1 row(s) hive> select percentile(sal,0.5) from Tri100; OK 30000.0 Time taken: 17.368 seconds, Fetched: 1 row(s) |

**Histogram**

**Computes a histogram of a numeric column in the group using b non-uniformly spaced bins.****The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and heights. “histogram_numeric(col, b)”**

1 2 3 4 5 6 |
hive> select histogram_numeric(sal,5) from Tri100; OK [{"x":22000.0,"y":1.0},{"x":25000.0,"y":1.0},{"x":30000.0,"y":1.0},{"x":33000.0,"y":1.0},{"x":40000.0,"y":1.0}] Time taken: 17.534 seconds, Fetched: 1 row(s) |

**Collections**

**Returns a set of objects with duplicate elements eliminated.**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
hive> select collect_set(Hike) from Tri100; OK [45000,40000,25000,30000] Time taken: 18.29 seconds, Fetched: 1 row(s) Returns a set of objects with duplicates(as of Hive 0.13.0) hive> select collect_list(Hike) from Tri100; OK [40000,25000,40000,45000,30000] Time taken: 17.217 seconds, Fetched: 1 row(s) |

**NTILE**

**This function divides an ordered partition into x groups called buckets and assigns a bucket number to each row in the partition. This allows easy calculation of tertiles, quartiles, deciles, percentiles and other common summary statistics. (As of Hive 0.11.0.).**

1 2 3 4 5 6 7 8 9 10 |
hive> select name,Hike,NTILE(3) over (order by sal DESC) from Tri100; OK Ajay 45000 1 Rohan 40000 1 Rahul 40000 2 Srujay 30000 2 Mohit 25000 3 Time taken: 17.217 seconds, Fetched: 1 row(s) |

The create table statement ( 1st command ) fails . location is reserved word in hive , can’t be used ad column name