Hive Data Types With Examples 9


In this post, we will discuss about all Hive Data Types With Examples for each data type. Hive supports most of the primitive data types supported by many relational databases and even if anything are missing, they are being added/introduced to hive in each release.

Hive Data Types With Examples

Hive Data types are used for specifying the column/field type in Hive tables. Hive data types can be classified into two categories.

  • Primary Data Types
  • Complex Data Types

Primary Data Types

Primary Data Types are further classified into four categories. They are:

  • Numeric Types
  • String Types
  • Date/Time Types
  • Miscellaneous Types

These data types and their sizes are similar to Java/SQL primitive data types and sizes.

Numeric Data Types

Integral types are – TINYINT, SMALLINT, INT & BIGINT – Equivalent to Java’s byte , short , int , and long primitive types

Floating types are – FLOAT, DOUBLE & DECIMAL. – Equivalent to Java’s float and double , and SQL’s Decimal respectively.

DECIMAL(5,2) represents total of 5 digits, out of which 2 are decimal digits. Below is the chart for all numeric types with their ranges and examples.

Numeric Data types

In Hive, by default integral values are treated as INT unless they cross the range of INT values as shown in above table. But if we need to use a low integral value like 100 to be treated as TINYINT or SMALLINT or BIGINT then we need to suffix the value with Y, S or L respectively.

Examples: 100Y – TINYINT, 100S – SMALLINT, 100L – BIGINT

String Data Types

Three types of string data types are supported in Hive as of 0.14 release. They are listed in below chart.

String data types

CHAR vs VARCHAR
  • CHAR is fixed length and values shorter than are padded with spaces.
  • VARCHAR is of variable length but we need to specify the max length of the field (example : name VARCHAR(64)). If the values are less than the max length specified then remaining space will be freed out.
  • The maximum length of CHAR is 255 but VARCHAR can be upto 65355 bytes.
  • Space/storage optimization is done in VARCHAR by releasing the unused bytes but in CHAR unused bytes will  not be released but filled with spaces.
  • If a string value being assigned to a VARCHAR value exceeds the length specified, then the string is silently truncated.
Date/Time Data Types

Hive provides DATE and TIMESTAMP data types in traditional UNIX time stamp format for date/time related fields in hive.

DATE values are represented in the form YYYY-­MM-­DD. Example: DATE ‘2014-­12-­07’. Date ranges allowed are 0000-­01-­01 to 9999-­12-­31.

TIMESTAMP use the format yyyy-mm-dd hh:mm:ss[.f...]. 

We can also cast the String, Time-stamp values to Date format if they match format but declared as String or Time-stamp data type.

Cast Type Result
cast(date as date) Same date value
cast(date as string) Date is formatted as a string in the form ‘YYYY-MM-DD’.
cast(date as timestamp) Midnight of the year/month/day of the date value is returned as timestamp.
cast(string as date) If the string is in the form ‘YYYY-MM-DD’, then a date value corresponding to that is returned. If the string value does not match this format, then NULL is returned.
cast(timestamp as date) The year/month/day of the timestamp is returned as a date value.
Miscellaneous Types

Hive supports two more primitive data types, BOOLEAN and BINARY. Similar to Java’s Boolean, BOOLEAN in hive stores true or false values only.

BINARY is an array of Bytes and similar to VARBINARY in many RDBMSs. BINARY columns are stored within the record, not separately like BLOBs . We can include arbitrary bytes in BINARY column and these bytes are not parsed by Hive as numbers or strings.

Implicit Conversion Between Primitive Data Types
  • Primitive Type
    • Number
      • DOUBLE
        • FLOAT
          • BIGINT
            • INT
              • SMALLINT
                • TINYINT
        • DECIMAL (Can be converted to String, varchar only)
        • STRING (Can be converted to Varchar, Double, Decimal)
        • VARCHAR (Can be converted to String, Double, Decimal)
          • DATE (Converted to String, varchar)
          • TIMESTAMP (same as date)
    • BOOLEAN
    • BINARY

In the above hierarchy, Implicit conversion is allowed for types from child to an ancestor. So, TINYINT can be converted to any other numeric data type but BIGINT can only be converted to FLOAT or DOUBLE but the reverse

Boolean & Binary data types will not be  converted to any other data type implicitly.

Explicit Conversion

Explicit type conversion can be done using the cast operator only.

Example: CAST(‘500’ AS INT) will convert the string ‘500’ to the integer value 500.  But If cast is used incorrectly as in CAST(‘Hello’ AS INT) , then cast operation will fail and returns NULL .

Complex Data Types

As discussed above, all the primitive data types in Hive are similar to primitive data types in other languages or RDBMSs. But in addition to primitive data types, Hive also support complex data types (or also known as collection data types) which are not available in many RDBMSs.

Complex Types can be built up from primitive types and other composite types. Data type of the fields in the collection are specified using an angled bracket notation. Currently Hive supports four complex data types. They are:

  • ARRAY – An Ordered sequences of similar type elements that are indexable using
    zero-based integers. It is similar to arrays in Java.

Example – array (‘siva’, ‘bala’, ‘praveen’); Second element is accessed with array[1].

  • MAP – Collection of key-value pairs. Fields are accessed using array notation of keys (e.g., [‘key’]).

Example – ‘first’ -> ‘bala’ , ‘last’ -> ‘PG’ is represented as map(‘first’, ‘bala’, ‘last’, ‘PG’). Now ‘bala ‘ can be accessed with map[‘first’].

  • STRUCT – It is similar to STRUCT in C language. It is a record type which encapsulates a set of named fields that can be any primitive data type. Elements in STRUCT type are accessed using the DOT (.) notation.

Example – For a column c of type STRUCT {a INT; b INT} the a field is accessed by the expression c.a

  • UNIONTYPE – It is similar to Unions in C. At any point of time, an Union Type can hold any one (exactly one) data type from its specified data types.

For example if we declare a column as Union Type as shown in below.

 Then retrieving results for col1 from test table can be any of the below possibilities.

Default delimiters table for the fields in collection data types.

Delimiter Code Description
\n \n Record or row delimiter
^A  (Ctrl+A) \001 Field delimiter
^B  (Ctrl+B) \002 Element delimiter in ARRAYs and STRUCTs
^C  (Ctrl+C) \003 Delimits key/value pairs in a MAP
Example Table Creation Using Above data types

Below is the sample User table creation to cover most of the data types discussed in this post.

References

Apache Hive Manualhttps://cwiki.apache.org/confluence/display/Hive/LanguageManual


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

9 thoughts on “Hive Data Types With Examples

  • venkat sandeep

    Below is the examples for complex data types.

    ARRAY:

    $ cat >arrayfile
    1,abc,40000,a$b$c,hyd
    2,def,3000,d$f,bang

    hive> create table tab7(id int,name string,sal bigint,sub array<string>,city string)
    > row format delimited
    > fields terminated by ‘,’
    > collection items terminated by ‘$’;

    hive>select sub[2] from tab7 where id=1;

    hive>select sub[0] from tab7;

    MAP:

    $ cat >mapfile
    1,abc,40000,a$b$c,pf#500$epf#200,hyd
    2,def,3000,d$f,pf#500,bang

    hive>create table tab10(id int,name string,sal bigint,sub array<string>,dud map<string,int>,city string)
    row format delimited
    fields terminated by ‘,’
    collection items terminated by ‘$’
    map keys terminated by ‘#’;

    hive> load data local inpath ‘/home/training/mapfile’ overwrite into table tab10;

    hive>select dud[“pf”] from tab10;

    hive>select dud[“pf”],dud[“epf”] from tab10;

    STRUCT:

    cat >mapfile
    1,abc,40000,a$b$c,pf#500$epf#200,hyd$ap$500001
    2,def,3000,d$f,pf#500,bang$kar$600038

    hive> create table tab11(id int,name string,sal bigint,sub array<string>,dud map<string,int>,addr struct<city:string,state:string,pin:bigint>)
    > row format delimited
    > fields terminated by ‘,’
    > collection items terminated by ‘$’
    > map keys terminated by ‘#’;

    hive> load data local inpath ‘/home/training/structfile’ into table tab11;

    hive>select addr.city from tab11;

  • taj

    i have declare one field  like below  in hive table :

    addressprofid   uniontype<int,array<string>>

    and i have load data as  100  from file to table  now its returning result as null.

     

  • m123

    SELECT sum(CASE WHEN sex_age.sex = ‘Male’
    . . . . . . .> THEN sex_age.age ELSE 0 END)/
    . . . . . . .> count(CASE WHEN sex_age.sex = ‘Male’ THEN 1
    . . . . . . .> ELSE NULL END) AS male_age_avg FROM employee

     

    how to covert int to string

    Argument type mismatch ‘0’: The expression after ELSE should have the same type as those after THEN: “string” is expected but “int” is found

  • Ramesh Naik

    how to handle bit data type (0, 1 or NULL) of Mysql in HIVE/Impala.
    I mean which datatype in hive/impala can be used while table creation to ingest data later from mysql database


Review Comments
default image

I have attended Siva’s Spark and Scala training. He is good in presentation skills and explaining technical concepts easily to everyone in the group. He is having excellent real time experience and provided enough use cases to understand each concepts. Duration of the course and time management is awesome. Happy that I found a right person on time to learn Spark. Thanks Siva!!!

Dharmeswaran ETL / Hadoop Developer Spark Nov 2016 September 21, 2017

.