String Functions in Hive 1


This post is about basic String Functions in Hive with syntax and examples.

Creating Table in HIVE:

String Functions and Normal Queries:

ASCII

ASCII Function converts the first character of the string into its numeric ASCII value.

 CONCAT

The CONCAT function concatenates all the strings/columns.

CONCAT_WS

Syntax: “CONCAT_WS(string delimiter, string str1,str2……)”

The CONCAT_WS function concatenates all the strings only strings and Column with datatype string.

FIND_IN_SET :

Syntax: “FIND_IN_SET(string search_string,string source_string_list)”

The FIND_IN_SET function searches for the search_string in the source_string_list and returns the position of the first occurrence in the source_string_list. Here the source_string_list should be comma delimited one.

 LENGTH :

LENGTH function returns the number of characters in the string.

 LOWER or LCASE :

The LOWER and LCASE convert the string into the Lower case.

UPPER or UCASE:

The UPPER and UCASE convert the string into the UPPER case.

 

LPAD :

Syntax: “LPAD(string str,int len,string pad)”

The LPAD function returns the string with a length of len characters left-padded with pad.

RPAD :

Syntax: “RPAD(string str,int len,string pad)”

The RPAD function returns the string with a length of len characters Right-padded with pad.

TRIM

TRIM Function removes all the trailing spaces from the string.

REPEAT and REVERSE

REPEAT Function repeat the string for n times.

REVERSE Function gives the reverse string.

SPACE :

SPACE function returns the specified number of spaces.

SPLITT :

Syntax: SPLITT(‘string1:string2’,’pat’)

Split function splits the string depending on the pattern pat and returns an array of strings.

SubString :

The SUBSTR or SUBSTRING function returns a part of the source string from the start position with the specified length of characters.

If the length is not given, then it returns from the start position to the end of the string.

Format :

Syntax: “FORMAT_NUMBER(number X,int D)”

Formats the number X to a format like #,###,###.##, rounded to D decimal places and returns result as a string. If D=0 then the value will only have fraction part there will not be any decimal part.

INSTRING :

Syntax: “instr(string str,string substring)”

Returns the position of the first occurrence of substr in str. Returns null if either of the arguments are null and returns 0 if substr could not be found in str. Be aware that this is not zero based. The first character in str has index 1.

Locate :

Syntax: “Locate(string substring, string str[,int pos])”

Returns the position of the first occurrence of substr in str after position pos.

N-Grams :

Syntax: N-grams(array<array<string>>,int N, int K, int P)

Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF.

Parse URL :

Syntax: “parse_url(string urlString, string partToExtract [, string keyToExtract])”

Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

Printf :

Syntax: “printf(String format, Obj… args)”

Returns the input formatted according do printf-style format strings

Regexp_Extract :

Syntax: “regexp_extract(string subject, string pattern, int index)”

Returns the string extracted using the pattern.

Regexp_Repalce :

Syntax: “regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)”

Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT.

Sentences :

Syntax: “sentences(string str, string lang, string locale)”

Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The ‘lang’ and ‘locale’ are optional arguments.

Str_to_map :

Syntax: “str_to_map(text[, delimiter1, delimiter2])”

Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ‘,’ for delimiter1 and ‘=’ for delimiter2.

Translate :

Syntax: “translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)”

Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string.  If any of the parameters to this UDF are NULL, the result is NULL as well.


Profile photo of Siva

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

One thought on “String Functions in Hive

  • SJ

    Hi, how to do WHERE COL-N LIKE  ‘% ABDC TEMP%’; in HQL

    my COL-N pattern will be like below:

    ABDC TEMP

    ABDC TEMP(+23/_)

    ADCB TEMP(23+)

    ABDC TEMP (45+,

    my results should be :

    ABDC TEMP

    ABDC TEMP(+23/_)

    ABDC TEMP (45+,

    how to do these in HQL like where  clause query?? please help me on this.


Review Comments
default gravatar

I am a plsql developer. Intrested to move into bigdata.

Neetika Singh ITA

.