Impala Miscellaneous Functions

Impala Conditions with Example

Impala supports the following conditional functions for testing equality, comparison operators, and nullity:

‘Case’ Example:

1) If else
   select case when 20 > 10 then 20 else 15 end;
   Output: 20

2) If else if
   select case when 9 > 10 then 20 when 1 > 2 then 1.0 else 15 end;
   Output: 15

=====================================================================================

‘Coalesce’ Function Example:

The COALESCE function in Impala returns the first […]
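As a rough illustration of COALESCE, which returns its first non-NULL argument, the sketch below shows one literal call and one call against a table. The `customers` table and its `nickname` and `first_name` columns are hypothetical names used only for this example.

```sql
-- Returns the first argument that is not NULL; here the result is 5.
select coalesce(NULL, 5, 10);

-- Hypothetical table and columns: fall back to another column,
-- and finally to a constant, when the preferred value is NULL.
select coalesce(nickname, first_name, 'unknown') as display_name
from customers;
```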

Creating UDF and UDAF for Impala

 Installing the UDF Development Package

The output will look like the following:

[cloudera@quickstart impala-udf-samples-master]$ cmake .
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ […]

Impala Best Practices

Below are Impala performance tuning options:

Pre-execution Checklist
   Data types
   Partitioning
   File Format

Data Type Choices
   Define integer columns as INT/BIGINT
   Operations on INT/BIGINT are more efficient than on STRING
   Convert “external” data to good “internal” types on load
   e.g. CAST date strings to TIMESTAMPS
   This avoids expensive CASTs in queries later

Partitioning
   The fastest I/O is the one […]
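The data type advice above (convert “external” strings to proper “internal” types once, at load time) could be sketched roughly as follows. The table names `sales_raw` and `sales_clean`, the column names, and the choice of Parquet as the target format are all assumptions made for illustration, not details from the original post.

```sql
-- Hypothetical staging table sales_raw: every column arrives as STRING.
-- Convert once while loading so later queries avoid repeated CASTs.
CREATE TABLE sales_clean STORED AS PARQUET AS
SELECT
  CAST(order_id AS BIGINT)      AS order_id,    -- numeric key instead of STRING
  CAST(order_time AS TIMESTAMP) AS order_time,  -- assumes 'yyyy-MM-dd HH:mm:ss' strings
  customer_name                                 -- genuinely textual, left as STRING
FROM sales_raw;
```

Queries against `sales_clean` can then filter and join on `order_id` and `order_time` directly, without casting in every statement.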

Impala Commands Cheat Sheet

This is a quick overview of Impala commands and functions. Impala accepts basic SQL syntax, and below is a list of some operators and commands that can be used in Impala. This is just a quick cheat sheet.

Databases

In Impala, a database is a logical container for a group of tables. Each database defines a separate namespace. Within a database, you can refer to the tables inside it using […]

Impala Introduction

Cloudera provides a separate tool called Impala to overcome the slowness of Hive queries. Syntactically, Impala queries are more or less the same as Hive queries, but they run much faster. Impala provides high-performance, low-latency SQL queries. When we are dealing with medium-sized data sets and expect real-time responses from our queries, Impala is the best option, but Impala is available only in […]
