Hadoop


Creating UDF and UDAF for Impala

Installing the UDF Development Package

The output will be like below code. [cloudera@quickstart impala-udf-samples-master]$ cmake . — The C compiler identification is GNU 4.4.7 — The CXX compiler identification is GNU 4.4.7 — Check for working C compiler: /usr/bin/cc — Check for working C compiler: /usr/bin/cc — works — Detecting C compiler ABI info — Detecting C compiler ABI info – done — Check for working CXX compiler: /usr/bin/c++ […]


Postgres Installation On Centos 1

To install the server locally use the command line and type

To start off, we need to set the password of the PostgreSQL user (role) called “postgres”; we will not be able to access the server externally otherwise. As the local “postgres" Linux user, we are allowed to connect and manipulate the server using the psql command. In a terminal, type:

this connects as a role with same […]


Impala Best Practices 1

Below are Impala performance tuning options: Pre-execution Checklist Data types Partitioning File Format Data Type Choices Define integer columns as INT/BIGINT Operations on INT/BIGINT more efficient than STRING Convert “external" data to good “internal" types on load e.g. CAST date strings to TIMESTAMPS This avoids expensive CASTs in queries later Partitioning The fastest I/O is the one […]


Apache Storm Integration With Apache Kafka

Installing Apache Storm The prerequisite for storm to work on the machine. a. Download and installation commands for ZeroMQ 2.1.7: Run the following commands on terminals

b. Download and installation commands for JZMQ:

  2. Download latest storm from http://storm.apache.org/downloads.html