📁 Hadoop


Creating UDF and UDAF for Impala

Installing the UDF Development Package

The output will look like this:

[cloudera@quickstart impala-udf-samples-master]$ cmake .
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ […]


Postgres Installation On Centos

To install the server locally, open a terminal and type:

To start off, we need to set the password of the PostgreSQL user (role) called “postgres”; otherwise we will not be able to access the server externally. As the local “postgres” Linux user, we are allowed to connect to and manipulate the server using the psql command. In a terminal, type:

This connects as a role with the same […]


Impala Best Practices 1

Below are Impala performance tuning options.

Pre-execution checklist: data types, partitioning, file format.

Data type choices:
Define integer columns as INT/BIGINT. Operations on INT/BIGINT are more efficient than on STRING.
Convert “external” data to good “internal” types on load, e.g. CAST date strings to TIMESTAMP. This avoids expensive CASTs in queries later.

Partitioning:
The fastest I/O is the one […]


Apache Storm Integration With Apache Kafka

Installing Apache Storm

1. Install the prerequisites that Storm needs to work on the machine.

a. Download and installation commands for ZeroMQ 2.1.7. Run the following commands in a terminal:

b. Download and installation commands for JZMQ:

2. Download the latest Storm release from http://storm.apache.org/downloads.html