Miscellaneous


Impala Miscellaneous Functions

Impala Conditions with Example
Impala supports the following conditional functions for testing equality, comparison operators, and nullity:
‘Case’ Example:
1) If else
select case when 20 > 10 then 20 else 15 end;
Output: 20
2) If else if
select case when 9 > 10 then 20 when 1 > 2 then 1.0 else 15 end;
Output: 15
‘Coalesce’ Function Example:
The COALESCE function in Impala returns the first […]


PMD (Programming Mistake Detector)

What is PMD? PMD, also known as Programming Mistake Detector, is a Java source code analyzer. It is used to find and clean up erroneous code in Java projects based on a predefined set of rules. PMD also supports writing custom rules. Issues reported by PMD are not always true errors; often they point to inefficient code, i.e. the application could still function properly even if they were not corrected. PMD […]


Creating UDF and UDAF for Impala

 Installing the UDF Development Package

The output will look like this:
[cloudera@quickstart impala-udf-samples-master]$ cmake .
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ […]


Postgres Commands

CREATE

We can see our new table by typing this:

           List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | playground | table | postgres
(1 row)

INSERT

Message returned if only one row was inserted; oid is the numeric OID of the inserted row. Ex: INSERT oid 1
Message returned if more than one row was inserted; # is the number of rows […]


Impala Best Practices

Below are Impala performance tuning options:
Pre-execution Checklist
    Data types
    Partitioning
    File Format
Data Type Choices
    Define integer columns as INT/BIGINT
    Operations on INT/BIGINT are more efficient than on STRING
    Convert “external” data to good “internal” types on load
    e.g. CAST date strings to TIMESTAMPs
    This avoids expensive CASTs in queries later
Partitioning
    The fastest I/O is the one […]


Apache Storm Integration With Apache Kafka

Installing Apache Storm
Prerequisites for Storm to work on the machine:
a. Download and installation commands for ZeroMQ 2.1.7 (run the following commands in a terminal):

b. Download and installation commands for JZMQ: 

2. Download the latest Storm release from http://storm.apache.org/downloads.html

Second, start the Storm cluster by starting the master and worker nodes. Start the master node, i.e. Nimbus. To start Nimbus, go to the ‘bin’ directory of the […]


Kafka Design

While developing Kafka, the main focus was to provide the following:
    An API for producers and consumers to support custom implementation
    Low overheads for network and storage with message persistence on disk
    A high throughput supporting millions of messages for both publishing and subscribing, for example real-time log aggregation or data feeds
    Distributed and highly scalable architecture to handle low-latency delivery
    Auto-balancing multiple consumers in the […]


Kafka Installation

There are a number of ways in which Kafka can be used in an architecture. This section discusses some of the popular use cases for Apache Kafka and the well-known companies that have adopted Kafka. The following are the popular Kafka use cases:
Log aggregation: This is the process of collecting physical log files from servers and putting them in a central place (a file server or HDFS) for processing. Using […]


Cassandra production scenarios/issues

Production issue: when we ran a SELECT query with 8 lakh (800,000) IDs in the IN condition, we hit the issue below.
To solve the above exception, we used distributed calls in the Java client, as shown below:
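As a rough sketch of the idea only (this is not the original Java client code), a very large ID list can be split into small batches that are queried in parallel and then merged. Everything named here is an assumption for illustration: `fetch_batch` is a hypothetical callable that would run one small query per batch, `fake_fetch_batch` stands in for a real driver call, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterable[Sequence[int]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(
    ids: Sequence[int],
    fetch_batch: Callable[[Sequence[int]], List[dict]],
    batch_size: int = 100,
    workers: int = 8,
) -> List[dict]:
    """Call `fetch_batch` once per chunk, in parallel, and merge the rows."""
    rows: List[dict] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the chunks
        for batch_rows in pool.map(fetch_batch, chunked(ids, batch_size)):
            rows.extend(batch_rows)
    return rows


def fake_fetch_batch(batch: Sequence[int]) -> List[dict]:
    """Hypothetical stand-in for one small query over `batch`."""
    return [{"id": i} for i in batch]


if __name__ == "__main__":
    result = fetch_all(list(range(1050)), fake_fetch_batch, batch_size=100)
    print(len(result))
```

Keeping each batch small bounds the work any single request asks of the cluster; the right batch size depends on the table and the cluster, so treat the numbers above as placeholders.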

A few production configurations in Cassandra
RetryPolicy
There are three scenarios for which you can control the retry policy:
Read timeout: when a coordinator received the request and sent the read to the replica(s), but the replica(s) […]


Cassandra query language (CQL) and Cassandra Java Client Example

Cassandra Table structure/Terminology
Before learning the CQL commands, we need to know the terminology used in Cassandra.
 RDBMS        | Cassandra Terminology
 Database     | Keyspace
 Table        | Column Family
 Primary key  | Row Key
 Column name  | Column name
 Column value | Column value
CQL Commands
Creating a keyspace

Use the keyspace (will use that key space)

Note: keyspaces are equivalent to a database/schema in an RDBMS.
Get list of keyspaces

Create table

Get list […]


Cassandra write and read process

Storage engine
Cassandra uses a storage structure similar to a Log-Structured Merge Tree, unlike a typical relational database that uses a B-Tree. Cassandra avoids reading before writing. Read-before-write, especially in a large distributed system, can produce stalls in read performance and other problems. Cassandra never re-writes or re-reads existing data, and never overwrites rows in place.
How is data written?
Different stages of the write process in Cassandra:
Logging data […]
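The log-structured idea described above can be illustrated with a toy model. This is a minimal sketch and not Cassandra's implementation: the class `ToyLsmStore`, its method names, and the tiny `memtable_limit` are all hypothetical, and the commit log / memtable / SSTable vocabulary goes beyond what this excerpt shows. The toy also never trims its log or merges segments.

```python
class ToyLsmStore:
    """Toy log-structured store: append-only log, in-memory table, immutable segments."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # most recent writes, held in memory
        self.sstables = []     # flushed segments, oldest first, never modified
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # No read-before-write: append to the log and update memory only.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable segment.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Newest data wins: check memory first, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.sstables):
            for k, v in segment:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    store = ToyLsmStore()
    store.write("a", 1)
    store.write("b", 2)
    store.write("a", 3)
    print(store.read("a"))
```

An update to an existing key lands in a newer segment instead of changing the older one, which is the "never overwrites rows in place" property in miniature.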


Cassandra Architecture

Cassandra is designed so that there is no single point of failure (SPOF). There is no master-slave architecture in Cassandra. Cassandra addresses the problem of SPOF by employing a peer-to-peer distributed system across homogeneous nodes, where data is distributed among all nodes in the cluster. All nodes in Cassandra are the same; there is no master or slave. Each node frequently exchanges state information about itself […]


CAP Theorem

What is the CAP Theorem? The CAP theorem says that when choosing any database (including a distributed database), we can have only two of the three properties, so we have to pick them based on our requirements.
Consistency: whenever you read a record (or data), consistency guarantees that you get the same data no matter how many times you read it. Simply put, each server returns the right response to each request, thus the system will be always […]


Zookeeper Commands

This post contains some notes on ZooKeeper commands and scripts. It is mainly useful for Hadoop admins, and all commands are self-explanatory. ZooKeeper is a distributed, centralized coordination service. ZooKeeper addresses issues with distributed applications:
    Maintain configuration information (share config info across all nodes)
    Naming service (allows one node to find a specific machine in a cluster of 1000s of servers)
    Distributed synchronization (locks, barriers, queues, etc.)
    Group services […]


Separation Anxiety: A Tutorial for Isolating Your System with Linux Namespaces

With the advent of tools like Docker, Linux Containers, and others, it has become super easy to isolate Linux processes into their own little system environments. This makes it possible to run a whole range of applications on a single real Linux machine and ensure no two of them can interfere with each other, without having to resort to using virtual machines. These tools have been a huge boon to […]


Review Comments
I am a PL/SQL developer, interested in moving into big data.

Neetika Singh ITA
