Hadoop Common

Impala Miscellaneous Functions

Impala Conditions with Example
Impala supports the following conditional functions for testing equality, comparison operators, and nullity:
‘Case’ Example:
1) If else
select case when 20 > 10 then 20 else 15 end;
Output: 20
2) If else if
select case when 9 > 10 then 20 when 1 > 2 then 1.0 else 15 end;
Output: 15
‘Coalesce’ Function Example:
The COALESCE function in Impala returns the first […]
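The two CASE queries in the excerpt above can be tried without an Impala cluster. The sketch below is only an approximation: it assumes Python's built-in sqlite3 module as a stand-in engine (SQLite is not Impala, and the helper name run_scalar is made up for illustration), and it exercises nothing beyond the standard CASE WHEN ... THEN ... ELSE ... END syntax quoted in the excerpt. Result types and formatting may differ in Impala itself.

```python
import sqlite3

# Assumption: SQLite stands in for Impala. Only the standard SQL CASE
# expression from the excerpt is exercised here.
conn = sqlite3.connect(":memory:")


def run_scalar(sql):
    """Hypothetical helper: run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]


# 1) If else: the first WHEN is true, so its THEN value is returned.
if_else = run_scalar("select case when 20 > 10 then 20 else 15 end")

# 2) If else if: neither WHEN is true, so the ELSE value is returned.
if_else_if = run_scalar(
    "select case when 9 > 10 then 20 when 1 > 2 then 1.0 else 15 end"
)

print(if_else)
print(if_else_if)
```

The WHEN branches are evaluated in order and the first one whose condition holds wins, which is why the second query falls through to ELSE.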

PMD (Programming Mistake Detector)

Table of Contents: PMD (Programming Mistake Detector); What is PMD?; How to install PMD?; How to use PMD?; Finding Cut and Paste Code (CPD); Working POM configuration
What is PMD? PMD, aka Programming Mistake Detector, is a Java source code analyzer. It is used to clean up erroneous code in our Java projects based on a predefined set of rules. PMD supports the ability to write custom rules. Issues reported by PMD may not be true […]

Creating UDF and UDAF for Impala

Installing the UDF Development Package

The output will look like the following:
[cloudera@quickstart impala-udf-samples-master]$ cmake .
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ […]

Postgres Commands


We can see our new table by typing this:

           List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | playground | table | postgres
(1 row)

INSERT

Message returned if only one row was inserted. oid is the numeric OID of the inserted row. Ex: INSERT oid 1
Message returned if more than one […]

Postgres Installation On Centos 1

To install the server locally, use the command line and type:

To start off, we need to set the password of the PostgreSQL user (role) called “postgres”; otherwise we will not be able to access the server externally. As the local “postgres” Linux user, we are allowed to connect to and manipulate the server using the psql command. In a terminal, type:

This connects as a role with the same […]

HBase & Solr Search Integration 1

Table of Contents: HBase & Solr – Near Real time indexing and search; Creating a Lily HBase Indexer configuration; Creating a Morphline Configuration File; Starting & Registering a Lily HBase Indexer configuration with the Lily HBase Indexer Service; Verifying the indexing is working; Configuring Lily HBase NRT Indexer Service for Use with Cloudera Search; Using the Lily HBase NRT Indexer Service; Steps to build indexing
HBase & Solr – Near Real time indexing and search
Requirement: A. HBase […]

Resilient Distributed Dataset

Table of Contents: What is an RDD?; Why RDD in Spark?; Data Sharing in MapReduce; Data Sharing in Spark; RDD Abstraction; How to program with RDD; Example 1: Creating an RDD of strings with textFile() in Python; Example 2: Calling the filter() transformation; Example 3: Calling the first() action; Example 4: Persisting an RDD in memory; Lazy Evaluation
What is an RDD? A Resilient Distributed Dataset (RDD) is a fundamental data structure of Spark. It is an immutable […]

Impala Best Practices 1

Below are Impala performance tuning options:
Table of Contents: Pre-execution Checklist; Data Type Choices; Partitioning; Use Parquet Columnar Format for HDFS; Quick Note on Compression; Snappy; Gzip/Zlib; Left-Deep Join Tree; Types of Hash Joins; Broadcast; Shuffle; How to use ANALYZE; Hinting Joins; Determining Join Type From EXPLAIN; Memory Requirements for Joins & Aggregates
Pre-execution Checklist: Data types, Partitioning, File Format
Data Type Choices: Define integer columns as INT/BIGINT. Operations on INT/BIGINT are more efficient than on STRING. Convert […]