Monthly Archives: May 2015


Hive Functions Examples

Hive Functions Examples SET SHOW USE CREATE DATABASE CREATE MANAGED TABLE CREATE EXTERNAL TABLE CREATING TABLE FROM EXISTING TABLE CREATING EXTERNAL TABLES FROM MANAGED TABLES LOAD COPY DATA FROM ONE TABLE TO ANOTHER DROP QUIT SELECT DESCRIBE DESCRIBE SPECIFIC FIELD DESCRIBE EXTENDED ALTER CLONE SCHEMA (DATA IS NOT COPIED) CLONE SCHEMA TO ANOTHER DB USING REGULAR EXPRESSIONS MATHEMATICAL FUNCTIONS AGGREGATE FUNCTIONS LIMIT NESTED SELECT STATEMENT CASE..WHEN..THEN LIKE & RLIKE JOINS […]


Pig Functions Examples

Below is a good collection of examples for the most frequently used functions in Pig. Pig Functions Examples. Contents: LOAD DESCRIBE/EXPLAIN/ILLUSTRATE FOREACH GROUP STORE LIMIT ORDER DISTINCT JOIN JOIN USING MULTIPLE KEYS OUTER JOINS SELF JOIN COUNT NUMBER OF ROWS IN SELF JOIN’S OUTPUT SAMPLE PARALLEL UDF:REGISTER UDF:DEFINE CALLING JAVA STATIC FUNCTIONS FLATTEN REPLACE EMPTY BAG WITH CONSTANT BAG NESTED FOREACH ORDER BY THE group SAMPLE SCRIPTS EXECUTE PIG […]


Pig Functions Cheat Sheet

Below is the Pig Functions Cheat Sheet, prepared by collecting different types of functions. Pig Execution Modes. Grunt mode: the interactive mode of Pig; very useful for testing, syntax checking, and ad-hoc data exploration. Script mode: runs a set of instructions from a file, similar to a SQL script file. Embedded mode: executes a Pig program from a Java program; suitable for creating Pig scripts on the fly. Local mode: pig -x […]


Flume Sqoop Pig HBase Unit Testing

Testing Flume. Scope: Testing will cover functional testing of the data transfer from source machines (external systems) to HDFS/HBase. Testing of individual Flume components, such as different source types, channel types, and sink types, will be included, as will testing of custom Flume agents and Flume agents embedded in other automated jobs/tools. Limitations & Exclusions: Installation of Flume (infrastructure) may not need to be tested. Record-level transfer validation may not be done from […]


HBase Shell Commands in Practice

In our previous posts we covered the HBase Overview and HBase Installation; now it is time to practice some HBase shell commands to get familiar with HBase. We will test a few HBase shell commands in this post. HBase Shell Usage: Quote all names in the HBase shell, such as table and column names. Commas delimit command parameters. Type <RETURN> after entering a command to run it. Dictionaries of configuration used […]


100 Interview Questions on Hadoop

1. What does commodity hardware in the Hadoop world mean? ( D ) a) Very cheap hardware b) Industry-standard hardware c) Discarded hardware d) Low-specification, industry-grade hardware 2. Which of the following are NOT big data problem(s)? ( D ) a) Parsing a 5 MB XML file every 5 minutes b) Processing IPL tweet sentiments c) Processing online bank transactions d) Both (a) and (c) 3. What does “Velocity” in […]


Hive Performance Tuning

In our previous post we discussed Hadoop job optimization, or Hadoop performance tuning, for MapReduce jobs. In this post we will briefly discuss a few points on how to optimize Hive queries (Hive performance tuning). If Hive is not tuned properly, even SELECT queries on smaller tables can sometimes take minutes to return results. For this reason, Hive […]


Enable Compression in Hive

For data-intensive workloads, I/O operations and network data transfer take considerable time to complete. By enabling compression in Hive, we can improve the performance of Hive queries as well as save storage space on the HDFS cluster. Find Available Compression Codecs in Hive: To enable compression in Hive, we first need to find out which compression codecs are available on the Hadoop cluster, and we can use the below set command […]