Pig


HCatalog and Pig Integration

In short, HCatalog opens up the Hive metadata to other MapReduce tools. Every MapReduce tool has its own notion of HDFS data (for example, Pig sees HDFS data as a set of files, while Hive sees it as tables). With a table-based abstraction, MapReduce tools supported by HCatalog do not need to care about the data's format or storage location (HBase or HDFS). […]


Pig Functions Examples

Below is a good collection of examples of the most frequently used functions in Pig, titled Pig Functions Examples. Contents: LOAD, DESCRIBE/EXPLAIN/ILLUSTRATE, FOREACH, GROUP, STORE, LIMIT, ORDER, DISTINCT, JOIN, JOIN USING MULTIPLE KEYS, OUTER JOINS, SELF JOIN, COUNT NUMBER OF ROWS IN SELF JOIN’S OUTPUT, SAMPLE, PARALLEL, UDF:REGISTER, UDF:DEFINE, CALLING JAVA STATIC FUNCTIONS, FLATTEN, REPLACE EMPTY BAG WITH CONSTANT BAG, NESTED FOREACH, ORDER BY THE group, SAMPLE SCRIPTS, EXECUTE PIG […]
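
As a concrete illustration of two entries in that list, GROUP and COUNT, here is a minimal Python sketch that mimics their semantics on in-memory tuples. It is not Pig code, and the names `pig_group` and `count_per_group` are hypothetical. It models GROUP ... BY producing one (group, bag) pair per distinct key, and COUNT skipping tuples whose first field is null (Pig's COUNT_STAR would count them).

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

Bag = List[tuple]


def pig_group(relation: Iterable[tuple], key_index: int) -> List[Tuple[Hashable, Bag]]:
    """Mimic `GROUP relation BY $key_index`: one (group, bag) pair per distinct key."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[key_index]].append(row)
    # Pig does not guarantee group order; this sketch keeps first-seen order.
    return list(bags.items())


def count_per_group(grouped: Iterable[Tuple[Hashable, Bag]]) -> List[Tuple[Hashable, int]]:
    """Mimic `FOREACH grouped GENERATE group, COUNT(bag)`.

    COUNT ignores tuples whose first field is null, so they are skipped here.
    """
    return [(group, sum(1 for row in bag if row[0] is not None)) for group, bag in grouped]


employees = [("alice", "eng"), ("bob", "ops"), ("carol", "eng"), (None, "eng")]
print(count_per_group(pig_group(employees, key_index=1)))  # [('eng', 2), ('ops', 1)]
```

In Pig Latin, assuming a schema with a `dept` field, the same computation would look roughly like `grouped = GROUP employees BY dept; counts = FOREACH grouped GENERATE group, COUNT(employees);`.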


Pig Functions Cheat Sheet

Below is a Pig Functions Cheat Sheet, prepared by collecting different types of functions. Pig Execution Modes: Grunt mode: the interactive mode of Pig, very useful for testing, syntax checking, and ad-hoc data exploration. Script mode: runs a set of instructions from a file, similar to a SQL script file. Embedded mode: executes a Pig program from a Java program; suitable for creating Pig scripts on the fly. Local mode: pig -x […]


Processing Logs in Pig

In the previous post, we discussed the basics of log files and the architecture of log analysis in Hadoop. In this post, we will go into much deeper detail on processing logs in Pig. As discussed in the previous post, there are three major types of log files: web server access logs, web server error logs, and application server logs. All these log files will be in […]
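
The excerpt stops before naming the log format, so the following sketch simply assumes web server access logs in the Apache Common Log Format. It shows, in Python rather than Pig, the kind of field extraction a Pig script might do with the built-in REGEX_EXTRACT_ALL function: one regular expression splits each line into host, user, timestamp, request, status, and response size. The pattern, `AccessLogEntry`, and `parse_access_line` are illustrative names, not part of Pig.

```python
import re
from typing import NamedTuple, Optional

# Apache Common Log Format (assumed): host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


class AccessLogEntry(NamedTuple):
    host: str
    user: str
    timestamp: str
    request: str
    status: int
    size: Optional[int]  # None when the server logged "-" for the size


def parse_access_line(line: str) -> Optional[AccessLogEntry]:
    """Parse one access-log line, returning None if it does not match the pattern."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # malformed line: the caller can filter these out
    host, _ident, user, timestamp, request, status, size = match.groups()
    return AccessLogEntry(host, user, timestamp, request, int(status),
                          None if size == "-" else int(size))


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_access_line(sample))
```

Filtering out the None results before grouping by `status` or `host` corresponds to the FILTER and GROUP steps a Pig log-processing script would typically follow with.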


Built-in Load Store Functions in Pig

In this post, we will discuss the following built-in load/store functions in Pig, with examples: PigStorage, TextLoader, BinStorage, JsonLoader and JsonStorage, AvroStorage, HBaseStorage, and MongoStorage. PigStorage: PigStorage() is the default load/store function in Pig. PigStorage expects data to be formatted using field delimiters, and the default delimiter is ‘\t’ (tab). The same PigStorage() function can be used for both loading and storing. It loads/stores data as structured text files. All Pig simple and complex […]
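
To make the delimiter behavior concrete, here is a minimal Python sketch of what PigStorage does with structured text: on load, each line is split on the delimiter (tab by default) into a tuple of fields; on store, each tuple's fields are joined back with the same delimiter. It ignores types, schemas, and compression, and the function names are hypothetical. Writing None as an empty field is an assumption of this sketch.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple


def pig_storage_load(lines: Iterable[str], delimiter: str = "\t") -> Iterator[Tuple[str, ...]]:
    """Split each text line into a tuple of fields, like LOAD ... USING PigStorage()."""
    for line in lines:
        yield tuple(line.rstrip("\r\n").split(delimiter))


def pig_storage_store(rows: Iterable[Sequence[Optional[object]]],
                      delimiter: str = "\t") -> Iterator[str]:
    """Join each tuple back into one delimited line, like STORE ... USING PigStorage()."""
    for row in rows:
        # Assumption of this sketch: None is written as an empty field.
        yield delimiter.join("" if field is None else str(field) for field in row)


raw = ["alice\t29\n", "bob\t31\n"]
rows = list(pig_storage_load(raw))
print(rows)                                # [('alice', '29'), ('bob', '31')]
print(list(pig_storage_store(rows, ",")))  # ['alice,29', 'bob,31']
```

In Pig itself, a non-default delimiter is passed as an argument, for example `PigStorage(',')`.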


Load Functions In Pig

In this post, we will discuss the basics of load functions in Pig with some sample examples, and we will also discuss writing custom load functions in Pig as UDFs. To work with data in Pig, the first thing we need to do is load data from a source; Pig has a built-in LOAD operator that loads data from the file system. Load Operator: Syntax:

Input […]
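
When a LOAD statement declares a schema with AS, Pig casts each field to the declared type, and a value that cannot be converted becomes null (with a warning) instead of failing the job. The following Python sketch models that behavior for a few simple types. The schema representation, the `CASTS` table, and `apply_schema` are hypothetical, and treating missing trailing fields as null is an assumption of the sketch.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical mapping from a few Pig simple type names to Python converters.
CASTS: Dict[str, Callable[[str], object]] = {
    "chararray": str,
    "int": int,
    "long": int,
    "double": float,
}

Schema = List[Tuple[str, str]]  # (field name, Pig type name) pairs


def apply_schema(fields: Sequence[str], schema: Schema) -> Dict[str, Optional[object]]:
    """Cast raw text fields to the declared types, using None for failed casts."""
    record: Dict[str, Optional[object]] = {}
    for i, (name, pig_type) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None  # missing field -> None (sketch assumption)
        try:
            record[name] = None if raw is None else CASTS[pig_type](raw)
        except ValueError:
            record[name] = None  # failed conversion becomes null instead of an error
    return record


schema = [("name", "chararray"), ("age", "int")]
print(apply_schema(["alice", "29"], schema))  # {'name': 'alice', 'age': 29}
print(apply_schema(["bob", "n/a"], schema))   # {'name': 'bob', 'age': None}
```

In Pig Latin, this schema would be declared as `AS (name:chararray, age:int)` at the end of the LOAD statement.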


Log Analysis in Hadoop

In this post, we will discuss various log file types and log analysis in Hadoop. Log Files: Logs are computer-generated files that capture network and server operations data. They are useful during various stages of software development, mainly for debugging and profiling, and also for managing network operations. Need for Log Files: Log files are commonly used at customer installations for permanent software monitoring and/or fine-tuning. […]


Pig Installation on Ubuntu

In this post, we will describe the procedure for installing Pig on an Ubuntu machine. Prerequisites: Below are the basic requirements for installing Pig on Ubuntu and getting started: Java 1.6 or later installed, with the JAVA_HOME environment variable set to the Java installation directory; and Hadoop 1.x or 2.x installed on the cluster. In this post, we will use Hadoop 2.3.0 for the HADOOP_HOME environment variable setup. Pig Installation Procedure: Download the latest stable […]
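
Before starting the installation, it can help to confirm the two environment variables the prerequisites mention. This hypothetical Python helper reports which of JAVA_HOME and HADOOP_HOME are unset or do not point to an existing directory; it does not check the Java or Hadoop versions.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME")


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or not an existing directory."""
    if env is None:
        env = os.environ
    missing = []
    for var in REQUIRED_VARS:
        path = env.get(var)
        if not path or not os.path.isdir(path):
            missing.append(var)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("OK" if not problems else "Please set: " + ", ".join(problems))
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to try out with made-up values.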


Apache Pig Overview

In this post, we will give a basic introduction to Apache Pig. What is Apache Pig? Pig is a scripting language for easily exploring huge data sets, gigabytes or terabytes in size. Pig provides an engine for executing data flows in parallel on Hadoop. Pig is made up of two main components. Pig Latin: the language for expressing data flows. Pig Engine: the execution environment for running Pig Latin programs. It has two […]

