Below are some Hadoop Pig interview questions and answers that are suitable for both freshers and experienced Hadoop programmers.
1. What is Apache Pig?
Pig is a scripting platform for easily exploring huge data sets, gigabytes or terabytes in size. Pig provides an engine for executing data flows in parallel on Hadoop.
2. What is Apache Pig?
Apache Pig is a top-level project of the Apache Software Foundation. It is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
3. What is Pig Latin?
Pig Latin is a data flow scripting language, similar in spirit to Perl, for exploring large data sets. A Pig Latin program is made up of a series of operations, or transformations, that are applied to the input data to produce output.
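The idea of a program as a series of transformations can be sketched in plain Python. This is only an analogy, not Pig itself: the records, the 9999 missing-value marker, and the variable names are hypothetical, and the Pig Latin statements in the comments are illustrative equivalents rather than a tested script.

```python
from collections import defaultdict

# Hypothetical input: (year, temperature) records; 9999 marks a missing reading.
# Roughly: records = LOAD 'input' AS (year:chararray, temp:int);
records = [("1950", 0), ("1950", 22), ("1950", 9999), ("1949", 111), ("1949", 78)]

# Step 1, roughly: valid = FILTER records BY temp != 9999;
valid = [(year, temp) for year, temp in records if temp != 9999]

# Step 2, roughly: grouped = GROUP valid BY year;
grouped = defaultdict(list)
for year, temp in valid:
    grouped[year].append(temp)

# Step 3, roughly: max_temp = FOREACH grouped GENERATE group, MAX(valid.temp);
max_temp = {year: max(temps) for year, temps in grouped.items()}

print(sorted(max_temp.items()))
```

Each step takes the output of the previous step as its input, which is what makes the program a data flow rather than a single declarative query.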
4. What is Pig Engine?
Pig Engine is an execution environment to run Pig Latin programs. It converts these Pig Latin operators or transformations into a series of MapReduce jobs.
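As a rough illustration of what running a grouping operation as a MapReduce job means, the following Python sketch imitates the map, shuffle, and reduce phases for a per-user count. It is a simplified model under stated assumptions, not Pig's actual compiler: the function names and the click records are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, 1) pair per input record, as a mapper would."""
    for user, _url in records:
        yield user, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets

def reduce_phase(buckets):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in buckets.items()}

# Hypothetical (user, url) click records.
clicks = [("amy", "/a"), ("bob", "/b"), ("amy", "/c")]
visits = reduce_phase(shuffle(map_phase(clicks)))
print(visits)
```

In Pig Latin the same result would need only a GROUP followed by a COUNT, with the engine generating the equivalent map and reduce steps.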
5. What are the modes of Pig Execution?
Pig execution can be done in two modes.
- Local Mode: Local execution in a single JVM. All files are installed and run using the local host and the local file system.
- MapReduce Mode: Distributed execution on a Hadoop cluster. This is the default mode.
6. What are the Pig Latin Features?
- A Pig Latin script is made up of a series of operations, or transformations, that are applied to the input data to produce output.
- Pig Latin programs can be executed either in interactive mode through the Grunt shell or in batch mode via Pig Latin scripts.
- Pig Latin includes operators for many of the traditional data operations (join, sort, filter, etc.).
- Pig Latin supports User Defined Functions (UDFs).
- Pig provides a debugging environment.
7. What are the advantages of using Pig over MapReduce?
Limitations of MapReduce:
- The development cycle is very long. Writing mappers and reducers, compiling and packaging the code, submitting jobs, and retrieving the results is a time-consuming process.
- Performing data set joins is very difficult.
- MapReduce is low level and rigid, which leads to a great deal of custom user code that is hard to maintain and reuse.
Advantages of Pig:
- There is no need to compile or package code. Pig operators are converted into map or reduce tasks internally.
- Pig Latin provides all of the standard data-processing operations, such as join, filter, group by, order by, union, etc.
- Pig offers a high level of abstraction for processing large data sets.
8. What is the difference between Pig Latin and HiveQL?
Pig Latin:
- Procedural language
- Nested relational data model
- Schema is optional
HiveQL:
- Declarative language
- Flat relational data model
- Schema is required
9. What are the common features in Pig and Hive?
- Both provide a high-level abstraction on top of MapReduce.
- Both convert their commands internally into MapReduce jobs.
- Neither supports low-latency queries, so neither is suitable for OLAP or OLTP workloads.
10. What is the difference between logical and physical plans?
Pig goes through several steps when it converts a Pig Latin script into MapReduce jobs. After basic parsing and semantic checking, it produces a logical plan, which describes the logical operators that Pig has to execute. Pig then produces a physical plan, which describes the physical operators needed to execute the script.
11. Why do we need MapReduce during Pig programming?
Pig is a high-level platform that makes many Hadoop data analysis tasks easier to express. A program written in Pig Latin is like a query written in SQL: it needs an execution engine to run it. The Pig engine therefore converts the program into MapReduce jobs, and MapReduce acts as the execution engine.
12. In how many ways can we run Pig programs?
Pig programs or commands can be executed in three ways.
- Script – Batch Method
- Grunt Shell – Interactive Method
- Embedded mode
All three can be used in both the Local and MapReduce modes of execution.
13. What is Grunt in Pig?
Grunt is an interactive shell in Pig, and below are its major features:
- The Ctrl-E key combination moves the cursor to the end of the line.
- Grunt remembers command history, and can recall lines in the history buffer using the up or down cursor keys.
- Grunt supports an auto-completion mechanism, which will try to complete Pig Latin keywords and functions when you press the Tab key.
14. Is Pig Latin Case Sensitive?
The names (aliases) of relations and fields are case sensitive. The names of Pig Latin functions are case sensitive. The names of parameters and all other Pig Latin keywords are case insensitive.
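The rule can be modeled with a small Python sketch. This is not how Pig's parser is implemented: the helper names are hypothetical, and the keyword set is only an illustrative subset of Pig Latin's reserved keywords.

```python
# A small, illustrative subset of Pig Latin keywords (not the full reserved list).
KEYWORDS = {"load", "filter", "foreach", "generate", "store", "dump"}

def normalize(token):
    """Fold keywords to lower case; leave every other name untouched."""
    lowered = token.lower()
    return lowered if lowered in KEYWORDS else token

def same_name(a, b):
    """Return True if Pig would treat the two tokens as the same name."""
    return normalize(a) == normalize(b)

print(same_name("LOAD", "load"))    # keyword: case insensitive
print(same_name("COUNT", "count"))  # function name: case sensitive
print(same_name("A", "a"))          # relation alias: case sensitive
```

Under this model a script may write a keyword in any case, but an alias A and an alias a are two different relations, and a built-in function must be written in its exact case.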
15. What is bag?
A bag is one of the data types in Pig's data model. It is an unordered collection of tuples that may contain duplicates. Bags are used to store collections when grouping. A bag does not have to fit entirely in memory: when it grows too large, Pig spills part of it to local disk and keeps only some of it in memory. The size of a bag is therefore limited by the size of the local disk. We represent bags