Pig Interview Questions and Answers Part – 2

Below are a few more Pig interview questions and answers.

1. What is a tuple?

A tuple is an ordered set of fields, and a field is a piece of data.

2. What is a relation in Pig?

A Pig relation is a bag of tuples. A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table. Unlike a relational table, however, Pig relations don’t require that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.
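
As a minimal sketch of what a relation looks like in practice, the snippet below loads one and inspects it. The file name students.txt, its comma delimiter and the three column names are assumptions made for illustration.

```pig
-- students.txt is a hypothetical comma-separated input file.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (name:chararray, age:int, gpa:float);

-- DESCRIBE prints the schema; the curly braces mark the relation as a bag,
-- along the lines of: students: {name: chararray,age: int,gpa: float}
DESCRIBE students;

-- DUMP prints every tuple of the relation, one per line, in no guaranteed order.
DUMP students;
```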

3. What is meant by an unordered collection in a bag or a relation?

Saying that relations are unordered means there is no guarantee that tuples are processed in any particular order. Furthermore, processing may be parallelized, in which case tuples are not processed according to any total ordering.

4. How are fields referenced in a relation?

Fields in a relation can be referenced in two ways: by positional notation or by name (alias).

  • Positional notation is generated by the system. It is indicated with the dollar sign ($) and begins with zero (0); for example, $0, $1, $2.
  • Names are assigned by the user through a schema (or, in the case of the GROUP operator and some functions, by the system). We can use any name that is not a Pig keyword.
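
The two notations above can be sketched side by side. The input file students.txt and its schema are hypothetical; both FOREACH statements select the same two fields.

```pig
-- Hypothetical input: comma-separated name, age, gpa.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (name:chararray, age:int, gpa:float);

-- By position: $0 is the first field, $2 is the third.
by_position = FOREACH students GENERATE $0, $2;

-- By name: the same fields, using the aliases declared in the schema.
by_name = FOREACH students GENERATE name, gpa;
```
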
5. What are the simple data types supported by Pig?

Simple Type | Description           | Example
------------|-----------------------|--------------------------------
int         | Signed 32-bit integer | 10
long        | Signed 64-bit integer | 10L or 10l
float       | 32-bit floating point | 10.5F or 10.5f or 10.5e2f
double      | 64-bit floating point | 10.5 or 10.5e2 or 10.5E2
chararray   | Character array       | hello world
bytearray   | Byte array            |
boolean     | Boolean               | true/false (case insensitive)
datetime    | Datetime              | 1970-01-01T00:00:00.000+00:00
biginteger  | Java BigInteger       | 200000000000
bigdecimal  | Java BigDecimal       | 33.4567833213

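
To show how these types appear in a script, here is a small sketch. The file events.txt and the field names are assumptions; the point is that fields loaded without a declared type default to bytearray and can be cast explicitly to other simple types.

```pig
-- No schema given: every field of this hypothetical file is a bytearray.
raw = LOAD 'events.txt' USING PigStorage(',');

-- Explicit casts assign simple types and names to positional fields.
typed = FOREACH raw GENERATE
            (long)$0      AS id,
            (chararray)$1 AS user,
            (double)$2    AS score;

DESCRIBE typed;
```
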
6. What are the complex data types supported in Pig Latin?

Data Type | Description                | Example
----------|----------------------------|------------------
tuple     | An ordered set of fields.  | (19,2)
bag       | A collection of tuples.    | {(19,2), (18,1)}
map       | A set of key/value pairs.  | [open#apache]

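
A schema can declare all three complex types at once. The sketch below assumes a hypothetical tab-delimited file complex.txt whose lines look like (19,2), {(19,2),(18,1)} and [open#apache] in three columns; the field names are illustrative.

```pig
-- One tuple field, one bag field and one map field per record.
records = LOAD 'complex.txt'
          AS (pair:tuple(x:int, y:int),
              pairs:bag{p:tuple(x:int, y:int)},
              props:map[]);

-- pair.x reads a field of the tuple, pairs.y projects y from every tuple
-- in the bag, and props#'open' looks up the value stored under key 'open'.
picked = FOREACH records GENERATE pair.x, pairs.y, props#'open';
```
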
7. What are the features of bag?
  • A bag can have duplicate tuples.
  • A bag can have tuples with differing numbers of fields. However, if Pig tries to access a field that does not exist, a null value is substituted.
  • A bag can have tuples with fields that have different data types. However, for Pig to
    effectively process bags, the schemas of the tuples within those bags should be the same.
8. What is an outer bag?

An outer bag is simply a relation.

9. What is an inner bag?

An inner bag is a bag that appears as a field inside a tuple of another bag.

Example: (4,{(4,2,1),(4,3,3)})

In the above example, the complete relation is an outer bag and {(4,2,1),(4,3,3)} is an inner bag.
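
Inner bags typically come from grouping. As a sketch, assume a hypothetical file triples.txt holding the rows (1,2,3), (4,2,1), (8,3,4) and (4,3,3); grouping on the first field yields tuples like the one in the example above.

```pig
triples = LOAD 'triples.txt' USING PigStorage(',')
          AS (f1:int, f2:int, f3:int);

-- Each output tuple holds the group key and an inner bag with the
-- matching input tuples.
grouped = GROUP triples BY f1;

-- For the rows listed above this prints tuples such as
-- (4,{(4,2,1),(4,3,3)}); the order of tuples is not guaranteed.
DUMP grouped;
```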

10. What is a Map?

A map is a set of key/value pairs. The key must be a chararray, the value can be of any type, and key values within a relation must be unique.
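
A value is read from a map with the # operator. The sketch below assumes a hypothetical tab-delimited file pages.txt with an id column and a map column such as [open#apache].

```pig
pages = LOAD 'pages.txt' AS (id:int, props:map[]);

-- props#'open' returns the value stored under the key 'open',
-- or null when the key is absent from the map.
owners = FOREACH pages GENERATE id, props#'open' AS opened_by;
```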

11. What does FOREACH do?

FOREACH is used to apply transformations to the data and to generate new data items. As the name indicates, the given action is performed for each element of a data bag.

Syntax: FOREACH bagname GENERATE expr1, expr2, …..

The meaning of this statement is that the expressions mentioned after GENERATE will be applied to the current record of the data bag.
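
A short sketch of FOREACH with several expressions follows. The input file and schema are assumptions; UPPER is a built-in string function.

```pig
students = LOAD 'students.txt' USING PigStorage(',')
           AS (name:chararray, age:int, gpa:float);

-- For every tuple, emit the name, the name in upper case and the age plus one.
transformed = FOREACH students GENERATE
                  name,
                  UPPER(name) AS upper_name,
                  age + 1     AS next_age;
```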

12. What does the DISTINCT operator do in Pig?

It removes duplicate tuples from a relation.

Syntax: alias = DISTINCT alias;
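
As an illustration, assume a hypothetical file dupes.txt containing the rows (8,3,4), (1,2,3), (4,3,3), (4,3,3) and (1,2,3).

```pig
dupes = LOAD 'dupes.txt' USING PigStorage(',')
        AS (a1:int, a2:int, a3:int);

-- Tuples are compared on all of their fields, so only the three
-- different rows (1,2,3), (4,3,3) and (8,3,4) remain.
unique_rows = DISTINCT dupes;
```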

13. What is the FILTER operator in Pig?

It selects tuples from a relation based on some condition.

Syntax: alias = FILTER alias BY expression;
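
A minimal sketch, again using a hypothetical students.txt file with name, age and gpa columns:

```pig
students = LOAD 'students.txt' USING PigStorage(',')
           AS (name:chararray, age:int, gpa:float);

-- Keep only the tuples for which the whole condition is true.
adults = FILTER students BY (age >= 18) AND (name IS NOT NULL);
```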

14. What does the GROUP operator do in Pig?

It groups the data in one or more relations.

Syntax: alias = GROUP alias { ALL | BY expression} [, alias ALL | BY expression …] [USING 'collected' | 'merge'] [PARTITION BY partitioner] [PARALLEL n];
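
The sketch below shows the two most common forms, GROUP ... BY and GROUP ... ALL, followed by an aggregate. The input file and field names are assumptions.

```pig
students = LOAD 'students.txt' USING PigStorage(',')
           AS (name:chararray, age:int, gpa:float);

-- One output tuple per distinct age. The key is exposed as a field named
-- 'group' and the matching tuples as a bag named after the input relation.
by_age = GROUP students BY age;
age_counts = FOREACH by_age GENERATE group AS age, COUNT(students) AS num_students;

-- GROUP ... ALL puts every tuple into a single group, which is useful
-- for whole-relation aggregates.
everyone = GROUP students ALL;
total = FOREACH everyone GENERATE COUNT(students) AS num_students;
```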

15. What is the difference between GROUP and COGROUP?

The GROUP and COGROUP operators are identical. Both operators work with one or more relations. For readability, GROUP is used in statements involving one relation and COGROUP is used in statements involving two or more relations. We can COGROUP up to, but no more than, 127 relations at a time.
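
As a sketch of COGROUP over two relations, assume two hypothetical comma-separated files: owners.txt with (owner, pet) rows and friends.txt with (friend1, friend2) rows.

```pig
owners  = LOAD 'owners.txt'  USING PigStorage(',')
          AS (owner:chararray, pet:chararray);
friends = LOAD 'friends.txt' USING PigStorage(',')
          AS (friend1:chararray, friend2:chararray);

-- Each output tuple has three fields: the group key, a bag of matching
-- tuples from owners and a bag of matching tuples from friends. A bag is
-- empty when that relation has no tuple for the key.
together = COGROUP owners BY owner, friends BY friend2;
```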

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.
