Programming Pig

Alan Gates


This guide is an ideal learning tool and reference for Apache Pig, the programming language that helps programmers describe and run large data projects on Hadoop. With Pig, they can analyze data without having to create a full-fledged application, which makes it easy for them to experiment with new data sets.


Mentioned in questions and answers.

I am a beginner with Apache Pig, and there is one thing I am slightly confused about.

I am running Pig in local mode using pig -x local.

Now I am trying this simple script:

dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
grouped = group dividends by symbol;   -- this alias must match the next line
avg = foreach grouped generate group, AVG(dividends.dividend);
store avg into 'average_dividend';     -- statements end with ';', not '.'

A folder named average_dividend is created on my machine.

According to the book, to run it in local mode I have to use the following command:

pig_path/bin/pig -x local average_dividend.pig

But where is the file average_dividend.pig? Where does it get created?

I assume you are trying to run one of the examples from Programming Pig. Pig does not create average_dividend.pig for you: it is a plain text file containing the script, and it comes with the book's example code. First locate average_dividend.pig in the directory where you extracted that code (or save the statements above into a file with that name yourself). Since you are working in local mode, set the path to NYSE_dividends, e.g. load '/home/user/programmingpig-master/data/NYSE_dividends'. Also set the output directory where you want to save the result (it must not exist yet), e.g. store avg into '/home/user/output'.

Then issue:

pig_path/bin/pig -x local -f average_dividend.pig
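Because the .pig file is just a text file you provide, the whole workflow can be sketched in bash: write the script with a heredoc, using the example paths from the answer, then run it in local mode. This is a minimal sketch under assumptions: the DATA, OUT and SCRIPT variables, PIG_HOME, and its default of /usr/lib/pig are illustrative placeholders for your own pig_path and directories.

```bash
#!/usr/bin/env bash
# Sketch: write average_dividend.pig yourself, then run it in local mode.
# DATA and OUT default to the example paths from the answer; adjust them.
set -euo pipefail

DATA="${DATA:-/home/user/programmingpig-master/data/NYSE_dividends}"
OUT="${OUT:-/home/user/output}"          # must not exist before the run
SCRIPT="${SCRIPT:-average_dividend.pig}"

# The .pig file is plain text; Pig never generates it for you.
cat > "$SCRIPT" <<EOF
dividends = load '$DATA' as (exchange, symbol, date, dividend);
grouped   = group dividends by symbol;
avg       = foreach grouped generate group, AVG(dividends.dividend);
store avg into '$OUT';
EOF

# Assumption: PIG_HOME points at your Pig installation (pig_path).
PIG="${PIG_HOME:-/usr/lib/pig}/bin/pig"
if [ -x "$PIG" ]; then
  "$PIG" -x local -f "$SCRIPT"
  # store writes a directory; the results are in the part files inside it.
  cat "$OUT"/part-*
else
  echo "Wrote $SCRIPT; no Pig launcher found at $PIG" >&2
fi
```

The store target is a directory rather than a single file, which is why the question saw a folder named average_dividend appear.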

Pig exits with exit code 7 after printing these 3 lines:

2014-07-16 21:57:37,271 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.6.0 (rexported) compiled Feb 26 2014, 03:01:22
2014-07-16 21:57:37,272 [main] INFO  org.apache.pig.Main - Logging error messages to: ..../pig_1405562257268.log
2014-07-16 21:57:37,627 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/sam/.pigbootup not found

What does this mean?

  • The INFO messages are normal
  • The only unusual bit is the exit code (7, see above)
  • The pig_*.log file does not exist

Is this documented somewhere?

EDIT: the problem went away when I removed the semicolon from the end of the %declare line. Go figure...

You can look up the return codes in the source code (the constants in org.apache.pig.PigRunner.ReturnCode).
The book Programming Pig also lists their meanings in chapter two. I copy them here for reference:

Code  Meaning
0     Success
1     Retriable failure
2     Failure
3     Partial failure - Used with multiquery; see “Nonlinear Data Flows”
4     Illegal arguments passed to Pig
5     IOException thrown - Would usually be thrown by a UDF
6     PigException thrown - Usually means a Python UDF raised an exception
7     ParseException thrown (can happen after parsing if variable substitution is being done)
8     Throwable thrown (an unexpected exception)
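To act on these codes in a wrapper script, you can capture $? right after the pig command and translate it. The following is a small bash sketch; the function name pig_exit_meaning is hypothetical, and the messages are shortened from the table above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a Pig exit status using the table above.
pig_exit_meaning() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "Retriable failure" ;;
    2) echo "Failure" ;;
    3) echo "Partial failure (multiquery)" ;;
    4) echo "Illegal arguments passed to Pig" ;;
    5) echo "IOException thrown" ;;
    6) echo "PigException thrown" ;;
    7) echo "ParseException thrown" ;;
    8) echo "Throwable thrown" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Usage (myscript.pig is a placeholder name):
#   pig -x local -f myscript.pig; pig_exit_meaning "$?"
```

In the question above, exit code 7 maps to a ParseException, which is consistent with the fix in the EDIT: the %declare line is handled during parameter substitution, before the script itself runs.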