Pig runs on Java 1.5. This version of Pig was designed for Hadoop 0.12.1.
Pig is designed to run against a Hadoop cluster. The pig.jar file includes
the ability to start such a cluster if needed (see the next section). Important
configuration parameters for Hadoop clusters (such as the IP address and port
of the jobtracker and namenode) are contained in a file called
hadoop-site.xml. You will need to add your site's hadoop-site.xml to pig.jar
for it to communicate properly with your Hadoop cluster. hadoop-site.xml
is added to pig.jar using zip: zip pig.jar hadoop-site.xml
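As a rough sketch of the step above, the following writes a minimal
hadoop-site.xml and adds it to pig.jar. The host names and ports are
placeholders (assumptions, not values from any real cluster); the two
properties shown are the ones this file refers to for the namenode and
jobtracker. The zip step is skipped if pig.jar or zip is not available.

```shell
#!/bin/sh
# Sketch: create a minimal hadoop-site.xml and add it to pig.jar.
# Host names and ports below are placeholders; substitute your site's values.
NAMENODE="namenode.example.com:9000"
JOBTRACKER="jobtracker.example.com:9001"

cat > hadoop-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>${NAMENODE}</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>${JOBTRACKER}</value>
  </property>
</configuration>
EOF

# Add the file to pig.jar. Skipped when pig.jar or zip is missing.
if [ -f pig.jar ] && command -v zip >/dev/null 2>&1; then
  zip pig.jar hadoop-site.xml
else
  echo "pig.jar or zip not found; hadoop-site.xml written but not added"
fi
```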
Run "java -jar pig.jar -" to see if Pig is running correctly. If it is,
you should get a "grunt>" prompt. See the documentation (distributed
separately) for more details.
The pig.jar file includes everything needed to start a Hadoop cluster.
You will need to consult the Hadoop website for details, but briefly:
1) "java -jar pig.jar -H setup" will generate a hadoop-site.xml.
2) "java -jar pig.jar -H namenode -format" will format a namenode directory.
3) "java -jar pig.jar -H conf" will output the cluster configuration. This
   information is useful for writing startup scripts that start the namenode,
   jobtracker, datanodes, and tasktrackers on the proper nodes. (My scripts
   compare the hostnames in fs.default.name and mapred.job.tracker with the
   hostname of the machine that the script is running on. If they match,
   I start the namenode or jobtracker; otherwise I start a datanode and
   a tasktracker.)
4) "java -jar pig.jar -H namenode" starts the namenode;
   "java -jar pig.jar -H datanode" starts the datanode;
   "java -jar pig.jar -H jobtracker" starts the jobtracker;
   "java -jar pig.jar -H tasktracker" starts the tasktracker.
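The hostname comparison described in step 3 might look roughly like the
sketch below. It is not the author's actual script: the helper names
host_of and daemons_for are hypothetical, and the fs.default.name and
mapred.job.tracker values are assumed to be in host:port form and passed
in as arguments rather than parsed from the configuration output.

```shell
#!/bin/sh
# Sketch: decide which daemons to start on this machine by comparing its
# hostname with the hosts named in fs.default.name and mapred.job.tracker.

# host_of "host:port" prints "host" (everything before the first colon).
host_of() {
  echo "${1%%:*}"
}

# daemons_for LOCAL_HOST FS_DEFAULT_NAME MAPRED_JOB_TRACKER
# Prints one daemon name per line, suitable as the argument after -H.
daemons_for() {
  local_host=$1
  nn_host=$(host_of "$2")
  jt_host=$(host_of "$3")
  matched=no
  if [ "$local_host" = "$nn_host" ]; then
    echo namenode
    matched=yes
  fi
  if [ "$local_host" = "$jt_host" ]; then
    echo jobtracker
    matched=yes
  fi
  # Machines that are neither namenode nor jobtracker act as workers.
  if [ "$matched" = no ]; then
    echo datanode
    echo tasktracker
  fi
}
```

Each name printed by daemons_for can then be passed to
"java -jar pig.jar -H <name>" as in step 4. The comparison is a plain
string match, so short and fully qualified hostnames will not match
each other.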
To build Pig you will first need to build the hadoop-exe target of Hadoop.
Before doing so, you must apply the patch that is attached to HADOOP-435 in
the Hadoop issue tracker. Put the resulting hadoop.jar file in the lib
subdirectory of hadoop. You will also need libraries from the following
packages in the lib subdirectory:
  bzip2:  http://www.kohsuke.org/bzip2/ (Apache license)
  javacc: https://javacc.dev.java.net/ (BSD license)
  hadoop: http://lucene.apache.org/hadoop/ (Apache license)
You must also create a conf directory. You may put your hadoop-site.xml file
in that directory so that it will be included at build time. You may also
add the hadoop-site.xml file to pig.jar after the build has completed.
Use ant with build.xml.oss to build Pig.
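The build preparation can be sketched as follows. This assumes the script
is run from the top of the Pig source tree, that the required jars are
already in the lib subdirectory, and that the default target of
build.xml.oss is the one you want; none of these is confirmed here.

```shell
#!/bin/sh
# Sketch: prepare the conf directory and run the build.
mkdir -p conf

# Optional: include your site's configuration at build time.
if [ -f hadoop-site.xml ]; then
  cp hadoop-site.xml conf/
fi

# Build with the OSS build file. Skipped when ant or build.xml.oss is missing.
if [ -f build.xml.oss ] && command -v ant >/dev/null 2>&1; then
  ant -buildfile build.xml.oss
else
  echo "ant or build.xml.oss not found; skipping build"
fi
```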