java/pig-galago/README.pig

Pig runs on Java 1.5. This version of Pig was designed for Hadoop 12.1.

RUNNING PIG

Pig is designed to run against a Hadoop cluster. The pig.jar file includes
the ability to start up such a cluster if needed (see next section). Important
configuration parameters for the cluster (such as the IP address and port of
the jobtracker and namenode) are kept in a file called hadoop-site.xml. You
will need to add your site's hadoop-site.xml to pig.jar for Pig to communicate
properly with your Hadoop cluster. Add hadoop-site.xml to pig.jar using zip:
"zip pig.jar hadoop-site.xml".
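As a minimal sketch, the zip step above can be wrapped in a small shell
function that refuses to run when either file is missing. The function name
add_site_config is made up for illustration and is not part of Pig; the
sketch assumes the zip command is installed.

```shell
# Hypothetical helper (not part of Pig): add a hadoop-site.xml to pig.jar.
# Assumes the "zip" command is available on the PATH.
#   $1 = path to pig.jar
#   $2 = path to your hadoop-site.xml
add_site_config() {
    jar="$1"
    site="$2"
    [ -f "$jar" ] || { echo "no such jar: $jar" >&2; return 1; }
    [ -f "$site" ] || { echo "no such config: $site" >&2; return 1; }
    # Resolve the jar to an absolute path, then run zip from the config's
    # directory so the entry is stored without any leading directories.
    jar_abs="$(cd "$(dirname "$jar")" && pwd)/$(basename "$jar")"
    (cd "$(dirname "$site")" && zip -q "$jar_abs" "$(basename "$site")")
}
```

For example: add_site_config pig.jar /path/to/hadoop-site.xml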

Run "java -jar pig.jar -" to see if Pig is running correctly. If it is,
you should get a "grunt>" prompt. See the documentation (distributed
separately) for more details.

RUNNING HADOOP

The pig.jar file includes everything needed to start up a Hadoop cluster.
You will need to consult the Hadoop website for details, but briefly:

1) "java -jar pig.jar -H setup" will generate a hadoop-site.xml.
2) "java -jar pig.jar -H namenode -format" will format a namenode directory.
3) "java -jar pig.jar -H conf" will output the cluster configuration. This
   information is useful for writing scripts that start the namenode,
   jobtracker, datanodes, and tasktrackers on the proper nodes. (My scripts
   compare the hostnames in fs.default.name and mapred.job.tracker with the
   hostname of the machine the script is running on. If one matches, I
   start the namenode or jobtracker; otherwise I start a datanode and
   tasktracker.)
4) "java -jar pig.jar -H namenode" starts the namenode;
   "java -jar pig.jar -H datanode" starts the datanode;
   "java -jar pig.jar -H jobtracker" starts the jobtracker;
   "java -jar pig.jar -H tasktracker" starts the tasktracker.
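The hostname comparison described in step 3 could be sketched as below. The
function names host_of and daemons_for are hypothetical, and because the
output format of "-H conf" is not described here, the sketch takes the
fs.default.name and mapred.job.tracker values as plain arguments rather than
parsing them out of that output.

```shell
# Hypothetical helpers (not part of Pig or Hadoop).

# Print the host part of an address such as "host:port" or
# "scheme://host:port/path".
host_of() {
    addr="${1#*://}"              # drop a leading "scheme://", if present
    addr="${addr%%/*}"            # drop any trailing path
    printf '%s\n' "${addr%%:*}"   # drop the ":port" suffix
}

# Print the daemons to start on this machine, separated by spaces.
#   $1 = this machine's hostname
#   $2 = value of fs.default.name
#   $3 = value of mapred.job.tracker
daemons_for() {
    roles=""
    if [ "$(host_of "$2")" = "$1" ]; then
        roles="namenode"
    fi
    if [ "$(host_of "$3")" = "$1" ]; then
        roles="${roles:+$roles }jobtracker"
    fi
    # A machine that hosts neither master runs the worker daemons.
    if [ -z "$roles" ]; then
        roles="datanode tasktracker"
    fi
    printf '%s\n' "$roles"
}
```

A startup script might then loop over the result and run
"java -jar pig.jar -H <daemon>" for each name printed by
daemons_for "$(hostname)" <fs.default.name> <mapred.job.tracker>.
Note that hostname may print a short name while the configuration uses a
fully qualified one, so the two may need to be normalized before comparing.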

BUILDING PIG

To build Pig you will first need to build the hadoop-exe target of Hadoop,
after applying the patch that is attached to HADOOP-435 in the Hadoop issue
tracker. Put the resulting hadoop.jar file in the lib subdirectory of
hadoop. You will also need libraries from the following packages
in the lib subdirectory:

bzip2:  http://www.kohsuke.org/bzip2/ (Apache license)
javacc: https://javacc.dev.java.net/ (BSD license)
hadoop: http://lucene.apache.org/hadoop/ (Apache license)

You must also create a conf directory. You may put your hadoop-site.xml file
in that directory so that it will be included at build time. Alternatively,
you may add the hadoop-site.xml file to pig.jar after the build has completed.

Use ant with build.xml.oss to build Pig.
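
The conf directory step and the ant invocation might be scripted as follows.
This is only a sketch: prepare_conf is a hypothetical helper name, and it
assumes it is given the top of the Pig source tree, taken here to be the
directory that holds build.xml.oss.

```shell
# Hypothetical helper (not part of Pig): create the conf directory and,
# if a site file is given, copy it in so the build can include it.
#   $1 = top of the Pig source tree (assumed to hold build.xml.oss)
#   $2 = optional path to your hadoop-site.xml
prepare_conf() {
    mkdir -p "$1/conf" || return 1
    if [ -n "${2:-}" ]; then
        cp "$2" "$1/conf/hadoop-site.xml" || return 1
    fi
}
```

From the source tree, a build could then be started with
"prepare_conf . /path/to/hadoop-site.xml && ant -f build.xml.oss". The -f
option tells ant which build file to use; no target is named, so the default
target of build.xml.oss is run.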