
I want to run Pig in local mode, which is very easy from the command line: pig -x local file.pig

My requirement is to run Pig in local mode from Oozie. Is that possible? I think Oozie will always launch a map task first.

AddWeb Solution Pvt Ltd

2 Answers


It's possible. When a pig script is run by Oozie, it's run as a one-map map-reduce job, which only runs the pig script, which in turn runs other map-reduce jobs (when pig is run in mapred mode).

It seems that the Pig action configuration doesn't allow running in local mode, but you can still run a Pig script in local mode using the shell action type. You only have to make sure that your script, input, and output data are in HDFS.
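As a rough illustration of the shell-action route, here is a minimal sketch of a wrapper script that such an action could execute. It is not taken from the answer: the function name run_pig_local, the INPUT and OUTPUT parameter names, and the idea of staging data onto the node's local disk are all assumptions made for the example. It also assumes the hadoop and pig commands are installed on every node that might run the action.

```shell
#!/bin/sh
# Hypothetical wrapper for an Oozie shell action (all names are illustrative).
# Assumption: `hadoop` and `pig` are on the PATH of whichever node runs this.

HADOOP_CMD="${HADOOP_CMD:-hadoop}"
PIG_CMD="${PIG_CMD:-pig}"

# run_pig_local SCRIPT_ON_HDFS INPUT_ON_HDFS OUTPUT_ON_HDFS
run_pig_local() {
    script_hdfs="$1"
    input_hdfs="$2"
    output_hdfs="$3"

    # Scratch directory on the local disk of the node that was picked.
    work_dir="$(mktemp -d)" || return 1

    # Stage the script and input from HDFS, run Pig in local mode,
    # then publish the result back to HDFS. The chain stops at the
    # first command that fails.
    "$HADOOP_CMD" fs -get "$script_hdfs" "$work_dir/script.pig" &&
    "$HADOOP_CMD" fs -get "$input_hdfs" "$work_dir/input" &&
    "$PIG_CMD" -x local \
        -param INPUT="$work_dir/input" \
        -param OUTPUT="$work_dir/output" \
        "$work_dir/script.pig" &&
    "$HADOOP_CMD" fs -put "$work_dir/output" "$output_hdfs"
    status=$?

    # Clean up the scratch directory whether or not the run succeeded.
    rm -rf "$work_dir"
    return "$status"
}

# Run only when all three arguments are supplied.
if [ "$#" -eq 3 ]; then
    run_pig_local "$@"
fi
```

The Pig script in this sketch would be expected to read from $INPUT and write to $OUTPUT. Staging through a local scratch directory is one way to keep the data in HDFS while still letting Pig read and write the local filesystem on whichever node runs the action; whether that is faster than mapreduce mode for small inputs would need to be measured.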

Mikhail Golubtsov
  • Thanks for the answer. But Pig in local mode loads and stores data on the local file system instead of HDFS. Keeping the Pig script on HDFS makes sense, but do input and output paths on HDFS make sense? Can you show me how to write the Oozie workflow for that? – Vishal Donderia Aug 03 '15 at 11:42
  • You should be aware of how Oozie runs things. It runs Pig or shell scripts as map-reduce jobs with one mapper and no reducers, so the execution node is selected randomly from the map-reduce cluster and you cannot say ahead of time which node will run your script. Because of that, there is no point in using the local filesystem; use HDFS, because it's available on all nodes. Use a full URI of the form 'hdfs://namenode:port/path/to/file' to refer to HDFS paths in a Pig script. – Mikhail Golubtsov Aug 03 '15 at 11:56
  • It would be great if you provided some reasoning for when running Pig in local mode with Oozie is useful. – Mikhail Golubtsov Aug 03 '15 at 12:02
  • We use the Oozie framework for all job scheduling, so we want to keep using the same framework but run Pig in local mode, since with a small amount of data local mode is faster than mapred mode. – Vishal Donderia Aug 03 '15 at 12:16

I don't think we can run Pig in local mode from Oozie. The comment Vishal wrote makes sense: in some cases, where there is a small amount of data, it's better to run Pig in local mode. To do that, you can write a shell script and schedule it with crontab. As far as I know, doing this through Oozie won't work well, because Oozie is meant to work with HDFS.

If you want Oozie to run a job on some data, it expects that data to be in HDFS (i.e. distributed), and the Pig script must be in HDFS as well. I remember seeing a post from Alan Gates in which he mentioned that Pig is designed to process data from and to HDFS, while Hive handles local to HDFS or HDFS to HDFS.

Govind