7

I am new to Hadoop/PIG. I have a basic question.

Do we have a logging facility in Pig UDFs? I have written a UDF that I need to verify, and I need to log certain statements to check the flow. Is a logging facility available? If yes, where are the Pig logs located?

Uno

2 Answers

6

Assuming your UDF extends EvalFunc, you can use the Logger returned by EvalFunc.getLogger(). The log output should be visible in the logs of the associated map/reduce tasks that Pig executes (if the job runs in more than a single stage, you'll have to pick through them to find the associated log entries).
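For illustration, here is a minimal sketch of such a UDF. The class name, input handling, and messages are hypothetical; the logging calls rely on EvalFunc.getLogger() (a commons-logging Log) and EvalFunc.getPigLogger() as described above, so it needs the Pig jar on the classpath:

```java
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF showing EvalFunc's built-in logging hooks.
public class UpperUdf extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // getLogger() returns a commons-logging Log; these messages end up
        // in the logs of the map or reduce task executing the UDF.
        Log log = getLogger();
        if (input == null || input.size() == 0 || input.get(0) == null) {
            log.warn("empty input tuple, returning null");
            return null;
        }
        String value = (String) input.get(0);
        log.info("processing value: " + value);
        // getPigLogger() only aggregates warnings (warn(Object, String, Enum));
        // for debug/info/error output, use getLogger() as above.
        return value.toUpperCase();
    }
}
```

To find the output, drill down from the job in the JobTracker web UI to an individual map or reduce task and open its task logs, as the comments below walk through.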

Chris White
  • So the logs will end up in the map/reduce task log files? Could I specifically direct my log statements to a separate file? – Uno Jun 13 '12 at 18:57
  • Yes they will. You could, but then you'd have to go to each task tracker to view / collect them. I guess you could try and configure a remote logger (logging to a DB for example). – Chris White Jun 13 '12 at 19:23
  • I don't know for sure, but you could try the PigLogger - that might send things back to the client. – Chris White Jun 13 '12 at 19:25
  • I am sorry for such a naive question, but I have used it in the following way: PigLogger pigLogger = this.getPigLogger(); pigLogger.warn(object,String,enum); Am I missing anything here, or is this all there is to the EvalFunc logger? I cannot see anything other than warn. Don't we have debug, info, and error? – Uno Jun 13 '12 at 23:19
  • I tried using this.getLogger().info(String); Should this show up in the tasktracker log? I cannot see any log for this. – Uno Jun 14 '12 at 00:18
  • The PigLogger is for aggregating warning messages, so yes, it only has the warn method (it's a hack, but do the messages propagate to the client shell with PigLogger?). And yes, getLogger().info(String) should show up in the task logs (not the tasktracker logs, but the actual logs for the map or reduce task that is executing your UDF). – Chris White Jun 14 '12 at 10:14
  • I am sorry, but could you please tell me where I should be configuring these task logs, and where I can find them? The only logs I am aware of are those in hadoop/logs — I mean the datanode/tasktracker/namenode/secondarynamenode/jobtracker logs. – Uno Jun 14 '12 at 17:22
  • They are not in the standard log directory; they are attached to each job. Find the Pig job in the JobTracker web UI, drill down to an individual map or reduce task, and then view the logs. – Chris White Jun 14 '12 at 19:00
  • After clicking on the map/reduce job, I am redirected to the Tasks page, where all tasks are listed. When I click on one, I see Task Logs. Is this where they will be stored? Apologies for asking about such minute details. – Uno Jun 14 '12 at 19:09
  • http://books.google.com/books?id=Nff49D7vnJcC&pg=PA175&dq=hadoop+the+definitive+guide+job+tracker&hl=en&sa=X&ei=7jjaT4KLPIOO8wTL5eDsBQ&ved=0CE4Q6AEwAA#v=onepage&q=Figure%205-4&f=false - Figure 5-4 shows the link you're looking for (Task Logs column, click the All link) – Chris White Jun 14 '12 at 19:20
  • I used both PigLogger pigLogger = this.getPigLogger(); pigLogger.warn(object,String,enum); and this.getLogger().info(String), and I cannot see any of them in the task logs. I clicked on "All" as you suggested above (thanks for that). What am I missing here? Please help. – Uno Jun 19 '12 at 01:16
  • Did the Pig script execute more than a single MR job? I think you should try @ihadanny's idea of running in local mode. – Chris White Jun 19 '12 at 10:04
  • Thanks Chris, I can see the log statements now. I ran a small sample program and can see them in the task logs. Thanks a lot. – Uno Jun 19 '12 at 21:52
2

Perhaps obvious, but I advise debugging your UDF in local mode before deploying to a cluster/pseudo-cluster. This way you can debug it right inside your IDE (Eclipse in my case), which is easier than log-debugging.
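A minimal local-mode driver might look like the sketch below. The class name, jar path, input file, and UDF name are all placeholders; the real API calls are PigServer with ExecType.LOCAL, registerQuery, and openIterator, and the program needs the pig and hadoop-core jars on the classpath:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

// Hypothetical driver for stepping through a Pig script (and its UDFs)
// inside the IDE.
public class LocalModeDebug {
    public static void main(String[] args) throws IOException {
        // ExecType.LOCAL runs the whole pipeline in-process, so breakpoints
        // set inside your UDF's exec() method will be hit by the debugger.
        PigServer pigServer = new PigServer(ExecType.LOCAL);
        pigServer.registerJar("target/my-udfs.jar"); // jar containing your UDF (path is an assumption)
        pigServer.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pigServer.registerQuery("upper = FOREACH lines GENERATE com.example.UpperUdf(line);");
        Iterator<Tuple> it = pigServer.openIterator("upper");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Once the UDF behaves correctly locally, switch the ExecType (or just run the script with `pig`) to execute on the cluster.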

ihadanny
  • Is there a site or some steps I can follow to get started with Pig on Eclipse? – Uno Jun 19 '12 at 21:54
  • 1
    I don't know about a site with steps, but it's simple enough: add the hadoop-core and pig dependencies to your Maven pom, and then work with `org.apache.pig.PigServer`. Try `pigServer.registerScript(resource.getInputStream(), pigScriptParams, null);` and then `PigStats stats = pigServer.store("final_output", pigScriptParams.get("output_folder"), pigStoreFunc).getStatistics();` – ihadanny Jun 24 '12 at 10:48