I'm new to Hadoop and Pig. How do I run a Pig script that internally calls a UDF? The thing is, I don't see a "register blah.jar" statement in the script, like the one shown in the Pig UDF manual:

register myudfs.jar;
A = load 'student_data' as (name: chararray, age: int, gpa: float);
B = foreach A generate flatten(myudfs.Swap(name, age)), gpa;
C = foreach B generate $2;
D = limit B 20;
dump D;

But I do see a "jar" directory that contains "blah.jar". My coworker has already left, so I wonder what the trick was. Maybe I can add the jar file on the command line?

Thanks a lot!

trillions
  • Have you tried running the script? If so, do you get an error message? What is the name of the UDF being called? – reo katoa Nov 19 '12 at 04:19
  • If you could answer @WinnieNicklaus's questions, that would help us understand what the problem may be. Too many questions at this point. – Dan Nov 19 '12 at 17:40
  • Winnie and Dan, I have not tried to run the script, since I actually have two questions (or more). First, I am not sure exactly how to generate myudfs.jar. On my Mac, should I just open a project in Eclipse, add Pig's lib/jar, then code the UDF and package everything into myudfs.jar? Second, do I really need "register myudfs.jar" in the script? If not, how does Pig find the jar? – trillions Nov 19 '12 at 18:07
  • What is the name of the UDF? – reo katoa Nov 19 '12 at 18:41
  • Winnie, I wanna code the UDF you suggested in another question thread of mine :) – trillions Nov 19 '12 at 19:10

1 Answer

If there is no REGISTER statement in the script (and the script is valid), then either it does not call any UDFs other than Pig's builtin functions, or the jar is being supplied some other way, for example through the pig.additional.jars property on the command line. The usual way to make your own UDF available is a REGISTER statement in the script. REGISTER is unnecessary if no UDFs are called, which may be why you don't see it in the script you have.

Here is a good reference on writing UDFs. After you have written it, you will need to compile it into a jar file, being sure to also include any classes it depends on (such as EvalFunc). This is the jar you will REGISTER.
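As a rough sketch of what registering looks like, assuming the jar is at jar/blah.jar relative to the directory you launch pig from, and assuming it contains a class named myudfs.Swap (both are guesses on my part; substitute your real path and class name):

```pig
-- Assumption: the jar lives at jar/blah.jar relative to the working directory.
REGISTER jar/blah.jar;

-- Assumption: the jar contains a UDF class myudfs.Swap (package myudfs, class Swap).
-- DEFINE gives it a short alias so the package prefix is not repeated below.
DEFINE Swap myudfs.Swap();

A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE FLATTEN(Swap(name, age)), gpa;
D = LIMIT B 20;
DUMP D;
```

If Pig reports that it cannot resolve the function, the class name or the jar path is the first thing to check.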

reo katoa
  • Thanks a lot, Winnie! I am now surprised that the script I read at work calls some UDFs but has no register statement at the top. But at least I can first build my own UDF to understand how to run it in a script, then I will find out more :) Really appreciate your help! :) – trillions Nov 19 '12 at 23:13