
I am trying to write a Java UDF with the end goal of extending/overriding the load method of PigStorage to support entries that take multiple lines.

My pig script is as follows:

REGISTER udf.jar;
register 'userdef.py' using jython as parser;
A = LOAD 'test_data' USING PigStorage() AS row:chararray;
C = FOREACH A GENERATE myTOKENIZE.test();
DUMP C;

udf.jar looks like:

udf/myTOKENIZE.class

myTOKENIZE.java imports org.apache.pig.* and extends EvalFunc. The test method just returns a "Hello world" String.

The problem is that when I try to call the test() method of class myTOKENIZE, I get: ERROR 1070: Could not resolve myTOKENIZE.test using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] Any thoughts?

kira_codes
  • As you found out already, you need to call the UDF using the full package name, udf.myTOKENIZE.test() in this case. An alternative is to use a define statement first like `define myTOKENIZE udf.myTOKENIZE();` Then you can simply use the UDF via myTOKENIZE.test() – LiMuBei Jul 08 '15 at 12:02

4 Answers


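As your UDF extends EvalFunc, there should be a method called exec() in the class myTOKENIZE. Pig invokes exec() when the UDF is called, so you call the class itself rather than a method such as test().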

Your pig code would then look as follows:

C = FOREACH A GENERATE udf.myTOKENIZE(*);

Please read http://pig.apache.org/docs/r0.7.0/udf.html#How+to+Write+a+Simple+Eval+Function

Hope that helps.

Frederic
  • Fred, thank you for your response! I have followed that tutorial but I still get the same error when I try your method. ` Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve myTOKENIZE using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]` When I attempt to follow the tutorial and call myudfs.myTOKENIZE(*) I get `ERROR 2998: Unhandled internal error. myudfs/myTOKENIZE (wrong name: myudf/myTOKENIZE)` – kira_codes Jul 07 '15 at 14:51
  • Actually, your suggestion to check the tutorial again did provide some insight. The reason calling myudfs.myTOKENIZE(*); did not work was because I had defined the package in myTOKENIZE.java as **package myudf** rather than **package myudfs**. I suspect that this was the root of all errors. – kira_codes Jul 07 '15 at 15:07

So is myTOKENIZE in the package udf? In that case you'd need

C = FOREACH A GENERATE udf.myTOKENIZE.test();
DMulligan

After waaaaay too much time (and coffee) and a bunch of trial and error, I figured out my issue.

Important note: For some jar myudfs.jar, the classes contained within must have package defined as myudfs.

The corrected code is as follows:

REGISTER myudfs.jar;
register 'userdef.py' using jython as parser;
A = LOAD 'test_data' USING PigStorage() AS row:chararray;
C = FOREACH A GENERATE myudfs.myTOKENIZE('');
DUMP C;

myTOKENIZE.java:

package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
public class myTOKENIZE extends EvalFunc<String>
{
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try{
            String str = (String)input.get(0);
            return str.toUpperCase();
        }catch(Exception e){
            throw WrappedIOException.wrap("Caught exception processing input row ", e);
        }
    }
}

The structure of myudfs.jar:

myudfs/myTOKENIZE.class
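
The "wrong name: myudf/myTOKENIZE" error quoted in the comments above is consistent with the usual JVM rule that a class's package declaration must match its directory path inside the jar. Here is a minimal sketch of that mapping. The class JarPathSketch and the helper jarEntryFor are hypothetical names used only for illustration; they are not part of Pig.

```java
public class JarPathSketch {
    // Hypothetical helper: a class declared in package p.q with name C
    // is expected to be stored in the jar as p/q/C.class.
    static String jarEntryFor(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // Name used in the Pig script, and the jar entry it corresponds to.
        System.out.println(jarEntryFor("myudfs.myTOKENIZE")); // myudfs/myTOKENIZE.class
        // The mismatched package from the earlier attempt maps to a different entry.
        System.out.println(jarEntryFor("myudf.myTOKENIZE"));  // myudf/myTOKENIZE.class
    }
}
```

So the package declaration in the source file, the directory inside the jar, and the qualified name used in the script all need to agree.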

Hopefully this proves useful to someone else with similar issues!

kira_codes
  • I've never named the package the same thing as my jar and I've used UDFs without problem. I think you found the incorrect root cause. What package were your classes originally in? – DMulligan Jul 09 '15 at 18:33

This is very late, but I think the solution is that when you use the UDF in your Pig script, you have to give the fully qualified class name, including the package. In my case the package is declared as package com.evalfunc.udf; and the class is Power, declared as public class Power extends EvalFunc<Integer> {....}

Then, in Pig, first register the jar file and then call the UDF with its full package name, like:

record = LOAD '/user/fsbappdev/maitytest/pig/pigudf/power_data' USING PigStorage(',');

pow_result = foreach record generate com.evalfunc.udf.Power(base,exponent);
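
To illustrate why the bare name fails while the fully qualified one works, here is a rough, self-contained sketch of prefix-based name resolution. It is a simplification and not Pig's actual source: the class ResolveSketch and the method resolve are hypothetical, and the only thing taken from the question is the import list printed in the ERROR 1070 message. The assumption is that each prefix is tried in turn until a class loads.

```java
import java.util.Optional;

public class ResolveSketch {
    // Prefixes copied from the ERROR 1070 message in the question.
    static final String[] IMPORTS = {
        "", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin."
    };

    // Hypothetical resolver: try prefix + name for each prefix and
    // return the first class that can be loaded, or empty if none can.
    static Optional<Class<?>> resolve(String name) {
        for (String prefix : IMPORTS) {
            try {
                return Optional.of(Class.forName(prefix + name));
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable under this prefix; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Found through the "java.lang." prefix.
        System.out.println(resolve("String").isPresent());              // true
        // A fully qualified name is found through the empty prefix.
        System.out.println(resolve("java.util.ArrayList").isPresent()); // true
        // A bare UDF name matches no prefix (assuming no such class on the classpath).
        System.out.println(resolve("myTOKENIZE").isPresent());          // false
    }
}
```

Under that assumption, a name like com.evalfunc.udf.Power resolves through the empty prefix once the jar is registered, while a bare Power would not unless it is aliased first, for example with the define statement mentioned in the comments.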