
Hi, I'm developing user-defined functions (UDFs) for Apache Drill, and I have written this one:

package somepackage.udfs;

import io.netty.buffer.DrillBuf;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.Float8Holder;
import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
import org.apache.drill.exec.expr.holders.VarCharHolder;

import javax.inject.Inject;

@FunctionTemplate(
        name = "split_sample",
        scope = FunctionTemplate.FunctionScope.SIMPLE,
        nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
)
public class SplitTrainTestSample implements DrillSimpleFunc {
    @Param
    NullableVarCharHolder targetIn;

    @Param(constant = true)
    Float8Holder train_test_rate;

    @Output
    VarCharHolder label;

    @Inject
    DrillBuf buffer;

    public SplitTrainTestSample() {
    }

    @Override
    public void setup() {
    }

    @Override
    public void eval() {
        double r = Math.random();
        String l;
        assert 0 < train_test_rate.value && train_test_rate.value < 1;
        if (r < train_test_rate.value) {
            l = "train";
        }
        else {
            l = "test";
        }
        byte[] bytes = l.getBytes();

        label.buffer = buffer;
        label.start = 0;
        label.end = bytes.length;
        label.buffer.setBytes(0, bytes);
    }
}

But when I run this query

Apache Drill> select split_sample(cast(full_name as char), 0.5) from cp.`employee.json`;

Drill returns an error message.

Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 49: No match found for function signature split_sample(<CHARACTER>, <NUMERIC>)

Please help me find the problem. Another UDF I wrote in the same package works fine, so a fault in UDF registration seems unlikely.

Is there a method to probe the function signature of UDFs?

fields1631

1 Answer


Drill UDFs are really difficult to debug.

I suspect the issue in this case is:

double r = Math.random();

Try replacing that with:

double r = java.lang.Math.random();

If that doesn't work, you might want to try using a simple if statement rather than assert. Also, I've never seen (constant = true) in a UDF Param.

In general, aside from Drill internal classes, nearly all external classes must be written out with their full paths. You cannot import anything into a UDF other than Drill internals. UDFs actually use a subset of Java and the result is that it can be a little tricky to know what is supported and what is not.
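As a rough illustration of that style, here is plain Java (not Drill code) with no import statements, where every JDK class is spelled out by its full path. The class and method names are made up for the example, and the method is only a stand-in for what the body of eval() might compute.

```java
// Hypothetical sketch: no import statements, every JDK class is fully qualified.
public class FullyQualifiedSketch {

    // Stand-in for the logic inside eval(): draw a random number and
    // return the label's bytes, encoded explicitly as UTF-8.
    public static byte[] pickLabel(double rate) {
        double r = java.lang.Math.random(); // r is in [0.0, 1.0)
        java.lang.String label = (r < rate) ? "train" : "test";
        return label.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}
```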

What makes this annoying is that Drill doesn't give you any helpful info for debugging. It just says that it can't find the UDF.

When this happens to me, I comment out the entire UDF body, then uncomment one line at a time and rerun the function to see which line is causing the issue.

One easy way around that is to create a helper class with static functions.

public class functionHelpers {
  // Must be static so the UDF can call it without creating an instance
  public static String getLabel(<params>) {
    // Your code here...
  }
}

// Then in your UDF eval() method
...

String label = com.my_package.functionHelpers.getLabel(<params>);
...

If you have complex UDFs, that makes life a lot easier. The code is also much easier to debug: you can write unit tests for the helper class, then use canned code to map its output to Drill vectors.
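Here is a minimal sketch of what such a helper could look like for the train/test split in the question. The class name SplitLabelHelper and the two-argument signature are my own assumptions, not anything Drill requires. Passing the random draw in as a parameter keeps the method deterministic, so it can be tested without Drill, and the explicit range check stands in for the assert.

```java
// Hypothetical helper class: plain Java, no Drill dependencies.
public class SplitLabelHelper {

    private SplitLabelHelper() {
        // Static helpers only; no instances needed.
    }

    // r is a random draw in [0, 1); trainRate is the share of rows labelled "train".
    public static String getLabel(double r, double trainRate) {
        // Explicit check instead of assert, so a bad rate always fails loudly.
        if (!(trainRate > 0.0 && trainRate < 1.0)) {
            throw new IllegalArgumentException("trainRate must be in (0, 1): " + trainRate);
        }
        return r < trainRate ? "train" : "test";
    }

    public static void main(String[] args) {
        System.out.println(getLabel(0.2, 0.5)); // prints "train"
        System.out.println(getLabel(0.9, 0.5)); // prints "test"
    }
}
```

In the UDF's eval() you would then call it by its full path, something like com.my_package.SplitLabelHelper.getLabel(java.lang.Math.random(), train_test_rate.value), where the package name is only a placeholder.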

cgivre