
I'm writing a UDAF (user-defined aggregate function) and I want it to return either a struct with named fields (e.g. start and end, both of type long) or two separate columns.

In the evaluate function I tried returning a map type and an array type, but neither gave the result I was expecting.

I would love to get a pointer on this. Thanks.

Tom Ron
  • I had a similar question, which was answered. Perhaps this may help: https://stackoverflow.com/q/33939642/1433614 – ab853 Nov 28 '17 at 16:34

1 Answer


The simplest way to do that is to return a List holding your values in a single column, and then expand that column into several columns.

Here is an example in which the UDAF returns two Integer values:


UDAF (important code parts)


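public YourUDAFName(someParams) {
    [...]
    // Declare the result type as an array of integers
    // (assumed here to be the value that dataType() returns).
    _returnDataType = DataTypes.createArrayType(DataTypes.IntegerType);
}
[...]
@Override
public Object evaluate(Row buffer) {
    List<Integer> output = new ArrayList<>();
    output.add(1); // placeholder: put your own logic here
    output.add(5); // placeholder: second value
    return output;
}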

Example of use:


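// Needs: import static org.apache.spark.sql.functions.col;
Dataset<Row> ds = getYourDatasetHere();
YourUDAFName udaf = new YourUDAFName(someParams);
// A Dataset is immutable, so keep the aggregated result in a variable.
Dataset<Row> result = ds.groupBy("yourGroupByKey")
    .agg(udaf.apply(
        col("someColumnFromDs"),
        col("someOtherColumn")).as("columnWithList"));

// Here we expand "columnWithList" into one column per list element.
// numElementInTheList is the number of values the UDAF returns (2 in this example).
for (int i = 0; i < numElementInTheList; i++) {
    result = result.withColumn("nameOfYourExpandedColumn" + i, result.col("columnWithList").getItem(i));
}
result.show();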

I hope that helps you!

tagore84