I'm working on adding Spark 3.1 and Scala 2.12 support to the Kylo Data-Lake Management Platform.
I need help with migrating the following functions:
    /**
     * Creates an {@link Accumulable} shared variable with a name for display in the Spark UI.
     */
    @Nonnull
    static <R, P1> Accumulable<R, P1> accumulable(@Nonnull final R initialValue, @Nonnull final String name, @Nonnull final AccumulableParam<R, P1> param,
                                                  @Nonnull final KyloCatalogClient<Dataset<Row>> client) {
        return ((KyloCatalogClientV2) client).getSparkSession().sparkContext().accumulable(initialValue, name, param);
    }

    /**
     * Applies the specified function to the specified field of the data set.
     */
    @Nonnull
    static Dataset<Row> map(@Nonnull final Dataset<Row> dataSet, @Nonnull final String fieldName, @Nonnull final Function1 function, @Nonnull final DataType returnType) {
        final Seq<Column> inputs = Seq$.MODULE$.<Column>newBuilder().$plus$eq(dataSet.col(fieldName)).result();
        final UserDefinedFunction udf = new UserDefinedFunction(function, returnType, Option$.MODULE$.<Seq<DataType>>empty());
        return dataSet.withColumn(fieldName, udf.apply(inputs));
    }
I'm adding a new Maven module, kylo-spark-catalog-spark-v3, to support apache-spark:3.1.2 and scala:2.12.10 (the versions I'm targeting at the time of writing).
I'm having trouble with:
- Creating an instance of AccumulatorV2, as the deprecation notice on the Accumulable class is not very clear. Here's my attempt at the first function (NOT COMPILING):
    @Nonnull
    static <R, P1> AccumulatorV2<R, P1> accumulable(@Nonnull final R initialValue, @Nonnull final String name, @Nonnull final AccumulatorV2<R, P1> param,
                                                    @Nonnull final KyloCatalogClient<Dataset<Row>> client) {
        AccumulatorV2<R, P1> acc = AccumulatorContext.get(AccumulatorContext.newId()).get();
        acc.register(((KyloCatalogClientV3) client).getSparkSession().sparkContext(), new Some<>(name), true);
        return acc;
    }
- Creating a UDF instance in the second function: the compiler complains that UserDefinedFunction cannot be instantiated because it is an abstract class. Here's my attempt at the second function (COMPILING, but I'm not sure it makes sense):
    /**
     * Applies the specified function to the specified field of the data set.
     */
    @Nonnull
    static Dataset<Row> map(@Nonnull final Dataset<Row> dataSet, @Nonnull final String fieldName, @Nonnull final Function1 function, @Nonnull final DataType returnType) {
        final Seq<Column> inputs = Seq$.MODULE$.<Column>newBuilder().$plus$eq(dataSet.col(fieldName)).result();
        final UserDefinedFunction udf = udf(function, returnType);
        return dataSet.withColumn(fieldName, udf.apply(inputs));
    }
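To explain my doubt about the second attempt: as far as I can tell, passing a raw Function1 makes the call resolve to the untyped functions.udf(Object, DataType) overload. The Spark 3.0 migration notes say this overload is rejected by default unless spark.sql.legacy.allowUntypedScalaUDF is set to true. One alternative I am considering is the Java UDF1 overload, which still takes an explicit return type. The following is only a sketch under that assumption. The class name MapSketch and the switch from Function1 to UDF1 are hypothetical, not existing Kylo code.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataType;

/** Hypothetical helper; assumes callers can supply a Java UDF1 instead of a Scala Function1. */
final class MapSketch {

    private MapSketch() {
    }

    /** Applies the function to one column and replaces that column with the result. */
    static <T, R> Dataset<Row> map(final Dataset<Row> dataSet, final String fieldName,
                                   final UDF1<T, R> function, final DataType returnType) {
        // functions.udf(UDF1, DataType) is the typed Java overload, so no legacy flag is needed.
        final UserDefinedFunction udf = functions.udf(function, returnType);
        // UserDefinedFunction.apply accepts Column varargs, so no Scala Seq has to be built by hand.
        return dataSet.withColumn(fieldName, udf.apply(dataSet.col(fieldName)));
    }
}
```

A call would then look like MapSketch.map(ds, "id", (UDF1<Long, Long>) v -> v + 1, DataTypes.LongType). The cost is that existing callers passing a Function1 would need a small adapter, which I have not written.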
Can you please advise me on how to get this right, or point me to any docs that cover a case close to this one?
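To make the accumulator half of the question concrete, here is my reading of the API docs. AccumulatorV2<IN, OUT> seems to fold the old Accumulable and AccumulableParam into a single class that you subclass and then pass to SparkContext.register(acc, name), rather than something you look up through AccumulatorContext. The type parameters also appear to be in the opposite order from Accumulable (input type first, result type second). The sketch below assumes that reading is correct. StringSetAccumulator and its register helper are hypothetical examples, not Kylo code.

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.spark.SparkContext;
import org.apache.spark.util.AccumulatorV2;

/** Hypothetical accumulator that collects distinct strings (IN = String, OUT = Set<String>). */
final class StringSetAccumulator extends AccumulatorV2<String, Set<String>> {

    private final Set<String> values = new HashSet<>();

    @Override
    public boolean isZero() {
        return values.isEmpty();
    }

    @Override
    public AccumulatorV2<String, Set<String>> copy() {
        final StringSetAccumulator copy = new StringSetAccumulator();
        copy.values.addAll(values);
        return copy;
    }

    @Override
    public void reset() {
        values.clear();
    }

    @Override
    public void add(final String v) {
        values.add(v);
    }

    @Override
    public void merge(final AccumulatorV2<String, Set<String>> other) {
        // Combines a partial result from another task into this one.
        values.addAll(other.value());
    }

    @Override
    public Set<String> value() {
        return values;
    }

    /** Creates an instance and registers it under a name so it is displayed in the Spark UI. */
    static StringSetAccumulator register(final SparkContext sc, final String name) {
        final StringSetAccumulator acc = new StringSetAccumulator();
        sc.register(acc, name);
        return acc;
    }
}
```

If this is right, the migrated accumulable function would take an already constructed AccumulatorV2 in place of the initialValue and param pair and only call register on it, but I would appreciate confirmation.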