
I have defined a column StartDate as follows for a DataFrame I am loading using the dataframe-ec library.

schema.addColumn("StartDate", ValueType.DATE); 

I would like to add a computed column named DaysToEvent, but I am unsure how to define a function that uses the Java time library so that the following expression works.

dataFrame.attachColumn(
        dataFrame.createComputedColumn(
                "DaysToEvent",
                ValueType.LONG,
                "daysBetween(toDate(2023, 3, 11), StartDate)"));

I saw that there is a built-in function named withinDays, but I am hoping not to have to change the library to add a new function. I also tried writing the expression as inline Java code, but that didn't work:

dataFrame.attachColumn(
        dataFrame.createComputedColumn(
                "DaysToEvent",
                ValueType.LONG,
                "java.time.temporal.ChronoUnit.DAYS.between(toDate(2023, 3, 11), StartDate)"));

Donald Raab


You can add your own functions to the dataframe-ec expression DSL at runtime. These functions can then be used just like the ones that come with the library out of the box, in expressions for computed columns, filters, and so on. So you will be able to write code like this:

dataFrame.addColumn("DaysToEvent", "daysBetween(toDate(2023, 3, 11), StartDate)");

(This is a bit more streamlined than your example: you can call addColumn directly on a DataFrame object, and you don't need to specify the type of the computed column because it is inferred from the expression you provide.)

To add a function (in this case daysBetween) to the expression DSL, call the addFunctionDescriptor method on the BuiltInFunctions class. The existing implementations in BuiltInFunctions show how to deal with parameters, different types, validation, and so on, and RuntimeAddedFunctionTest is also worth a look. In your case, something like this should work:

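import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import org.eclipse.collections.api.factory.Lists;
import org.eclipse.collections.api.list.ListIterable;
// plus the dataframe-ec types used below: BuiltInFunctions, IntrinsicFunctionDescriptor,
// EvalContext, Value, DateValue, LongValue, ValueType

BuiltInFunctions.addFunctionDescriptor(new IntrinsicFunctionDescriptor("daysBetween", Lists.immutable.of("date1", "date2"))
    {
        @Override
        public Value evaluate(EvalContext context)
        {
            // parameters are looked up by the names declared in the constructor above
            LocalDate date1 = ((DateValue) context.getVariable("date1")).dateValue();
            LocalDate date2 = ((DateValue) context.getVariable("date2")).dateValue();

            // signed number of whole days from date1 to date2
            return new LongValue(ChronoUnit.DAYS.between(date1, date2));
        }

        @Override
        public ValueType returnType(ListIterable<ValueType> paraValueTypes)
        {
            // the function always produces a long, whatever the parameter types
            return ValueType.LONG;
        }
    }
);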
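The evaluate method above delegates the arithmetic to ChronoUnit.DAYS.between from java.time, which returns a signed whole-day count from the first date to the second (negative when the second date is earlier). The following plain-JDK sketch, independent of dataframe-ec, shows the values DaysToEvent would hold for a few StartDate values; the class name DaysBetweenDemo and the sample dates are illustrative only.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DaysBetweenDemo {
    // Same computation as the evaluate() body above: signed day count from date1 to date2
    static long daysBetween(LocalDate date1, LocalDate date2) {
        return ChronoUnit.DAYS.between(date1, date2);
    }

    public static void main(String[] args) {
        // corresponds to toDate(2023, 3, 11) in the expression
        LocalDate reference = LocalDate.of(2023, 3, 11);

        System.out.println(daysBetween(reference, LocalDate.of(2023, 3, 20))); // 9
        System.out.println(daysBetween(reference, LocalDate.of(2023, 3, 1)));  // -10
        System.out.println(daysBetween(reference, LocalDate.of(2024, 3, 11))); // 366 (includes Feb 29, 2024)
    }
}
```

Because daysBetween(toDate(2023, 3, 11), StartDate) puts the fixed date first, events after March 11, 2023 get positive values and earlier ones get negative values; swap the arguments if you want the opposite sign.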

Note: I am the original author of the dataframe-ec library.

zakhav