
In pandas, I can do the following:

contract['PREV_END'] = contract.groupby('SUBSCR_NO').END.shift(1)

But using the pandas API on Spark, the same code raises this error:

AnalysisException: cannot resolve 'isnan(lag(CON_END, 1, NULL) OVER (PARTITION BY SUBSCR_NO ORDER BY natural_order ASC NULLS FIRST ROWS BETWEEN -1 FOLLOWING AND -1 FOLLOWING))' due to data type mismatch: argument 1 requires (double or float) type, however, 'lag(CON_END, 1, NULL) OVER (PARTITION BY SUBSCR_NO ORDER BY natural_order ASC NULLS FIRST ROWS BETWEEN -1 FOLLOWING AND -1 FOLLOWING)' is of date type.;

I checked the documentation for `GroupBy.shift`: https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.groupby.GroupBy.shift.html?highlight=shift#pyspark.pandas.groupby.GroupBy.shift

It doesn't say that date columns are unsupported, but it also doesn't say which types are supported.

How can I achieve this shift with the pandas API on Spark?

