How do I compute the first discrete difference using Dask DataFrame? Or, in "Pandas speak", how do I do pandas.DataFrame.diff() in Dask? Mathematically, the operation is very simple: subtract from each column a copy of itself shifted down by one or more rows.
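To make the target concrete, here is a minimal Pandas-only sketch of that definition. The frame, its column names, and its values are made up purely for illustration; the point is only that diff() and "subtract a shifted copy" describe the same operation.

```python
import pandas as pd

# Small illustrative frame; column names and values are invented for this example.
df = pd.DataFrame({"a": [1, 4, 9, 16], "b": [2.0, 3.0, 5.0, 8.0]})

periods = 1

# First discrete difference written out by hand:
# each row minus the row `periods` positions above it.
manual = df - df.shift(periods=periods)

# The built-in equivalent in Pandas.
builtin = df.diff(periods=periods)

# Both results have NaN in the first `periods` rows;
# DataFrame.equals treats NaNs in the same position as equal.
print(manual.equals(builtin))
```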
I have tried implementing diff() in Dask in the following ways, neither of which works (yet):
df - df.shift(periods=1) works in Pandas. But Dask DataFrame doesn't have a shift() method.
df.values[1:] - df.values[:-1] works in Pandas. But I can't see how to index into a Dask DataFrame by position.
My current best idea for implementing diff would be to wrap some custom code in dask.dataframe.rolling.wrap_rolling, as suggested in this Stack Overflow answer (although I haven't been able to find any documentation on how to do this). Or should I wrap some custom code using Dask Delayed? Any other thoughts?
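Whatever wrapper ends up being used, the custom code has to deal with partition boundaries: the first rows of each partition need the last rows of the previous one. The following is a rough sketch of that logic only, not of any Dask API. It assumes the partitions can be modelled as an ordered list of Pandas DataFrames, and diff_partitions is a hypothetical helper name invented for this example.

```python
import pandas as pd


def diff_partitions(partitions, periods=1):
    """Diff an ordered list of row-wise chunks as if they were one frame.

    Hypothetical helper for illustration; assumes periods >= 1.
    """
    results = []
    tail = None  # the last `periods` rows seen so far
    for part in partitions:
        # Prepend the trailing rows of the earlier data so the first rows
        # of this chunk have something to be subtracted from.
        padded = part if tail is None else pd.concat([tail, part])
        diffed = padded.diff(periods=periods)
        # Drop the borrowed rows again so each output chunk lines up
        # with its input chunk.
        n_borrowed = len(padded) - len(part)
        results.append(diffed.iloc[n_borrowed:])
        # Remember the trailing rows for the next chunk.
        tail = padded.iloc[-periods:]
    return results


# Illustrative data, split into three uneven "partitions".
df = pd.DataFrame({"a": [1, 4, 9, 16, 25, 36]})
partitions = [df.iloc[0:2], df.iloc[2:5], df.iloc[5:]]

chunked = pd.concat(diff_partitions(partitions, periods=1))
print(chunked.equals(df.diff(periods=1)))
```

Carrying the tail forward makes this loop sequential. In a parallel setting each partition would instead need to fetch its neighbour's trailing rows independently, which is the part a wrapper would have to provide.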