
I have a Spark DataFrame that needs to be forward-filled (ffill). The DataFrame is large (>100 million rows). I'm able to achieve what I want with pandas, as shown below.

new_df = df_pd.set_index('someDateColumn') \
              .groupby(['Column1', 'Column2', 'Column3']) \
              .resample('D') \
              .ffill() \
              .reset_index(['Column1', 'Column2', 'Column3'], drop=True) \
              .reset_index()

I got stuck when trying .resample('D') with Koalas. Is there a better alternative for replicating the ffill logic with Spark-native functions? I want to avoid pandas because it is not distributed and executes only on the driver node.

How can I achieve the same result as above using the Spark/Koalas packages?


1 Answer


If you are looking for forward fill in Spark, follow this tutorial, which covers exactly that - here
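To sketch the idea without leaving Spark: below is a minimal PySpark version of the pandas pipeline, assuming someDateColumn is a DateType column and each group has at most one row per day. The .resample('D') step is emulated by exploding a per-group daily date spine, and .ffill() by last(..., ignorenulls=True) over an unbounded-preceding window. Only the column names (Column1, Column2, Column3, someDateColumn) come from the question; everything else is illustrative.

from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()
keys = ["Column1", "Column2", "Column3"]

# 1) Daily date spine per group: every day between each group's first
#    and last observed someDateColumn (emulates .resample('D')).
spine = (
    df.groupBy(*keys)
      .agg(F.min("someDateColumn").alias("start"),
           F.max("someDateColumn").alias("end"))
      .select(*keys,
              F.explode(F.sequence("start", "end",
                                   F.expr("interval 1 day"))
                        ).alias("someDateColumn"))
)

# 2) Left-join the observations onto the spine; days with no
#    observation come through as nulls.
joined = spine.join(df, on=keys + ["someDateColumn"], how="left")

# 3) Forward fill: for each value column, take the last non-null value
#    seen so far within the group, ordered by date (emulates .ffill()).
w = (Window.partitionBy(*keys)
           .orderBy("someDateColumn")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

value_cols = [c for c in df.columns if c not in keys + ["someDateColumn"]]
new_df = joined.select(
    *keys, "someDateColumn",
    *[F.last(c, ignorenulls=True).over(w).alias(c) for c in value_cols]
)

Everything here runs on the executors, so it scales to the >100 million row case, though the window in step 3 does shuffle by the group keys, so heavily skewed groups can still be a bottleneck.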
