
I have a Spark DataFrame that needs to be forward-filled (ffill). The DataFrame is large (>100 million rows). I'm able to achieve what I want with pandas, as shown below.

new_df = df_pd.set_index('someDateColumn') \
              .groupby(['Column1', 'Column2', 'Column3']) \
              .resample('D') \
              .ffill() \
              .reset_index(['Column1', 'Column2', 'Column3'], drop=True) \
              .reset_index()

I got stuck when trying .resample('D') with Koalas. Is there a better alternative for replicating the ffill logic with Spark-native functions? I want to avoid pandas because it is not distributed and executes only on the driver node.

How can I achieve the same result as above using the Spark/Koalas packages?


1 Answer


If you are looking for forward fill in Spark, follow this tutorial, which covers exactly that - here
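To sketch the idea without leaving Spark: below is a minimal PySpark version of the pandas pipeline, assuming someDateColumn is a DateType column and each group has at most one row per day. The .resample('D') step is emulated by exploding a per-group daily date spine, and .ffill() by last(..., ignorenulls=True) over an unbounded-preceding window. Only the column names (Column1, Column2, Column3, someDateColumn) come from the question; everything else is illustrative.

from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()
keys = ["Column1", "Column2", "Column3"]

# 1) Daily date spine per group: every day between each group's first
#    and last observed someDateColumn (emulates .resample('D')).
spine = (
    df.groupBy(*keys)
      .agg(F.min("someDateColumn").alias("start"),
           F.max("someDateColumn").alias("end"))
      .select(*keys,
              F.explode(F.sequence("start", "end",
                                   F.expr("interval 1 day"))
                        ).alias("someDateColumn"))
)

# 2) Left-join the observations onto the spine; days with no
#    observation come through as nulls.
joined = spine.join(df, on=keys + ["someDateColumn"], how="left")

# 3) Forward fill: for each value column, take the last non-null value
#    seen so far within the group, ordered by date (emulates .ffill()).
w = (Window.partitionBy(*keys)
           .orderBy("someDateColumn")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

value_cols = [c for c in df.columns if c not in keys + ["someDateColumn"]]
new_df = joined.select(
    *keys, "someDateColumn",
    *[F.last(c, ignorenulls=True).over(w).alias(c) for c in value_cols]
)

Everything here runs on the executors, so it scales to the >100 million row case, though the window in step 3 does shuffle by the group keys, so heavily skewed groups can still be a bottleneck.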
