How can I load a checkpointed PySpark DataFrame?

Asked Apr 29 '21 at 19:13

Active Apr 30 '21 at 08:54

Viewed 67 times

My code below crashed. Instead of restarting from the beginning, I would like to resume from the last checkpointed DataFrame. How can I load it? The checkpoint directory contains this folder: /tmp/53af5ba0-4419-4ab9-93c0-e5f69fd1c8eb

spark.sparkContext.setCheckpointDir("/tmp")

df_1 = df.randomSplit([1.0] * 10, 123456)

for i in range(len(df_1)):
    # checkpoint() returns a new DataFrame, so keep the returned value
    df_1[i] = df_1[i].join(df_2).checkpoint()
    print(f'df[{i}] checkpointed!')
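
As a first step toward recovering anything, it may help to see what Spark actually wrote into that folder. The sketch below is not a loader; it only lists the checkpoint subdirectories. It assumes the folder is on the local filesystem and follows Spark's reliable-checkpoint layout, where each checkpointed RDD is saved under an `rdd-<id>` subdirectory of the UUID folder. The helper name `list_checkpointed_rdds` is hypothetical and not part of PySpark.

```python
from pathlib import Path


def list_checkpointed_rdds(checkpoint_root):
    """Return the rdd-<id> directories under a checkpoint folder, ordered by RDD id.

    Assumption: checkpoint_root is the UUID folder Spark created inside the
    directory passed to setCheckpointDir, on a local filesystem.
    """
    root = Path(checkpoint_root)
    found = []
    for path in root.glob("rdd-*"):
        suffix = path.name[len("rdd-"):]
        # Keep only directories whose name ends in a numeric RDD id.
        if path.is_dir() and suffix.isdigit():
            found.append((int(suffix), path))
    found.sort(key=lambda pair: pair[0])
    return [path for _, path in found]
```

Calling `list_checkpointed_rdds("/tmp/53af5ba0-4419-4ab9-93c0-e5f69fd1c8eb")` would show how many iterations of the loop got as far as writing a checkpoint before the crash. If nothing is listed, there is nothing to resume from.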


Florian

0 Answers