
I want to save a Spark DataFrame in Delta format to S3, but for some reason the data is not saved. I debugged all the processing steps; there was data, and right before saving I ran count on the DataFrame, which returned 24 rows. But as soon as save is called, no data appears in the resulting folder. What could be the reason for it?

This is how I save the data:

import org.apache.spark.sql.ColumnName

df
  .select(schema)
  // spread rows across partitions by the partition keys
  .repartition(partitionKeys.map(new ColumnName(_)): _*)
  // order rows inside each partition by the sort keys
  .sortWithinPartitions(sortByKeys.map(new ColumnName(_)): _*)
  .write
  .format("delta")
  .partitionBy(partitionKeys: _*)
  .mode(saveMode)
  .save("s3a://etl-qa/data_feed")
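One way I could check whether the write actually produced a Delta table (rather than listing the S3 folder directly) would be to read the same path back; a minimal sketch, assuming the SparkSession is available as spark:

val written = spark.read.format("delta").load("s3a://etl-qa/data_feed")
println(written.count())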
Cassie
Can you replace `format` and `path` with their values so it's clear what you do? How do you check that _"no data appears in the resulting folder"_? – Jacek Laskowski Dec 15 '20 at 20:32
  • 1
Also, what is `saveMode`? Have you validated whether writing works without repartition, sortWithinPartitions, and partitionBy? – Michael Heil Dec 16 '20 at 09:57

1 Answer


There is a quick start guide from Databricks that explains how to read from and write to a Delta Lake.

If the DataFrame you are trying to save is called `df`, you need to execute:

df.write.format("delta").save(s3path)
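
For context, here is a minimal end-to-end sketch, assuming a running SparkSession, the delta-core package on the classpath, and a hypothetical bucket path:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-write-example")
  .getOrCreate()

// hypothetical S3 path for illustration
val s3path = "s3a://my-bucket/my-table"

// write a small DataFrame in Delta format, overwriting any existing data
val df = spark.range(0, 24).toDF("id")
df.write.format("delta").mode("overwrite").save(s3path)

// read it back to confirm the rows landed
val readBack = spark.read.format("delta").load(s3path)
println(readBack.count()) // expected: 24

As the comments note, it is also worth checking what `saveMode` resolves to in your code; with `SaveMode.Ignore`, for instance, Spark silently skips the write when data already exists at the target path.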
Michael Heil