
I have access to a repository where a team writes Parquet files (unpartitioned) using Delta, i.e. there is a Delta log in the repository. I have no access to the table itself, though. To create a DataFrame from those Parquet files, I am using the code below:

spark.read.format('delta').load(repo)

Executing this seems to load all of the data, regardless of the Delta log. How should I proceed to load only the latest version of my data?

Oli
V.Leymarie
  • 1
    If there's a Delta log and you're using `.format('delta')`, it *will* load the latest version of the data. Post more details about what the directory looks like and which files are present vs. loaded... – Kombajn zbożowy Jan 08 '23 at 21:10
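
One way to gather the details asked for in the comment is to replay the JSON commits in `_delta_log` and compare the result with the Parquet files on disk. The sketch below is only an illustration, not how Spark itself resolves a snapshot: the helper names `live_files` and `stale_files` are hypothetical, it assumes the directory is reachable as a local path, it ignores checkpoint files (so it is only reliable while every commit since version 0 is still present), and it assumes file paths in the log contain no URL-encoded characters.

```python
import json
import re
from pathlib import Path

# Delta commit files are named with a zero-padded 20-digit version number,
# e.g. 00000000000000000003.json
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def live_files(table_dir):
    """Return (latest_version, paths of data files active at that version).

    Hypothetical helper: replays JSON commits only and ignores checkpoints.
    """
    log_dir = Path(table_dir) / "_delta_log"
    commits = sorted(
        (int(m.group(1)), p)
        for p in log_dir.iterdir()
        if (m := _COMMIT_RE.match(p.name))
    )
    active = set()
    latest = -1  # stays -1 if no commit files are found
    for version, path in commits:
        # Each line of a commit file is one JSON action.
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
        latest = version
    return latest, active


def stale_files(table_dir):
    """Parquet files on disk that the latest version no longer references."""
    _, active = live_files(table_dir)
    root = Path(table_dir)
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*.parquet")
        if "_delta_log" not in p.parts  # skip checkpoint Parquet files
    }
    return on_disk - active
```

If `stale_files(repo)` returns an empty set, every Parquet file in the directory belongs to the latest version, which would explain why the Delta read looks like it loads everything.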

0 Answers