How to load the latest version of delta parquet using spark?

Asked Jan 05 '23 at 08:12

Active Aug 24 '23 at 10:08

Viewed 143 times

I have access to a repository where a team writes Parquet files (unpartitioned) using Delta (i.e. there is a Delta log in this repository). I have no access to the table itself, though. To create a DataFrame from those Parquet files, I am using the code below:

spark.read.format('delta').load(repo)

Executing this loads the entire dataframe, regardless of the delta log. How should I proceed to load the latest version of my data?

edited Aug 24 '23 at 10:08

Oli


asked Jan 05 '23 at 08:12

V.Leymarie



If there's a Delta log and you're using `.format('delta')`, it *will* load the latest version of the data. Post more details about what the directory looks like and which files are present vs. loaded... – Kombajn zbożowy Jan 08 '23 at 21:10
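
One way to check "which files are present vs loaded" without Spark is to replay the JSON commits in `_delta_log` by hand. The sketch below is only an illustration: `active_files` and `stale_files` are hypothetical helper names, and it assumes that every JSON commit is still present (no reliance on checkpoint files) and that the file paths in the log are plain relative paths.

```python
import json
from pathlib import Path


def active_files(table_path):
    """Replay the JSON commits and return (latest_version, live data files)."""
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are named <zero-padded version>.json, so sorting by name
    # replays them in version order.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    live = set()
    version = None
    for commit in commits:
        version = int(commit.stem)
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return version, sorted(live)


def stale_files(table_path):
    """Parquet files on disk that the latest version no longer references."""
    root = Path(table_path)
    _, live = active_files(root)
    on_disk = set()
    for p in root.rglob("*.parquet"):
        rel = p.relative_to(root)
        if rel.parts[0] != "_delta_log":  # skip checkpoint files in the log
            on_disk.add(rel.as_posix())
    return sorted(on_disk - set(live))
```

Calling `active_files(repo)` returns the latest version number and the files a Delta reader should pick up for it, while `stale_files(repo)` lists Parquet files that exist in the directory but belong only to older versions. If Spark and the `delta-spark` package are available, `DeltaTable.forPath(spark, repo).history()` shows the commit history, and `spark.read.format('delta').option('versionAsOf', n).load(repo)` reads a specific version `n`.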


0 Answers