
I need to automatically read a Delta table, but only the last partition that was created, because the whole table is large. The table is partitioned by yyyy and mm.

val df = spark.read.format("delta").load("url_delta").where(s"yyyy=${yyyy} and mm=${mm}")

I need to know the values of the yyyy year and mm month. It is not efficient to read the whole table and then filter it by max("yyyy") and max("mm").

1 Answer


Actually, if you partition on yyyy and mm, then getting the max year and month is a metadata-only operation that just looks at the transaction log, so it should be really quick.
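
A minimal sketch of how that could look, assuming the path "url_delta" from the question and integer partition columns yyyy and mm (adjust the getInt calls if they are strings):

val base = spark.read.format("delta").load("url_delta")

// Both aggregations touch only the partition columns, so (as noted above)
// they can be resolved from partition metadata rather than a full scan.
val maxYear  = base.agg(max(col("yyyy"))).first().getInt(0)
val maxMonth = base.where(col("yyyy") === maxYear)
                   .agg(max(col("mm"))).first().getInt(0)

// Read only the latest partition; partition pruning keeps this cheap.
val df = base.where(col("yyyy") === maxYear && col("mm") === maxMonth)

(This assumes the usual imports, e.g. org.apache.spark.sql.functions.{col, max}.) Computing the max month only within the max year avoids picking up a high mm value from an earlier year.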
