
I need to automatically read a Delta table, but only the last partition that was created, because the whole table is large. The table is partitioned by yyyy and mm.

val df = spark.read.format("delta").load("url_delta").where(s"yyyy=${yyyy} and mm=${mm}")

I need to know the values of the yyyy year and mm month. It is not efficient to read the whole table and then filter it by max("yyyy") and max("mm").

1 Answer


Actually, if you partition on yyyy and mm, then getting the max year and month is a metadata-only operation that just looks at the transaction log, so it should be really quick.
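
A minimal sketch of how that could look, assuming the path "url_delta" from the question and integer partition columns yyyy and mm (adjust the getInt calls if they are strings):

val base = spark.read.format("delta").load("url_delta")

// Both aggregations touch only the partition columns, so (as noted above)
// they can be resolved from partition metadata rather than a full scan.
val maxYear  = base.agg(max(col("yyyy"))).first().getInt(0)
val maxMonth = base.where(col("yyyy") === maxYear)
                   .agg(max(col("mm"))).first().getInt(0)

// Read only the latest partition; partition pruning keeps this cheap.
val df = base.where(col("yyyy") === maxYear && col("mm") === maxMonth)

(This assumes the usual imports, e.g. org.apache.spark.sql.functions.{col, max}.) Computing the max month only within the max year avoids picking up a high mm value from an earlier year.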
