I am trying to fetch when a Delta table was last optimized using the code below, and I am getting the expected output. This code runs for every table present in the database (I sketch the driver loop after the snippet).
from pyspark.sql.functions import col

table_name_or_path = "abcd"
# Full table history, narrowed to OPTIMIZE operations, newest first
df = (
    spark.sql("desc history {}".format(table_name_or_path))
    .select("operation", "timestamp")
    .filter("operation == 'OPTIMIZE'")
    .orderBy(col("timestamp").desc())
)
if len(df.take(1)) != 0:
    last_optimize = df.select(col("timestamp").cast("string").alias("timestamp")).first().asDict()
    print(last_optimize["timestamp"])
    last_optimize = last_optimize["timestamp"]
else:
    last_optimize = ""
The above code takes quite some time and triggers a lot of Spark jobs. I want to optimize it to get better performance. Is there a way to rewrite this code so it runs faster?
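For reference, this is a minimal variant I have been considering: it pushes everything into one chain and collects at most one row in a single action, so the separate take(1) and first() calls go away. The limit(1) + collect pattern is just my own sketch, and I am not sure it is actually faster:

from pyspark.sql.functions import col

table_name_or_path = "abcd"

# One chained query, one action: keep only OPTIMIZE rows, sort newest first,
# and collect at most a single row instead of calling take(1) and first() separately
rows = (
    spark.sql("desc history {}".format(table_name_or_path))
    .where(col("operation") == "OPTIMIZE")
    .orderBy(col("timestamp").desc())
    .select(col("timestamp").cast("string").alias("timestamp"))
    .limit(1)
    .collect()
)

last_optimize = rows[0]["timestamp"] if rows else ""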