I'd like to test the Delta cache in local cluster mode (Jupyter).
1. What I want to do:
Whole Delta-formatted files shouldn't be re-downloaded on every query; only new data should be fetched.
2. What I've tried
...
# cell1: enable the Databricks disk (Delta) cache
spark.conf.set("spark.databricks.io.cache.enabled", "true")
# cell2: register an external Delta table backed by an S3 path
spark.sql("""
CREATE TABLE my_table2
USING DELTA
LOCATION 'MY_S3_DELTA_FORMATTED_PATH'
""")
# cell3: time the query to see whether the cache helps
import time
start = time.time()
spark.sql("select * from my_table2").show()
print(time.time() - start)
But cell 3 shows roughly the same elapsed time every run, which suggests the table is not being cached.
Did I miss something?
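As a side note on the measurement itself: timing a single run makes cold vs. warm reads hard to tell apart, and `.show()` only fetches a small prefix of the rows, so it may not exercise much of the file at all. A small timing harness in plain Python can compare the first (cold) run against subsequent ones; this is just a sketch, and `run_query` below is a hypothetical stand-in for the Spark query in cell 3:

```python
import time
from statistics import median

def time_runs(fn, n=5):
    """Invoke fn n times and return the list of elapsed seconds.

    If a cache kicks in, the later (warm) runs should be noticeably
    faster than the first (cold) run.
    """
    elapsed = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        elapsed.append(time.perf_counter() - start)
    return elapsed

# Hypothetical usage against the table from cell 2 -- .count() forces a
# full scan, unlike .show(), which reads only a handful of rows:
# run_query = lambda: spark.sql("select * from my_table2").count()
# t = time_runs(run_query)
# print("cold:", t[0], "warm (median):", median(t[1:]))
```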