I'm trying to persist a temp view so that I can query it again via SQL:
val df = spark.sqlContext.read.option("header", true).csv("xxx.csv")
df.createOrReplaceTempView("xxx")
Then I persist/cache it in one of these ways:
df.cache() // or
spark.sqlContext.cacheTable("xxx") // or
df.persist(MEMORY_AND_DISK) // or
spark.sql("CACHE TABLE xxx")
Then I move the underlying xxx.csv and run:
spark.sql("select * from xxx")
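At that point I find that only CACHE TABLE xxx actually stores a copy; with the other three, the query fails once the file has been moved. What am I doing wrong? How can I persist a queryable view/table with a specific storage level, e.g. DISK_ONLY?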
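For reference, here is a minimal sketch of what I would expect to work. It assumes the difference comes from laziness: df.cache(), df.persist(...) and cacheTable only mark the data for caching, while CACHE TABLE is eager by default. The object name, app name and local master are placeholders for illustration, not part of my real setup.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical standalone wrapper; in spark-shell only the body is needed.
object PersistViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persist-view-sketch") // placeholder name
      .master("local[*]")             // assumption: local run
      .getOrCreate()

    val df = spark.read.option("header", true).csv("xxx.csv")

    // persist() only registers the plan for caching; nothing is written yet.
    df.persist(StorageLevel.DISK_ONLY)
    df.createOrReplaceTempView("xxx")

    // Assumption: an action that scans every partition fills the cache.
    df.count()

    // From here on, the query should be answered from the persisted blocks.
    spark.sql("select * from xxx").show()

    spark.stop()
  }
}
```

If I understand the docs correctly, Spark 3.0 and later also accept CACHE TABLE xxx OPTIONS ('storageLevel' 'DISK_ONLY') as an SQL-only equivalent. In either case the cache is not a durable copy: if persisted blocks are lost or evicted, Spark recomputes them from the original file path, which would fail once xxx.csv has been moved.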