I have the following code, where I repartition the filtered input data and persist it:
// assumes a case class struct1 and import sparkSession.implicits._ for .as[struct1]
val df = sparkSession.read
  .parquet(path)
  .as[struct1]
  .filter(dateRange(_, lowerBound, upperBound))
  .repartition(nrInputPartitions)
  .persist()

df.count()  // materialize the cache
I expected all of the data to be cached in memory, but the Storage tab in the Spark UI shows the following instead:
Storage
Size in Memory: 424.2 GB
Size on Disk: 44.1 GB
Is this because some partitions didn't fit in memory, so Spark automatically fell back to the MEMORY_AND_DISK storage level?
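
For reference, here is a minimal sketch of the same pipeline with the storage level set explicitly instead of relying on the default (reusing path, struct1, dateRange, lowerBound, upperBound, and nrInputPartitions from above; dfMemOnly is just a name I chose for the variant):

import org.apache.spark.storage.StorageLevel

// Same pipeline as above, but with an explicit storage level.
// MEMORY_ONLY never spills to disk; partitions that do not fit
// are dropped and recomputed from the source when needed.
val dfMemOnly = sparkSession.read
  .parquet(path)
  .as[struct1]
  .filter(dateRange(_, lowerBound, upperBound))
  .repartition(nrInputPartitions)
  .persist(StorageLevel.MEMORY_ONLY)

dfMemOnly.count()                // materialize the cache
println(dfMemOnly.storageLevel)  // reports the storage level actually in effect

Dataset.storageLevel (available since Spark 2.1) reports the level actually in use, which would make it easy to confirm whether the no-argument persist() ended up at MEMORY_AND_DISK here.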