I am working on an attribution report and I am caching a DataFrame because it is used frequently in later stages of the code. Once I am done with it, should I call unpersist() or unpersist(true)? I understand that the basic difference is that the first is asynchronous (non-blocking) and the second is synchronous (blocking). But does one have more latency than the other? Or are there any other implications?
val dfForWeb = loadData(aggregationType, readConfigForWeb).cache()
//some logical code blocks
..
..
..
dfForWeb.unpersist() //This works fine
// Tried the blocking variant below and got the same result:
// dfForWeb.unpersist(true) // This also works fine
The actual code is as follows:
val dfForWeb = loadData(aggregationType, readConfigForWeb).cache()
val dfForMobile = loadData(aggregationType, readConfigForMobile).cache()
if (condition) {
  for (item <- GeoAggregationList) {
    processData(dfForWeb) // This dataframe is used for a lot of computations later
  }
} else {
  processData(dfForWeb) // This dataframe is used for a lot of computations later
}
dfForWeb.unpersist()
dfForMobile.unpersist()
I am trying to be cautious because this application needs to scale. When the actual data is processed, I am unsure whether choosing unpersist() over unpersist(true) would make a significant difference in terms of latency or risk of data loss. Please advise.
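For reference, here is a minimal, self-contained sketch of the cache/unpersist lifecycle in question. It is an illustration under stated assumptions, not the real application code: the local SparkSession, the synthetic DataFrame (standing in for loadData) and the groupBy aggregation (standing in for processData) are all hypothetical. The try/finally is only one possible way to make sure the cache is released even if processing fails.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unpersist-sketch")
      .getOrCreate()

    // Hypothetical stand-in for loadData(...).cache()
    val df = spark.range(0, 1000000).toDF("id").cache()
    df.count() // an action is needed to actually materialize the cache

    try {
      // Hypothetical stand-in for processData(df)
      df.groupBy((df("id") % 10).as("bucket")).count().collect()
    } finally {
      // blocking = true waits until the cached blocks have been removed;
      // df.unpersist() (blocking = false) returns without waiting for that.
      df.unpersist(blocking = true)
    }

    // After unpersist returns, the DataFrame is no longer marked as cached.
    assert(df.storageLevel == StorageLevel.NONE)

    spark.stop()
  }
}
```

Swapping `df.unpersist(blocking = true)` for `df.unpersist()` in this sketch and timing the call on realistic data volumes would be one way to measure the latency difference directly.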