Is it mandatory to use df.unpersist()
after using df.cache()
to release the cache memory?
If I store my DataFrame in cache without unpersisting, then the code runs very quickly. However, it takes pretty longer time when I use df.unpersist()
.
Asked
Active
Viewed 9,704 times
1

Markus
- 3,562
- 12
- 48
- 85
-
When RDD is garbage collected, unpersist is automatically called on it. – vvg May 23 '18 at 09:30
-
@Rumoku: And when RDD is garbage collected? – Markus May 23 '18 at 09:31
-
It's JVM, so RDD became eligible for garbage collection as soon as there are no more references to this object, isn't it. – vvg May 23 '18 at 09:41
1 Answers
5
It is not mandatory, but if you have a long run ahead and you want to release resources that you no longer need, it's highly suggested that you do it. Spark will anyhow manage these for you on an LRU basis; quoting from the docs:
Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion.
The unpersist
method does this by default, but consider that you can explicitly unpersist
asynchronously by calling it with the a blocking = false
parameter.
df.unpersist(false) // unpersists the Dataframe without blocking
The unpersist
method is documented here for Spark 2.3.0.

stefanobaghino
- 11,253
- 4
- 35
- 63
-
What is the difference between `df.unpersist()` and `df.unpersist(false)` ? – thentangler Jan 25 '21 at 02:28
-
1It means to not wait for all blocks to be unpersisted before returning. – stefanobaghino Jan 26 '21 at 07:03
-
-
1"Blocking" is the term usually applied to any form of call that returns the control to the caller after it has performed the action it's supposed to. In contrast, a call "does not block" if it somehow starts performing the action and returns the control to the caller while the action keeps running in the background. These latter methods usually rely on either accepting a callback to be run when the action is performed or by returning some form of handle that represents the "promise" that the action will be performed (sometime called a "future"). – stefanobaghino Feb 12 '21 at 07:05
-
-