As far as I know, calling .persist() only sets the persistence (storage) level.
The actual persistence work is not done until the next action in the script runs.
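
To make that mental model concrete, here is a toy sketch in plain Python. It is not Spark code: LazyFrame, compute_calls, and the other names are made up for illustration, and it only models the lazy behavior I am assuming, not how Spark actually implements persist().

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, compute):
        self._compute = compute          # function that produces the rows
        self._persist_requested = False  # set by persist(), nothing else
        self._cache = None               # filled in by the first action
        self.compute_calls = 0           # how many times rows were computed

    def persist(self):
        # Only records the request; no rows are computed here.
        self._persist_requested = True
        return self

    def _rows(self):
        if self._cache is not None:
            return self._cache
        self.compute_calls += 1
        rows = list(self._compute())
        if self._persist_requested:
            self._cache = rows           # materialize on first use
        return rows

    def count(self):
        # An "action": forces the rows to be computed (or read from cache).
        return len(self._rows())


frame = LazyFrame(lambda: range(1000)).persist()
calls_after_persist = frame.compute_calls  # persist() alone computed nothing
first_count = frame.count()                # first action computes and caches
second_count = frame.count()               # second action reuses the cache
print(calls_after_persist, first_count, second_count, frame.compute_calls)
```

Under this model the persist() line itself should cost almost nothing, since all the memory is used when the first action runs.
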
However, sometimes, seemingly depending on the DataFrame, the persist() call itself leads to a Java heap space out-of-memory error.
What is the intended behavior of persist(), and why could this single line lead to a memory error?