I have an application that submits jobs using Livy. Several jobs are submitted within the same Livy session. These jobs sometimes work on similar datasets, so I want to reuse data from one job in another. I cache the dataset in each job that I submit, but when a new job is submitted it does not pick up the already cached dataset. Instead, it caches the same data all over again.
Does caching a dataset depend on the variable it is assigned to? E.g., if I do
var d1 = //make some dataset
d1.cache
and then, in a subsequent job,
var d2 = //same dataset
d2.cache
can I expect there to be only one cached dataset, with d2 using the previously cached data? Currently I am seeing separate cached entries in the Storage section of my Spark application's UI. For reference, I am submitting my jobs through the Livy programmatic API.
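
To make the question concrete, here is a minimal sketch of how the behaviour could be checked outside Livy. It assumes d1 and d2 are Datasets (not RDDs) built on the same SparkSession. The names CacheReuseSketch, makeDataset and secondCopyIsCached, as well as the local master, are hypothetical placeholders and not my actual job code. As far as I understand, Spark looks up cached Datasets by their query plan rather than by the variable that holds them, so a second Dataset built the same way should report a storage level other than NONE.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheReuseSketch {
  // Hypothetical stand-in for "make some dataset": each call builds a new
  // Dataset object, but with an equivalent query plan.
  def makeDataset(spark: SparkSession): Dataset[java.lang.Long] =
    spark.range(0, 1000).filter("id % 2 = 0")

  // Caches one copy, then asks whether a second, separately built copy
  // is recognised as cached by the same session.
  def secondCopyIsCached(spark: SparkSession): Boolean = {
    val d1 = makeDataset(spark)
    d1.cache()
    d1.count() // action that materializes the cache

    val d2 = makeDataset(spark) // "same dataset", different variable
    d2.storageLevel != StorageLevel.NONE
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only. In a Livy job the
    // session would come from the job's context instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-reuse-sketch")
      .getOrCreate()

    println(s"second copy sees cached data: ${secondCopyIsCached(spark)}")
    spark.stop()
  }
}
```

If the check comes back false in the real jobs, the two datasets may have different plans, or they may be RDDs, whose persistence is tracked per RDD instance rather than per plan.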