
I am fairly new to Spark and was exploring the topic of submitting Spark jobs to a cluster. As per my understanding, every spark-submit job is a separate application in itself. Our requirement is to access tables created by a Spark session (in one spark-submit job) from another session created by a subsequent spark-submit application. Is there a way to do this currently? If so, any insights on how to do it would be helpful. A minimal sketch of what I mean is below.
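To make the scenario concrete, here is a hypothetical sketch (all names made up for illustration). The temp view created by the first application is not visible to the second one, because each spark-submit starts its own application and session:

```python
from pyspark.sql import SparkSession

# --- Application 1 (first spark-submit job) ---
spark = SparkSession.builder.appName("job-one").getOrCreate()
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.createOrReplaceTempView("users")  # temp views are scoped to this SparkSession

# --- Application 2 (a later, separate spark-submit job) ---
# spark.table("users") here would raise an AnalysisException ("table or view
# not found"), since the temp view died with the first application's session.
```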

Note: I have found https://medium.com/@kar9475/data-sharing-between-multiple-spark-jobs-in-databricks-308687c99897, but it only talks about sharing state within a single application.

  • I am afraid that you will have to use some external data sink like a distributed filesystem, Hive or a database – werner May 15 '21 at 14:22
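Building on werner's comment, here is a minimal sketch of the Hive-metastore approach, assuming both applications are configured to use the same shared metastore (the database and table names are hypothetical; the two halves run as separate spark-submit jobs):

```python
from pyspark.sql import SparkSession

# --- Producer job: persist the table through the shared Hive metastore ---
spark = (SparkSession.builder
         .appName("producer-job")
         .enableHiveSupport()  # assumes a shared Hive metastore is configured
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS shared_db")
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
# saveAsTable registers the table in the metastore, so it outlives this session
df.write.mode("overwrite").saveAsTable("shared_db.users")

# --- Consumer job (a separate spark-submit application) ---
spark2 = (SparkSession.builder
          .appName("consumer-job")
          .enableHiveSupport()
          .getOrCreate())
users = spark2.table("shared_db.users")  # visible: the metastore is shared
users.show()
```

Without a shared metastore, the same effect can be achieved by writing to a shared location in the first job, e.g. `df.write.parquet("hdfs:///shared/users")`, and reading it back with `spark.read.parquet(...)` in the second.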

0 Answers