You can cache your streaming data using the cache or persist method, as follows:
dstream.persist()
Do this only if you use the stream more than once. For reduceByWindow and reduceByKeyAndWindow operations this is done automatically.
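A minimal sketch of when persist pays off (assuming a StreamingContext named ssc like the one shown further down; the socket source and transformations are purely illustrative):

val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
// persist because the same DStream feeds two separate outputs below
words.persist()
words.map(word => (word, 1)).reduceByKey(_ + _).print()
words.countByValue().print()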
In your streaming job, to keep it running you need to create a StreamingContext, start it, and then wait for it to terminate:
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(1))
// your logic goes here
ssc.start()
ssc.awaitTermination()  // block so the driver keeps running
If your job is getting killed after running for a few hours (and your cluster is Kerberized), check whether the Kerberos tickets are expiring; this can cause long-running jobs to fail.
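On YARN, a common mitigation (assuming your setup allows it) is to submit the job with the --principal and --keytab options so that Spark can renew the tickets itself.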
Edit:

Note: if you are talking specifically about Structured Streaming, cache on streaming Datasets is not supported. Check this post: Why does using cache on streaming Datasets fail with "AnalysisException: Queries with streaming sources must be executed with writeStream.start()"?
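A minimal sketch of that failure (assuming an existing SparkSession named spark; the socket source is purely illustrative):

val streamingDf = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// throws AnalysisException: Queries with streaming sources must be
// executed with writeStream.start()
streamingDf.cache()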