
There is a way to enable graceful shutdown of Spark Streaming by setting the property spark.streaming.stopGracefullyOnShutdown to true and then killing the process with kill -SIGTERM. However, I don't see such an option available for Structured Streaming (SQLContext.scala).
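
For reference, a minimal sketch of that legacy DStream setup (the app name, batch interval, and toy socket source are arbitrary choices for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// With this flag set, SIGTERM lets in-flight batches finish before the
// context shuts down, instead of killing them mid-batch.
val conf = new SparkConf()
  .setAppName("dstream-graceful-demo")
  .set("spark.streaming.stopGracefullyOnShutdown", "true")
val ssc = new StreamingContext(conf, Seconds(10))

ssc.socketTextStream("localhost", 9999).print() // toy source and sink
ssc.start()
ssc.awaitTermination() // kill -SIGTERM <driver pid> now drains work first
```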

Is the shutdown process different in Structured Streaming, or is it simply not implemented yet?

Yuriy Bondaruk
  • We had a similar [case](https://stackoverflow.com/questions/59310665/how-to-stop-a-notebook-streaming-job-gracefully/59760474#59760474) recently and solved it by using the filesystem to stop the streaming job gracefully – abiratsis Mar 20 '20 at 23:42
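
A minimal sketch of that filesystem-based approach, assuming a toy rate source, a console sink, and an arbitrary marker path. The driver polls for a sentinel file and stops the query itself instead of being killed:

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("graceful-stop-demo").getOrCreate()

// Toy source and sink; swap in Kafka, files, etc. as needed.
val query = spark.readStream
  .format("rate")
  .load()
  .writeStream
  .format("console")
  .start()

// The marker path is an assumption for this sketch.
val marker = Paths.get("/tmp/stop-streaming-job")
while (query.isActive && !Files.exists(marker)) {
  query.awaitTermination(10000) // wait up to 10 s, then re-check the marker
}
if (query.isActive) query.stop() // stop from inside the app, not via kill
```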

3 Answers


This feature is not implemented yet. However, the write-ahead logs of Spark Structured Streaming are claimed to recover state and offsets without any issues.
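
The recovery referred to here hinges on the checkpoint location, which stores the offset and commit logs. A minimal sketch, with assumed paths and a toy rate source:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("checkpoint-demo").getOrCreate()

// The checkpoint directory holds the offset/commit logs that let the query
// resume where it left off after an abrupt kill. Both paths are assumptions.
val query = spark.readStream
  .format("rate")
  .load()
  .writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints/my-query")
  .option("path", "/tmp/output/my-query")
  .start()

query.awaitTermination()
```

Restarting the same code with the same checkpointLocation resumes from the last committed offsets.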

Akhil Bojedla
  • So basically what you are saying is that if I shut down a Spark Structured Streaming application launched through the spark-submit command using a kill command, despite not having a "graceful shutdown", no data will be lost? – Ander Mar 19 '18 at 04:03
  • @AnderMurilloZohn yes – Akhil Bojedla Mar 19 '18 at 08:23

This feature is not implemented yet, and it will also give you duplicates if you kill the job from the resource manager while a batch is running.

Corrected: the duplicates will only be in the output directory; the write-ahead logs handle everything beautifully, so you don't need to worry about anything. Feel free to kill it at any time.
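
If duplicates do land in the output directory as described above, they can be dropped when reading it back. A sketch, with an assumed path and an illustrative key column:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("dedup-demo").getOrCreate()

// Path and key column are assumptions; with a real source you would
// deduplicate on a genuine unique event key.
val deduped = spark.read.parquet("/tmp/output/my-query")
  .dropDuplicates("value")
deduped.show()
```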

iSingh
  • According to another answer there shouldn't be issues due to the write-ahead logs feature. Did you try to kill the process from the command line? Does it produce duplicates too? – Yuriy Bondaruk May 14 '18 at 18:40
  • Depends on your definition of issue. Technically duplicates are not considered an issue because you have an at-least-once delivery guarantee, and duplicates don't violate that guarantee; i.e. you should be expecting duplicates anyway. – lfk Sep 27 '18 at 02:15
  • Yes, correcting my old self here: you will not get duplicates. The WAL machinery in Spark is so good it can handle any kill command. Go ahead and do it. – iSingh Jul 29 '22 at 10:31