
I am raising a custom exception to test failure in my Structured Streaming job, as shown below. I see that the query gets terminated, but I cannot understand why the driver script does not fail with a non-zero exit code.

streamingDF.writeStream
        .trigger(Trigger.ProcessingTime(10000L))
        .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
          val transformedDF: DataFrame = DoSomeProcessing(batchDF)
          if (batchId == 1) {
            throw new Exception("Custom Exception as batchId is 1")
          }
          // write transformedDF to the sink here
        }
        .start()
        .awaitTermination()

I get the trace below on my console, but the driver script does not exit and no new logs are printed to the console.

Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Custom Exception as batchId is 1
=== Streaming Query ===
Identifier: [id = 6f4c3b4c-bc30-46fe-93ef-8378c23380ab, runId = 1241cb37-493b-4882-ab28-9df8a8c6fb1a]
Current Committed Offsets: ...
Current Available Offsets: ...

Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan:
RepartitionByExpression [timestamp#12], 10
...
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: java.lang.Exception: Custom Exception as batchId is 1
    at MySteamingApp$$anonfun$startSparkStructuredStreaming$1.apply(MySteamingApp.scala:61)
    at MySteamingApp$$anonfun$startSparkStructuredStreaming$1.apply(MySteamingApp.scala:57)
    at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:534)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:532)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:531)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
    ... 1 more

1 Answer


I think the number of allowed task failures was configured higher than expected.

spark.task.maxFailures (default 4): Number of failures of any particular task before giving up on the job. The total number of failures spread across different tasks will not cause the job to fail; a particular task has to fail this number of attempts. It should be greater than or equal to 1. Number of allowed retries = this value - 1.
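
For reference, a minimal sketch of where that property is typically set (the SparkSession builder and app name below are placeholders, not taken from the question; the same value can also be passed at submit time with --conf spark.task.maxFailures=1):

    import org.apache.spark.sql.SparkSession

    // Hypothetical session setup: with maxFailures lowered to 1, the first
    // task failure fails the stage instead of being retried up to 4 times.
    val spark = SparkSession.builder()
      .appName("MySteamingApp")                // placeholder app name
      .config("spark.task.maxFailures", "1")
      .getOrCreate()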

Further, have a look at Is there a way to dynamically stop Spark Structured Streaming?
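
Along those lines, one option is to handle the failure explicitly in the driver: awaitTermination() rethrows the StreamingQueryException that terminated the query, so you can stop the session and force a non-zero exit yourself. A hedged sketch (spark, streamingDF, and the foreachBatch body are assumed to be the same ones as in the question):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.streaming.{StreamingQueryException, Trigger}

    val query = streamingDF.writeStream
      .trigger(Trigger.ProcessingTime(10000L))
      .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
        // ... same processing / deliberate throw as in the question ...
      }
      .start()

    try {
      query.awaitTermination()  // rethrows the exception that killed the query
    } catch {
      case e: StreamingQueryException =>
        e.printStackTrace()
        spark.stop()   // stop the underlying SparkContext
        sys.exit(1)    // make the driver script fail with a non-zero exit code
    }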

  • My apologies, but I only got to check this now. This doesn't seem to be an issue with spark.task.maxFailures, as I don't see any failed tasks in the Spark UI. With the above code, I see batch 0 and batch 1 as completed, and after that no new job/batch is triggered in the Spark UI, but the driver script just doesn't exit – conetfun May 31 '20 at 18:41
  • Then you could stop the Spark context (ssc) whenever the exception happens, and it will stop, as pointed out in the link above – Ram Ghadiyaram May 31 '20 at 20:04
  • Yeah, I could do that, but I wanted to understand, from the streaming job's perspective, what is stopping this job from returning with a non-zero exit code. – conetfun May 31 '20 at 20:05
  • So it's not the usual case; then you have to thoroughly double-check your code, configuration properties, etc. – Ram Ghadiyaram May 31 '20 at 20:07
  • I was wondering how a Spark Structured Streaming job decides whether an exception is fatal or not, and based on that whether the driver script should fail and exit or not – conetfun May 31 '20 at 20:09