
I am submitting Spark jobs using spark-submit in standalone mode. All these jobs are triggered by cron, and I want to monitor them for failures. But with spark-submit, if any exception occurs in the application (e.g. a ConnectionException), the job is terminated and I get 0 as the exit status of spark-submit. The Spark UI also shows the job status as FINISHED. What can be done to detect the failure of a Spark job when an exception occurs?

thebytewalker

3 Answers


You can check its status with spark-submit --status (as described in [Mastering Apache Spark 2.0]):

spark-submit --status [submission ID]

  1. You can submit your job by launching spark-submit as an external process and reading its output stream to parse out the submissionId.
  2. Then check your job status by calling spark-submit --status with that id (see the sketch below).
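For example, a rough sketch of that flow in Java (the master URL, main class and jar path are placeholders; the regex for the submission id is an assumption, since the client log format varies between Spark versions, and a submission id is only produced in cluster deploy mode):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SparkSubmitMonitor {

    // Run a command, capture its combined stdout/stderr and wait for it to finish.
    private static String run(String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // 1. Submit the application. A submission id is only created in cluster deploy mode.
        String submitOutput = run("spark-submit",
                "--master", "spark://master:7077",
                "--deploy-mode", "cluster",
                "--class", "com.example.MyJob",
                "/path/to/my-job.jar");

        // 2. Extract the submission id (e.g. "driver-20180829091234-0001") from the client output.
        //    The exact log format depends on the Spark version, so adjust the pattern if needed.
        Matcher m = Pattern.compile("driver-\\d{14}-\\d{4}").matcher(submitOutput);
        if (!m.find()) {
            throw new IllegalStateException("No submission id found in:\n" + submitOutput);
        }
        String submissionId = m.group();

        // 3. Ask for the status of that submission; the response reports the driver state
        //    (e.g. RUNNING, FINISHED, FAILED), which you can parse and act on.
        String statusOutput = run("spark-submit",
                "--master", "spark://master:7077",
                "--status", submissionId);
        System.out.println("Status of " + submissionId + ":\n" + statusOutput);
    }
}

The wrapper can then turn a FAILED driver state into an alert or a non-zero exit code for cron to pick up.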
tauitdnmd
  • Yes, I got that, but we will get the [submission ID] from the Spark web UI, and I want this monitoring to be automated – thebytewalker Aug 29 '18 at 09:12
  • You can submit your job by calling an external process (spark-submit) & read the output stream to parse and extract `submissionId`. Then check your job status by calling the above process – tauitdnmd Aug 29 '18 at 09:47

spark-submit submits an application, not a job. So you naturally see exit code 0 and the status FINISHED if the application started and stopped successfully, regardless of whether any job inside it failed.

To get a failure exit code, you need to modify the job you are submitting with spark-submit so that it produces a non-zero exit code when a critical job fails.

You can monitor the job status from within the submitted Spark application, for example before closing the context or exiting. You can use this:

import org.apache.spark.JobExecutionStatus;
import org.apache.spark.SparkJobInfo;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;

JavaSparkContext sc;
...
JavaSparkStatusTracker statusTracker = sc.statusTracker();
...
// jobId is the id of the Spark job you want to inspect
final SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
final JobExecutionStatus status = jobInfo.status();

If the job failed (status == JobExecutionStatus.FAILED), you can exit the application with a code different from 0:

 System.exit(1);

This will allow the application to close the Spark context properly and finish. You will then be able to check the exit status of your spark-submit command, since you are launching in standalone mode.
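Put together, a sketch of that pattern could look like this (the app name and the action inside the try block are made up; getJobIdsForGroup(null) returns the jobs that were not assigned to any job group):

import org.apache.spark.JobExecutionStatus;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkJobInfo;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;

public class MonitoredApp {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("monitored-app"));
        boolean failed = false;
        try {
            // ... run your actions here, e.g. sc.textFile("input").count();
        } catch (Exception e) {
            // Driver-side exceptions (e.g. a ConnectionException) should also be fatal.
            e.printStackTrace();
            failed = true;
        }

        // Check every job recorded for this application
        // (passing null returns the jobs that were not assigned to a job group).
        JavaSparkStatusTracker tracker = sc.statusTracker();
        for (int jobId : tracker.getJobIdsForGroup(null)) {
            SparkJobInfo info = tracker.getJobInfo(jobId);
            if (info != null && info.status() == JobExecutionStatus.FAILED) {
                failed = true;
            }
        }

        sc.stop();                       // close the context cleanly first
        System.exit(failed ? 1 : 0);     // non-zero exit makes the spark-submit call fail
    }
}

With this in place, the cron entry only needs to check the exit status of the spark-submit command.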

Note: for standalone mode you do not even need spark-submit in general. The jar can be launched with java -jar ..., and it would behave the same.


I know you didn't ask for this, but I would strongly recommend running Spark applications with Airflow rather than cron. It provides an integration with Apache Spark that takes care of many things, including the problem you've discovered.

SparkSubmitOperator solves this problem by tailing and parsing the driver's logs to extract the resulting Spark job status.

However, if you want to implement the log parsing yourself, take a look at the _process_spark_submit_log method in the airflow.providers.apache.spark.hooks.spark_submit code for inspiration on how this is usually done in production systems.

jirislav