
I need to return "DEAD"/"FAIL" as the job "status" if the PySpark job meets a certain condition. Example:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("yarn") \
    .appName("IDL") \
    .getOrCreate()

for i in range(0, 10):
    if i == 5:
        print("Bye " + str(i))
        # Exit the program and return status code

The status has to be returned explicitly by the PySpark program. Depending on the status, the next PySpark job would run.
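One way to do this (a minimal sketch, not specific to any particular cluster) is to end the driver with a non-zero exit code, or an uncaught exception, when the condition is hit; a failed driver normally makes Livy report the batch as dead instead of success:

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("yarn") \
    .appName("IDL") \
    .getOrCreate()

for i in range(0, 10):
    if i == 5:
        print("Bye " + str(i))
        spark.stop()
        # A non-zero exit code (or an uncaught exception) fails the driver,
        # which Livy should report as a "dead" batch rather than "success".
        sys.exit(1)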

Job Submit:

curl -X POST --data '{"file": "/user/root/jsmith/test1.py"}' -H "Content-Type: application/json" localhost:8998/batches

Fetch Job status:

curl localhost:8998/batches/7

The output of the above command should then contain "state":"DEAD".
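For reference, a Livy batch status response looks roughly like the following (fields and values are illustrative; note that Livy reports states in lowercase, e.g. "dead" or "success"):

{
  "id": 7,
  "state": "dead",
  "appId": "application_XXXXXXXXXXXXX_0007",
  "log": ["..."]
}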

Parijat Bose

1 Answer


You can use something like the following to get the status explicitly from Livy:

import requests

class LivyClient:
    def __init__(self, hdi_host, auth):
        self.hdi_host = hdi_host
        self.auth = auth

    def get_status(self, job_id):
        # Query the Livy batches endpoint for the current state of the batch job
        status_url = 'https://{hdi_host}/livy/batches/{batch_id}'.format(
            hdi_host=self.hdi_host, batch_id=job_id)
        try:
            r = requests.get(status_url, verify=False, auth=self.auth)
            r.raise_for_status()
            return r.json()
        except Exception as e:
            print("Error in getting livy job status {}, error {}".format(status_url, e))
            return None
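For example, the returned JSON can be polled until the batch reaches a terminal state and the next job triggered only on success (a sketch; the host, credentials and batch id are placeholders):

import time

# Placeholder host, credentials and batch id
client = LivyClient(hdi_host="my-hdi-cluster.azurehdinsight.net", auth=("admin", "password"))

state = None
while state not in ("success", "dead", "killed", "error"):  # terminal Livy batch states
    time.sleep(10)
    response = client.get_status(7)
    state = response["state"] if response else "error"

if state == "success":
    print("Batch succeeded, trigger the next PySpark job")
else:
    print("Batch ended in state '{}', do not run the next job".format(state))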

But if you want to build a pipeline around it, you will have to use an orchestrator such as Airflow or Azkaban.
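For instance, the Airflow Livy provider includes a LivyOperator that submits a batch and can poll it until it finishes, so a downstream task runs only if the upstream batch succeeds. A minimal sketch (the DAG id, connection id, polling interval and test2.py path are placeholders for whatever the next job is):

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.livy.operators.livy import LivyOperator

with DAG(dag_id="livy_pipeline", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
    # Submit the first PySpark batch through Livy and poll until it reaches a terminal state
    job1 = LivyOperator(
        task_id="job1",
        file="/user/root/jsmith/test1.py",
        livy_conn_id="livy_default",
        polling_interval=30,
    )
    # Only runs if job1's batch ends in the success state
    job2 = LivyOperator(
        task_id="job2",
        file="/user/root/jsmith/test2.py",
        livy_conn_id="livy_default",
        polling_interval=30,
    )
    job1 >> job2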

ashwin agrawal
  • Hi @ashwin, I specifically need the status "DEAD"/"FAIL" to be sent to the command line. That is, if a certain condition matches, the PySpark job should exit and return a status to the client. – Parijat Bose Jan 08 '20 at 13:49