
We are using DSE Analytics. I am trying to schedule a Spark job with crontab via spark-submit. The job should run every night, and each time it is submitted, the previously running application should be killed first. I am having trouble finding a way to do this.

The problem is that I can't find the application ID or the driver ID of the submitted job, so I can't shut it down gracefully.

I understand that the Spark Master web UI can be used to find the submission ID, but if I set this up as a cron job, I can't get the ID from the UI. Is there a proper way to do this? We are running DSE 6.7 with Analytics in a dedicated DC. Any help would be appreciated.

zXor
  • Can you show the command line that you're using to start the job? – Alex Ott Aug 19 '19 at 06:54
  • I am using the following command: dse spark-submit --class com.spark.Test /home/centos/spark-1.0-SNAPSHOT.jar – zXor Aug 19 '19 at 09:49
  • @zXor I am not sure about DSE Analytics, but if you want this scenario handled with shell scripting, I have an algorithm for it. Let me know if you need it. – Goldie Aug 19 '19 at 10:12

1 Answer


Because you're running it this way, the driver is deployed in client mode, meaning it executes on the machine you submit from, so you can terminate it with a plain kill command. You can find the process ID with something like this:

ps -aef|grep com.spark.Test|grep -v grep|awk '{print $2}'
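For the cron use case, a small wrapper script can combine the kill and the resubmission. The following is only a rough sketch built around the command shown in the comments; the class name and jar path come from the question, while the script path, the 10-second wait, and the cron schedule are assumptions you'd adjust to your environment.

#!/usr/bin/env bash
# restart_spark_job.sh - kill the previous driver (if any), then resubmit.

JOB_CLASS="com.spark.Test"                       # class from the question
JOB_JAR="/home/centos/spark-1.0-SNAPSHOT.jar"    # jar from the question

# Find PIDs of any driver still running for this class (client mode,
# so the driver is a local JVM process).
OLD_PIDS=$(ps -aef | grep "$JOB_CLASS" | grep -v grep | awk '{print $2}')

if [ -n "$OLD_PIDS" ]; then
  echo "$OLD_PIDS" | xargs kill   # plain SIGTERM for a graceful shutdown
  sleep 10                        # assumed grace period before resubmitting
fi

# Submit the job again.
dse spark-submit --class "$JOB_CLASS" "$JOB_JAR"

A crontab entry along the lines of 0 2 * * * /path/to/restart_spark_job.sh >> /var/log/spark_job.log 2>&1 (the time and log path are just examples) would then kill the previous night's run and resubmit the job each night.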
Alex Ott