
I've submitted my Spark job on the Ambari server using the following command:

  ./spark-submit --class customer.core.classname --master yarn --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar newdata host:6667

and it is working fine.

But how can I keep it running? Even if I close the command prompt or the spark-submit process is killed, the job must keep on running.

Any help is appreciated.

Mohan.V

4 Answers


You can achieve this in a couple of ways:

1) Run the spark-submit driver process in the background using nohup, e.g.:

nohup ./spark-submit --class customer.core.classname \
  --master yarn --num-executors 2 \
  --driver-memory 2g --executor-memory 2g --executor-cores 1 \
  /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
  newdata host:6667 &
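As a side note, nohup writes the driver's console output to nohup.out in the working directory by default. If you prefer an explicit log file, you can redirect it yourself; a minimal sketch of the same command with redirection (driver.log is just an illustrative file name):

    # redirect stdout and stderr to a chosen log file instead of nohup.out
    nohup ./spark-submit --class customer.core.classname \
      --master yarn --num-executors 2 \
      --driver-memory 2g --executor-memory 2g --executor-cores 1 \
      /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
      newdata host:6667 > driver.log 2>&1 &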

2) Run with cluster deploy mode so that the driver process runs on a different node.
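For example, the command from the question with cluster deploy mode would look roughly like this (same class, jar, and arguments as the asker's; untested against their setup):

    ./spark-submit --class customer.core.classname \
      --master yarn --deploy-mode cluster \
      --num-executors 2 --driver-memory 2g \
      --executor-memory 2g --executor-cores 1 \
      /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
      newdata host:6667

In cluster mode the driver runs inside the YARN cluster, so closing your local shell no longer affects it; the driver's output then goes to the YARN container logs rather than your terminal.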

Vishnu V R
  • Oh! So, if I understand correctly: if the deploy mode is cluster, the driver will run on some other machine in my cluster; but if the deploy mode is client and I use nohup, the driver will run on the same machine from which I am submitting, just in the background? – Mohan.V May 16 '16 at 04:49
  • The answer leaves this slightly unclear. When running with --deploy-mode cluster, you are safe to quit spark-submit via Ctrl+C; the Spark job will continue to run even after you exit. Don't worry about the 'ShutdownHook' message that gets printed; it doesn't actually stop your job. – Jack Davidson Nov 06 '18 at 22:46

I think this question is more about the shell than Spark.

To keep an application running even when you close the shell, you should add & at the end of your command, which runs it in the background. So your spark-submit command will be (just add the & at the end):

./spark-submit --class customer.core.classname --master yarn --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar newdata host:6667 &
[1] 28299

You still get the logs and output messages, unless you redirect them.
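If you also want the job to survive closing the shell, a common shell-level pattern is to redirect the output and disown the background job so the shell won't send it a SIGHUP on exit. A sketch (spark-job.log is just an illustrative file name):

    ./spark-submit --class customer.core.classname --master yarn \
      --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 \
      /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar newdata host:6667 \
      > spark-job.log 2>&1 &   # run in the background, logging to a file
    disown                     # detach the job so closing the shell won't kill it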

user1314742
  • Then with `jobs -l` you can check which PID the background program has, and its status (unless you have closed the shell the command was launched from). – Andrea Jun 21 '17 at 13:40
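For instance, building on the [1] 28299 job shown above (output abbreviated):

    $ jobs -l
    [1]+ 28299 Running    ./spark-submit --class customer.core.classname ... &
    $ kill %1   # only if you actually want to stop the backgrounded spark-submit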

I hope I understand the question. In general, if you want a process to keep running, you can run it as a background process. In your case, the job will continue running until you specifically kill it with yarn application -kill, because YARN manages the application after submission. So even if you kill the spark-submit process, the job will continue to run.
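For reference, these are the YARN CLI commands for finding and explicitly stopping the application (the application ID below is illustrative):

    # list running YARN applications and note your Spark job's application ID
    yarn application -list
    # kill it explicitly when you want it to stop
    yarn application -kill application_1463372401222_0001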

z-star

Warning: I didn't test this, but the better way to do what you describe is probably to use the following settings:

--deploy-mode cluster \
--conf spark.yarn.submit.waitAppCompletion=false
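Put together with the command from the question, that would look roughly like this (untested, as noted above):

    ./spark-submit --class customer.core.classname \
      --master yarn --deploy-mode cluster \
      --conf spark.yarn.submit.waitAppCompletion=false \
      --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 \
      /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar newdata host:6667

With spark.yarn.submit.waitAppCompletion=false, spark-submit returns as soon as the application is submitted instead of waiting for it to finish.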

Found here: How to exit spark-submit after the submission

Jack Davidson