I'm trying to set up a Spark cluster and I've run into an annoying bug... When I submit a Spark application, it runs fine on the workers until I kill one (for example by running stop-slave.sh on the worker node). Once the worker is killed, Spark tries to relaunch an executor on an available worker node, but it fails every time (I know because the web UI shows the executor as either FAILED or LAUNCHING; it never succeeds).
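For context, here's roughly the kind of job I'm submitting with spark-submit — a simplified sketch, not my actual application; the app name and job logic are just placeholders that keep the executors busy long enough to stop a worker mid-run:

```scala
import org.apache.spark.sql.SparkSession

object LongRunningJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("executor-relaunch-test") // placeholder name
      .getOrCreate()

    // Keep the executors busy long enough that I can run stop-slave.sh
    // on one worker and watch whether the lost executor is relaunched.
    val result = spark.sparkContext
      .parallelize(1 to 1000000, numSlices = 100)
      .map { i => Thread.sleep(1); i.toLong }
      .reduce(_ + _)

    println(s"sum = $result")
    spark.stop()
  }
}
```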
I can't seem to find any help, even in the documentation, so can someone confirm that Spark can and will try to relaunch the executor on an available node when a worker is killed (either on the same node where it previously ran, or on another available node if that node is unreachable)?
Here's the output from the worker node:
[screenshot: Spark worker error]

Thank you for your help!