The question is exactly what is specified in the title.

I want to start my driver program on 192.168.1.1, but when I submit my Spark application to YARN, YARN picks a random machine to run the driver.

Can I choose the driver machine manually in YARN cluster mode?
The solution from the suggested duplicate question does not work on YARN.

no123ff
  • Possible duplicate of [Forcing driver to run on specific slave in spark standalone cluster running with "--deploy-mode cluster"](https://stackoverflow.com/questions/40526723/forcing-driver-to-run-on-specific-slave-in-spark-standalone-cluster-running-with) – philantrovert Dec 12 '17 at 09:02
  • Not sure if it'll work on yarn but you can try the solution given above. – philantrovert Dec 12 '17 at 09:03
  • Apparently, it's not working. – no123ff Dec 12 '17 at 09:25
  • I know the common answers to my question, thank you very much. I was wondering whether there is some trick that can manage it; otherwise I wouldn't ask here. – no123ff Dec 12 '17 at 10:46
  • Thinking aloud... I think the only way to do it would be to use YARN labels that I may have seen supported in Spark. By default, Spark on YARN would deploy the driver together with ApplicationMaster on a random machine (that meets the resource requirements) – Jacek Laskowski Dec 12 '17 at 11:28
  • @JacekLaskowski Thank you, I will give it a try. – no123ff Dec 13 '17 at 02:34

2 Answers

As Yaron replied before, with YARN as the master you have two deploy-mode options:

  • client
  • cluster

If you select cluster mode, you let YARN decide where the driver is spawned, based on resource availability in the cluster. If you select client mode, the driver is spawned in the client process, on the server where you ran spark-submit.

So a solution for your problem is to run the command `spark-submit --master yarn --deploy-mode client ...` on the machine you want the driver to be on. Make sure that:

  • the machine has the resources to host the driver,
  • the resources you want to give to the driver are not also committed to YARN,
  • there is a Spark gateway role on that machine (for clusters managed by Cloudera Manager, CM).
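
As a minimal sketch of the client-mode submission described above, the script below is meant to be run on the machine that should host the driver (192.168.1.1 in the question). The class name, jar name, and driver memory value are hypothetical placeholders, not taken from the question. It only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Run this ON the machine that should host the driver.
# In client mode the driver lives inside the spark-submit process itself.

# Hypothetical application details; replace with your own.
APP_CLASS="com.example.MyApp"
APP_JAR="my-app.jar"

# Build the command as positional parameters so it can be printed or executed.
# --driver-memory 2g is an assumed value; size it to what the machine can spare.
set -- spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 2g \
  --class "$APP_CLASS" \
  "$APP_JAR"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
else
  echo "$*"
fi
```

Keeping the dry run as the default makes it easy to check the flags before the driver actually starts on that host.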
UrVal

If you want to use a specific machine as the driver, you should use YARN client mode.

Spark docs, "Launching Spark on YARN":

There are two deploy modes that can be used to launch Spark applications on YARN. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

In YARN client mode, the driver runs in the client process, so you can choose the driver machine: it is the machine that executes the spark-submit command.

In YARN cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster.
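
Because the driver runs inside the application master in cluster mode, the node-label idea raised in the comments could be tried by restricting where the application master may be scheduled. The sketch below uses Spark's `spark.yarn.am.nodeLabelExpression` setting. It assumes a YARN node label (the hypothetical name `driver_node`) has already been created, assigned to the target machine, and made accessible to the queue in use; the class and jar names are placeholders as well. This has not been verified on the asker's cluster.

```shell
#!/bin/sh
# Assumption: a YARN node label named "driver_node" (hypothetical) already exists,
# is assigned only to the desired machine, and is accessible from the target queue.
AM_LABEL="driver_node"

# Hypothetical application details; replace with your own.
APP_CLASS="com.example.MyApp"
APP_JAR="my-app.jar"

# In cluster mode the driver runs in the application master, so restricting
# the AM to a labeled node also restricts where the driver can land.
set -- spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.yarn.am.nodeLabelExpression=$AM_LABEL" \
  --class "$APP_CLASS" \
  "$APP_JAR"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
else
  echo "$*"
fi
```

If the label is attached to a single node, YARN has only that node to choose from for the application master, and the application waits if that node lacks free resources.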

Yaron