
I am trying to run a PySpark job on a Mesosphere cluster but I cannot seem to get it to run. I understand that Mesos does not support cluster deploy mode for PySpark applications and that it needs to be run in client mode. I believe this is where the problem lies.

When I try submitting a PySpark job I am getting the output below.

    ... socket.hpp:107] Shutdown failed on fd=48: Transport endpoint is not connected [107]

I believe that a Spark job running in client mode needs to connect to the cluster nodes directly; is that connection being blocked?

What configuration would I need to change to be able to run a PySpark job in client mode?

1 Answer

When running PySpark in client mode (meaning the driver runs on the host where you invoke Python), the driver becomes the Mesos framework. When this happens, the host the framework runs on needs to be able to connect to every node in the cluster, and every node needs to be able to connect back, meaning no NAT in between.
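As a minimal sketch of that setup (the master URL mesos://10.0.0.1:5050 is a hypothetical placeholder), the SparkContext created in your Python process is the framework that every agent must be able to reach:

    from pyspark import SparkConf, SparkContext

    # In client mode, this SparkContext is the Mesos framework: it registers
    # with the master, and the agents must be able to open connections back
    # to the host it runs on.
    conf = (SparkConf()
            .setMaster("mesos://10.0.0.1:5050")  # hypothetical master URL
            .setAppName("client-mode-test"))
    sc = SparkContext(conf=conf)

    # A trivial job; it only completes if executors can talk to the driver.
    print(sc.parallelize(range(100)).sum())
    sc.stop()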

If this is indeed the cause of your problem, two environment variables might be useful. If you can get a VPN in place, set both LIBPROCESS_IP and SPARK_LOCAL_IP to an IP address of the driver host that the cluster nodes can use to connect back to it.
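As a minimal sketch of that fix (10.0.0.5 is a hypothetical, routable address of the driver host, e.g. its VPN address), the variables need to be in the driver's environment before the SparkContext starts:

    import os

    # Set before creating the SparkContext so that both libmesos and Spark
    # advertise an address the cluster nodes can actually route to.
    os.environ["LIBPROCESS_IP"] = "10.0.0.5"
    os.environ["SPARK_LOCAL_IP"] = "10.0.0.5"

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("mesos://10.0.0.1:5050")  # hypothetical master URL
            .setAppName("client-mode-test"))
    sc = SparkContext(conf=conf)

Exporting the same two variables in the shell before invoking spark-submit should work equally well, since the driver inherits them.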
