I am using Spark 2.4.3 on five nodes, in client mode and standalone mode, for testing purposes, and I am restricted to a limited range of ports. I have therefore configured all ports that can be configured according to the docs, so that Spark does not take arbitrary ports outside my assigned range:

in spark-env.sh:

SPARK_MASTER_PORT=master-port
SPARK_WORKER_PORT=worker-port

on the spark-submit command line:

--master spark://master-ip:master-port
--conf spark.blockManager.port=block-manager-port 
--conf spark.driver.blockManager.port=block-manager-port  
--conf spark.driver.port=driver-port
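
Put together, the submit call looks roughly like this (host names and ports are placeholders for values from my assigned range, and my-app.jar stands for my actual application jar):

spark-submit \
  --master spark://master-ip:master-port \
  --conf spark.driver.port=driver-port \
  --conf spark.blockManager.port=block-manager-port \
  --conf spark.driver.blockManager.port=block-manager-port \
  my-app.jar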

But now I find the following in the driver log:

... INFO CoarseGrainedSchedulerBackend$DriverEndpoint: 
Registered executor NettyRpcEndpointRef(spark-client://Executor) (worker-ip:arbitrary-port) 
with ID 4

This means that Spark uses an arbitrary port for the executor, with a port number outside the range assigned to me, even though I have configured all ports that can be configured.
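
One way to confirm which ports the executor JVMs actually bind is to look at the listening sockets on a worker node, for example (assuming ss and lsof are available; executor-pid is a placeholder for the process id of an executor JVM):

ss -tlnp | grep java
lsof -iTCP -sTCP:LISTEN -a -p executor-pid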

The arbitrary port then shows up again in the following driver log entry:

... INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations
for shuffle 2 to worker-ip:arbitrary-port

Correspondingly, in a worker log I find:

... INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = 
NettyRpcEndpointRef(spark://MapOutputTracker@driver-ip:driver-port)
...
... INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 13, fetching them
... INFO MapOutputTrackerWorker: Got the output locations

So everything seems to work, and there are no other warnings or errors in the logs concerning this issue. But I am new to Spark and afraid of data loss. Could you please advise whether I can ignore this issue, or tell me what I can do to force Spark to use ports from my assigned range for the executors?

I searched the docs and the web, but there is no information on how to fix this. I analyzed the Spark code for the classes cited in the log entries above (CoarseGrainedSchedulerBackend.DriverEndpoint.receiveAndReply, MapOutputTrackerMasterEndpoint.receiveAndReply), but could not figure out where Spark gets the arbitrary port number from.
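
The relevant handlers can be located in a Spark 2.4.3 source checkout roughly like this (paths relative to the repository root; the patterns are simply the class names cited above):

grep -rn "class CoarseGrainedSchedulerBackend" core/src/main/scala
grep -rn "MapOutputTrackerMasterEndpoint" core/src/main/scala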
