I need a way to automatically know the current Spark cluster's status from my code, so that I can decide how many resources my code should request.
I saw this: Spark: get number of cluster cores programmatically
But:
Their answer is wrong: `java.lang.Runtime.getRuntime.availableProcessors` tells you how many cores the physical machine has, but you can start a Spark worker with fewer cores than the machine actually has. This is even common practice on Kubernetes.
There is also no way to extract the memory. Again, you can't use `java.lang.Runtime`, because it only reports information about the host. And a mismatch between worker memory and host memory is even more common, because of all the issues you get in Java once the heap grows beyond roughly 30 GB.
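To be concrete, this is all the JVM itself can tell me (a minimal sketch; both values describe the JVM/host my code runs on, not the Spark workers):

```scala
// Both values describe the local JVM / physical host, not the Spark workers:
val hostCores  = java.lang.Runtime.getRuntime.availableProcessors // cores on this machine
val jvmHeapMax = java.lang.Runtime.getRuntime.maxMemory           // this JVM's max heap, in bytes
```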
So is there a way to programmatically extract the exact information that is shown at localhost:8080?
I cannot move to YARN for now; I know that would solve the problem, but it is too complicated.
I know I could parse the result from that URL, but that is too complicated as well.
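(For reference, this is roughly what that parsing would involve; a minimal sketch, assuming a standalone master whose web UI at localhost:8080 also serves its status as JSON under /json:)

```scala
import scala.io.Source

// Fetch the standalone master's status page as JSON (assumed endpoint).
// The response should contain a "workers" array with per-worker cores and memory,
// which would then still have to be parsed out by hand.
val masterStatusUrl = "http://localhost:8080/json/"
val masterJson = Source.fromURL(masterStatusUrl).mkString
println(masterJson)
```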
Some attempts that didn't work:
- `sparkContext.executorMemory()`: this only returns the memory you requested.
- `sparkContext.defaultParallelism()`: this correctly returns the total number of cores, but not the number of cores of a single worker.
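In code, those attempts look roughly like this (a sketch; `sc` is the live SparkContext, and I read the requested executor memory from the conf since that is all it reflects anyway):

```scala
// Sketch of the attempts above (sc is an existing SparkContext):
val requestedExecMem = sc.getConf.get("spark.executor.memory", "1g")
// only echoes the value I requested, not what the worker actually has

val totalCores = sc.defaultParallelism
// total parallelism across the cluster, not the cores of a single worker
```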