I need a way to automatically know the current Spark cluster's status from my code, so that I can decide how many resources my code should request.
I saw this: Spark: get number of cluster cores programmatically
But:
Their answer is wrong: `java.lang.Runtime.getRuntime.availableProcessors` tells you how many cores the physical machine has, but you can start a Spark worker with fewer cores than the machine actually has. This is even common practice on Kubernetes.
There is also no way to extract the memory. Again, you can't use `java.lang.Runtime`, because it only reports information about the host. And a mismatch between worker memory and host memory is even more common, because of all the issues you get in Java once the heap grows beyond roughly 30 GB.
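To be concrete, this is all the JVM itself can tell me (a minimal sketch; both values describe the JVM/host my code runs on, not the Spark workers):

```scala
// Both values describe the local JVM / physical host, not the Spark workers:
val hostCores  = java.lang.Runtime.getRuntime.availableProcessors // cores on this machine
val jvmHeapMax = java.lang.Runtime.getRuntime.maxMemory           // this JVM's max heap, in bytes
```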
So is there a way to programmatically extract the exact information that is shown at localhost:8080?
I cannot move to YARN for now; I know that would solve the problem, but it is too complicated.
I know I could parse the result from that URL, but that is too complicated as well.
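(For reference, this is roughly what that parsing would involve; a minimal sketch, assuming a standalone master whose web UI at localhost:8080 also serves its status as JSON under /json:)

```scala
import scala.io.Source

// Fetch the standalone master's status page as JSON (assumed endpoint).
// The response should contain a "workers" array with per-worker cores and memory,
// which would then still have to be parsed out by hand.
val masterStatusUrl = "http://localhost:8080/json/"
val masterJson = Source.fromURL(masterStatusUrl).mkString
println(masterJson)
```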
Some attempts that didn't work:
- `sparkContext.executorMemory()`: this only returns the memory you requested.
- `sparkContext.defaultParallelism()`: this correctly returns the total number of cores, but not the number of cores of a single worker.
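In code, those attempts look roughly like this (a sketch; `sc` is the live SparkContext, and I read the requested executor memory from the conf since that is all it reflects anyway):

```scala
// Sketch of the attempts above (sc is an existing SparkContext):
val requestedExecMem = sc.getConf.get("spark.executor.memory", "1g")
// only echoes the value I requested, not what the worker actually has

val totalCores = sc.defaultParallelism
// total parallelism across the cluster, not the cores of a single worker
```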