
The Spark web UI displays useful information such as the total and active numbers of cores and tasks. How can I get this information programmatically in Java Spark so that I can display job progress to end users?

I did read about the "append /json/" trick to extract JSON versions of web UI pages from the master, and I can get the total number of cores that way...

But all the information about active cores and tasks seems to be in the driver UI pages. I tried the "/json/" trick on the driver UI pages and it just redirects me back to the HTML pages.
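
For reference, here's roughly how I'm fetching the master JSON today (a minimal sketch; "spark-master" and port 8080 are placeholders for our standalone master):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Appending "/json/" to the standalone master's web UI URL returns the page as JSON.
URL url = new URL("http://spark-master:8080/json/");
try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
  String line;
  while ((line = in.readLine()) != null) {
    System.out.println(line);  // JSON includes fields like "cores" and "coresused"
  }
}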

    On vanilla Spark/EMR, which does not have auto-scaling, the number of cores will be constant through the life cycle of the application. You can get the active Spark jobs, stages and tasks through `SparkStatusTracker`: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkStatusTracker – Sai Apr 17 '19 at 16:32
  • 1
    Ah thank you Sai, SparkStatusTracker looks like exactly what I need. We also found the REST API for driver JVMs -- once we figure out how to cure the exceptions it is generating, that looks like another way to do it. – DanJ Apr 18 '19 at 16:04

1 Answer


It looks like we have discovered two different ways to get this information:

1) Retrieve the SparkStatusTracker from the SparkContext (thank you Sai):

import org.apache.spark.SparkStageInfo;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;

JavaSparkContext javaSparkContext = ...;  // your existing context
JavaSparkStatusTracker javaSparkStatusTracker = javaSparkContext.statusTracker();
// Each active stage reports its total, active, failed, and completed task counts.
for (int stageId : javaSparkStatusTracker.getActiveStageIds()) {
  // getStageInfo can return null if the stage's data has already been evicted.
  SparkStageInfo sparkStageInfo = javaSparkStatusTracker.getStageInfo(stageId);
  int numTasks = sparkStageInfo.numTasks();
  int numActiveTasks = sparkStageInfo.numActiveTasks();
  int numFailedTasks = sparkStageInfo.numFailedTasks();
  int numCompletedTasks = sparkStageInfo.numCompletedTasks();
  ...
}
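
The status tracker can also report per-executor activity. Here's a minimal sketch going through the Scala-side SparkStatusTracker (this assumes Spark 2.0+, where getExecutorInfos() is available):

import org.apache.spark.SparkExecutorInfo;

// One entry per executor; numRunningTasks() gives the tasks currently executing there.
for (SparkExecutorInfo executorInfo : javaSparkContext.sc().statusTracker().getExecutorInfos()) {
  System.out.println(executorInfo.host() + ": " + executorInfo.numRunningTasks() + " running tasks");
}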

2) Consult the REST API available from the driver JVM:

https://spark.apache.org/docs/latest/monitoring.html#rest-api
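
For example, here's a minimal sketch of polling the active-stages endpoint (assuming the driver UI is on its default port 4040; "localhost" stands in for the driver host):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// applicationId() supplies the [app-id] segment that the REST paths expect.
String appId = javaSparkContext.sc().applicationId();
URL url = new URL("http://localhost:4040/api/v1/applications/" + appId + "/stages?status=active");
try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
  String line;
  while ((line = in.readLine()) != null) {
    System.out.println(line);  // JSON array of active stages, including task counts
  }
}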
