
I am trying to collect statistics and performance metrics about the cloud tasks and jobs that I submit to an Azure Batch pool. For that, I am using the built-in TaskExecutionInformation and TaskStatistics classes, but I am confused about how some of these metrics are calculated. Specifically, I want to know how long each of my tasks takes to execute, so I compared the wallClockTime from task.statistics against the difference between the start and end times from task.executionInformation, and the two values were different.

  1. How is wall clock time calculated in task statistics, and why is it different from the timespan difference between the start and end times obtained through the task execution information?

  2. I also noticed a large variance in the wall clock time of task execution (from 0.6 s to 21 s in my scenario) for the same task processing. What could be causing such a large variance?
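For reference, this is roughly how I am comparing the two numbers. This is a minimal sketch with made-up values standing in for what the Batch service reports: `start_time`/`end_time` mirror what I read from task.executionInformation, and `wall_clock_time` mirrors task.statistics.

```python
from datetime import datetime, timedelta

# Hypothetical values standing in for what the Batch service reports.
# task.executionInformation exposes start and end times on the node:
start_time = datetime(2017, 7, 7, 12, 0, 0)
end_time = datetime(2017, 7, 7, 12, 0, 9)

# task.statistics exposes a wall clock time for the task:
wall_clock_time = timedelta(seconds=1)

# The delta I compute from execution information:
execution_delta = end_time - start_time

print(execution_delta.total_seconds())   # 9.0
print(wall_clock_time.total_seconds())   # 1.0

# The unexplained gap between the two metrics:
print((execution_delta - wall_clock_time).total_seconds())  # 8.0
```

With these illustrative numbers, the gap is the 8-9 seconds of overhead I cannot account for.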

Thanks!

J.B
  • Do you have any resource files tied to the tasks? How big is the delta you observed? – fpark Jul 07 '17 at 17:29
  • @fpark I do not have resource files tied to the tasks, but in the command line that I run on the compute nodes, I pass a blob URI as an argument (my task processing for testing just pulls that blob and saves it to the local disk of the compute node). There was a large variance in the delta as well, and it was very different from the wall clock time. For example, for a 1 second wallClockTime, I see an 8-9 second delta. – J.B Jul 07 '17 at 17:42
  • For #2, by wall clock time do you mean the executionInformation time or the wallclock time from statistics? – fpark Jul 07 '17 at 18:19
  • I mean the wallclock time from statistics. What I am getting from the executionInformation is the start time and end time of task execution. – J.B Jul 07 '17 at 18:43

1 Answer

  1. Wall clock time from statistics is the difference between the process end time and its creation time, where the process is the one created from the task's command line. The time difference that you compute from executionInformation includes more than just the process execution time: it spans from when the task is picked up by the node to when the task completes on the node, including the metadata and state updates that the Batch service requires. This time can also include things like downloading resource files.
  2. Since you are downloading the blob as part of your process itself, the variance can come from retrieving that data from Azure Storage (or any other source of noise within your process). If you can move the blob to a resource file and your process does not contain any other variable execution portions, you should see more consistent wall clock times across identical tasks (as reported by the task statistics wall clock time).
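One way to quantify that variance is to collect the per-task wall clock durations and look at their spread. A sketch, where the sample durations are made up for illustration; in practice each value would come from a completed task's statistics wall clock time:

```python
from statistics import mean, pstdev

# Hypothetical per-task wall clock durations in seconds; in practice
# these would be read from each completed task's statistics.
durations = [0.6, 1.2, 0.8, 21.0, 0.7, 1.1]

# Summarize the spread: a single slow outlier (e.g. a slow blob
# retrieval) dominates both the range and the standard deviation.
print(f"min={min(durations)}, max={max(durations)}")
print(f"mean={mean(durations):.2f}, stdev={pstdev(durations):.2f}")
```

If the standard deviation shrinks substantially after moving the blob to a resource file, the download was the dominant noise source.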
fpark
  • Thanks for your answer. It's very helpful! I have a follow-up note for 2. So, I checked the end-to-end latency in the logs for the blob storage showing the time it takes to read a blob storage request and respond to it, and this time is pretty much the same with milliseconds difference across all the tasks, so it doesn't seem from these logs that the Azure storage could be causing the variance in the wallclock time. – J.B Jul 08 '17 at 00:40
  • I tried what you suggested in #2, moving my blobs as resource files then doing processing on them. While the variance wasn't as big as before (from 0.6 to 21s), it was still relatively significant varying between 0 and 1.09s while I was expecting a difference of 0.2s between my different runs/tasks. Considering that I don't have any variable execution portions, what could possibly be the problem? Or is this normal in Azure Batch? – J.B Jul 11 '17 at 00:29