I want to get some statistics about the job I'm running on my pool, and for that I'm trying to use the JobStatistics class. However, job.Statistics has been null in most of my runs, except for a few where the result was, seemingly at random, not null. I read in the documentation (https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.batch.cloudjob.statistics?view=azurebatch-6.1.0#Microsoft_Azure_Batch_CloudJob_Statistics) that for the statistics not to be null, I need to use an expand clause with DetailLevel, but each time I do, I get the error: "operation returned an invalid status code 'BadRequest'". This is what I have for that:

        ODATADetailLevel detailExJob = new ODATADetailLevel();
        detailExJob.SelectClause = "id,executionInfo,stats";
        detailExJob.ExpandClause = "id,executionInfo,stats";
        await job.RefreshAsync(detailExJob);

What am I missing here? How can I get job.Statistics not to be null?

Thanks!

J.B

1 Answer

I'll try to answer your question, but it looks like you have two separate issues.

  1. Job lifetime statistics may not be immediately available. The Batch service performs a periodic roll-up of statistics. I believe the typical delay is about 30 minutes, but this is not documented.
  2. The expand clause currently only supports stats. If you change your detailExJob.ExpandClause assignment to just "stats", your job query should work. Moreover, you can simplify your detail level object and omit the expand clause altogether, since you already specified stats in the select clause.
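Applied to the code from the question, the fix is a one-line change (a sketch, assuming `job` is a CloudJob already bound from the Batch client as in the original snippet):

        ODATADetailLevel detailExJob = new ODATADetailLevel();
        detailExJob.SelectClause = "id,executionInfo,stats";
        // Expand only "stats" -- listing other properties in the
        // expand clause is what triggers the BadRequest error.
        detailExJob.ExpandClause = "stats";
        await job.RefreshAsync(detailExJob);

After the refresh, job.Statistics should be populated, subject to the roll-up delay described in point 1.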
fpark
  • Thanks for your answer! Since job lifetime statistics take 30 minutes to refresh, would you say it's not the best thing to use if I want to get information about the job execution time right after the job completes? In that case, what would you recommend, should I just timestamp when my job starts and when it ends? – J.B Jul 14 '17 at 02:31
  • If your job completes (as in it transitions to the completed state after all your tasks finish; you will need to configure this to happen automatically), then you can just get the [job execution information](https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.batch.jobexecutioninformation?view=azurebatch-7.0.0) and take the difference between the start and end times. – fpark Jul 14 '17 at 14:48
  • Would the difference between the start and end times provided by the job execution information be the same as the wallclock time or does it include an overhead? – J.B Jul 14 '17 at 15:10
  • It includes overhead, similar to task execution information. For wallclock measurements at the job level, you'll need to sum it yourself (enumerating all tasks in the job and expanding task stats to get wallclock time) or use the job lifetime stats which has the roll-up delay. – fpark Jul 14 '17 at 15:21
  • Oh ok! But for wall clock measurements at the job level, I can't just sum the wallclock time of all the tasks because they run in parallel across the different compute nodes, right? – J.B Jul 14 '17 at 15:39
  • Correct, if you're looking for absolute job execution time, the information provided by execution information is your best bet if you need something from the system (even the job lifetime stats won't help you here as it's an aggregate, not a min/max difference). Otherwise, you'll need to implement something yourself in the program being executed. – fpark Jul 14 '17 at 16:15
  • So I guess that the time difference between start and end times provided by execution information would include the overhead of downloading the necessary files to the compute node, and not only how long it took for the tasks to complete after all necessary files were downloaded. But if I want to know only how long all the tasks took to complete, I'd need to use some timestamp for my tasks and jobs to mark that. Am I understanding this correctly? – J.B Jul 14 '17 at 16:41
  • Correct (for the first part of your comment). I'm not sure you can decouple the overhead from just the process wallclock time in a meaningful fashion for all tasks running in parallel to determine parallel first/last walltime. Not only do you have time sync issues (time drift) if you're timestamping, but each task will have varying non-deterministic overhead that will offset the process wallclock time for each task. – fpark Jul 14 '17 at 17:12
  • Oh ok! That makes sense! Thanks a lot! – J.B Jul 14 '17 at 20:20
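The per-task summation discussed in the comments can be sketched as follows (an illustrative sketch, not from the original answer: `batchClient` and `jobId` are assumed names, and the per-task stats are still subject to the same roll-up delay):

        // Sum per-task wallclock time across a job by expanding task stats.
        ODATADetailLevel taskDetail = new ODATADetailLevel(
            selectClause: "id,stats",
            expandClause: "stats");

        TimeSpan totalWallClock = TimeSpan.Zero;
        foreach (CloudTask task in await batchClient.JobOperations
            .ListTasks(jobId, taskDetail).ToListAsync())
        {
            // Stats may still be null if the roll-up hasn't run yet.
            if (task.Statistics != null)
            {
                totalWallClock += task.Statistics.WallClockTime;
            }
        }

Note, as the comments point out, that this sum is an aggregate of per-task wallclock times; for tasks running in parallel it will exceed the absolute job duration, which is better obtained from the job execution information's start/end times.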