I am running multiple array jobs using Slurm. For a given array job ID, say 885881, I want to count how many of its tasks failed and how many completed. Something like this:
Input:
`<some-command> -j 885881`
Output (assuming the array has 200 tasks):
count | status
120 | failed
80 | completed
Secondly, it would be great to get a list of the unique reasons why tasks failed, with a count for each.
Input:
`<some-command> -j 885881`
Output:
count | reason
80 | OUT_OF_MEMORY
40 | TIMED_OUT
I believe the sacct command can be used to get these results, but I am not sure how.
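
To make the desired result concrete, here is a rough Python sketch of the kind of thing I have in mind. It assumes that `sacct -X -n -P -j <jobid> -o JobID,State` prints one `JobID|State` line per array task, and that the State column already carries the failure reason (for example `OUT_OF_MEMORY` or `TIMEOUT`). Both are assumptions about how accounting is set up on a given cluster. The function names `count_states` and `array_job_summary` are made up for illustration.

```python
import subprocess
from collections import Counter


def count_states(sacct_text):
    """Count task states in pipe-delimited `JobID|State` lines.

    Assumes the text came from `sacct -X -n -P -o JobID,State`.
    """
    counts = Counter()
    for line in sacct_text.splitlines():
        # Split off the JobID; everything after the first "|" is the state.
        _job_id, _sep, state = line.partition("|")
        words = state.split()
        if not words:
            continue  # skip blank or malformed lines
        # A state may carry a suffix such as "CANCELLED by 1000";
        # keep only the first word so such rows group together.
        counts[words[0]] += 1
    return counts


def array_job_summary(job_id):
    """Run sacct for one array job and return a Counter of task states.

    -X restricts output to allocations (no job steps), -n drops the
    header, and -P selects pipe-delimited output.
    """
    result = subprocess.run(
        ["sacct", "-X", "-n", "-P", "-j", str(job_id), "-o", "JobID,State"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_states(result.stdout)
```

Calling `array_job_summary(885881).most_common()` would then give (state, count) pairs that can be printed in the `count | status` layout above. Since every state other than `COMPLETED` shows up under its own name, the same counter would cover both the failed/completed split and the per-reason breakdown.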