I ran a Slurm job array (9714509) and it finished with state Mixed, ExitCode [0-1]. Using the code from here, I can see that only one task failed:

$ sacct -n -X -j 9714509 -o state%20 | sort | uniq -c
     25            COMPLETED 
      1               FAILED 

Is there a way to find out which task number failed? Checking the individual log files would take too long.

justinian482

1 Answer

Remove the | sort | uniq -c part, which only does the counting, and replace it with a filter on FAILED. Also add the job ID to the output of sacct:

sacct -n -X -j 9714509 -o state%20,jobid%20 | grep FAILED

should output what you need.
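If only the task numbers are wanted, the filtered output can be reduced further. The following is a minimal sketch, not part of sacct itself: failed_tasks is a hypothetical helper name, and it assumes each line has the state in the first column and a job ID of the form ARRAYID_TASKID in the second, which is how sacct normally reports finished array tasks.

```shell
# Hypothetical helper: reads "state jobid" lines on stdin and prints the
# array task index of every entry whose state is exactly FAILED.
# Assumes job IDs look like 9714509_7 (array job ID, underscore, task ID);
# lines whose job ID does not split into exactly two parts are skipped.
failed_tasks() {
  awk '$1 == "FAILED" && split($2, id, "_") == 2 { print id[2] }'
}

# Assumed usage on the cluster, with the same sacct call as above:
# sacct -n -X -j 9714509 -o state%20,jobid%20 | failed_tasks
```

Matching the first column exactly, rather than grepping the whole line, avoids accidental hits if the string FAILED ever shows up in another field.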

damienfrancois