
I recently started working with Slurm and have a question about submitting a job.

I submitted a batch script with the command sbatch myfile.sbatch, but the job never starts running. It keeps showing "pending, reason: resources" even though resources appear to be available (there are available GPU nodes). Here is part of the job's status, obtained via scontrol show job my-job-ID:

JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=50-00:00:00 TimeMin=N/A
   SubmitTime=2023-06-21T22:01:44 EligibleTime=2023-06-21T22:01:44
   AccrueTime=2023-06-21T22:01:44
   StartTime=2024-06-19T17:12:44 EndTime=2024-08-08T17:12:44 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2023-06-21T22:10:18
   Partition=gpu-geforce

I noticed that the estimated start time is about one year after the submission date, even though the job appears to have been submitted correctly. Is this because the GPUs are already allocated to other jobs? I checked with the squeue command but could not find any running jobs.
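
To put a number on that gap, here is a minimal Python sketch that only does date arithmetic on the fields pasted above. It does not query Slurm at all. The helper name parse_time_limit is my own (hypothetical), and it handles only the D-HH:MM:SS form that appears in this output.

```python
from datetime import datetime, timedelta

# Values copied verbatim from the `scontrol show job` output above.
FIELDS = {
    "SubmitTime": "2023-06-21T22:01:44",
    "StartTime": "2024-06-19T17:12:44",
    "EndTime": "2024-08-08T17:12:44",
    "TimeLimit": "50-00:00:00",
}


def parse_time_limit(text):
    """Parse a D-HH:MM:SS time limit string into a timedelta.

    Hypothetical helper: only the D-HH:MM:SS form shown above is handled.
    """
    days, _, clock = text.partition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)


submit = datetime.fromisoformat(FIELDS["SubmitTime"])
start = datetime.fromisoformat(FIELDS["StartTime"])
end = datetime.fromisoformat(FIELDS["EndTime"])
limit = parse_time_limit(FIELDS["TimeLimit"])

# Gap between submission and the scheduler's projected start.
wait = start - submit
print("estimated wait:", wait)

# Check whether the projected end is simply start + requested time limit.
print("EndTime == StartTime + TimeLimit:", end == start + limit)
```

With these values the wait comes out to roughly 364 days, and EndTime is exactly StartTime plus the 50-day TimeLimit, so the projected end seems to follow directly from the time limit I requested.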

I would be grateful if you could tell me what might be causing this, or which configuration I should change to resolve it.

Thank you!

I tried scontrol show job to check the job status, but the job is still pending (Reason=Resources), and the reported start date suggests it will take a very long time to start running.

I also used squeue to check whether there were any running jobs I had missed, but could not find any.
