I recently started working with Slurm and have a question about submitting a job.
I submitted a batch script with sbatch myfile.sbatch, but the job never starts running. It stays pending with reason "Resources", even though resources appear to be available (there are idle nodes in the GPU partition). Below is part of the job status reported by scontrol show job my-job-ID:
JobState=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=50-00:00:00 TimeMin=N/A
SubmitTime=2023-06-21T22:01:44 EligibleTime=2023-06-21T22:01:44
AccrueTime=2023-06-21T22:01:44
StartTime=2024-06-19T17:12:44 EndTime=2024-08-08T17:12:44 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2023-06-21T22:10:18
Partition=gpu-geforce
I noticed that the estimated start time is about one year after the submission date, even though the job appears to have been submitted correctly. Is this because the GPUs are already allocated to other jobs? I checked with squeue but could not find any running jobs.
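To put numbers on the delay, here is a minimal Python sketch that parses the timestamps from the scontrol output above and computes the gaps between them. The helper parse_fields is a hypothetical name of my own, not part of Slurm; only the field values are taken from the output.

```python
from datetime import datetime

# Timestamp fields copied from the scontrol output above.
scontrol_output = """
SubmitTime=2023-06-21T22:01:44 EligibleTime=2023-06-21T22:01:44
StartTime=2024-06-19T17:12:44 EndTime=2024-08-08T17:12:44 Deadline=N/A
"""


def parse_fields(text):
    """Hypothetical helper: split whitespace-separated Key=Value tokens into a dict."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that contain no "="
            fields[key] = value
    return fields


fields = parse_fields(scontrol_output)
submit = datetime.fromisoformat(fields["SubmitTime"])
start = datetime.fromisoformat(fields["StartTime"])
end = datetime.fromisoformat(fields["EndTime"])

wait = start - submit   # gap between submission and the estimated start
reserved = end - start  # window the scheduler sets aside for the job

print("estimated wait:", wait)
print("reserved window:", reserved)
```

The estimated wait comes out to roughly 363 days, and the reserved window is exactly 50 days, which matches TimeLimit=50-00:00:00 in the output.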
I would be grateful if you could tell me what might be causing this, or which configuration I should change to resolve it.
Thank you!
I tried scontrol show job to check the job status: it is pending (reason: Resources), and the estimated start time is very far in the future.
I also used squeue to look for running jobs that I might have missed, but found none.