
We're seeing strange behavior with some jobs on a cluster running Torque PBS and Maui: some jobs keep switching between the (R)unning and (Q)ueued states. I tried googling around and didn't find any hints. What could be the reason? Note that the jobs differ in nature: some use TensorFlow and Python, others are C++ executables.

MadH
  • Any recent system changes? Versions? – clusterdude Jul 01 '17 at 05:17
  • @clusterdude no changes. I'm new to maintenance of these things and didn't change anything yet. And the person who used to maintain these things has left the company :( – MadH Jul 03 '17 at 09:52

1 Answer


There's not enough here to say for sure, but I'd guess they're not really running. The pbs_mom logs and the syslogs on the compute nodes should give clues.
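If it helps with digging through the pbs_mom logs, here is a rough sketch of a helper that pulls out every log line mentioning a given job. It assumes the logs are plain text files sitting in one directory; the path `/var/spool/torque/mom_logs` and the job ID `123.head` in the example call are assumptions for illustration, so adjust them to your install. The function name `find_job_lines` is made up for this sketch.

```python
from pathlib import Path


def find_job_lines(log_dir, job_id):
    """Return (file name, line) pairs for every line in log_dir mentioning job_id.

    Files are visited in sorted name order, so date-named logs come out
    chronologically.
    """
    hits = []
    for log_file in sorted(Path(log_dir).iterdir()):
        if not log_file.is_file():
            continue  # skip subdirectories
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with log_file.open(errors="replace") as fh:
            for line in fh:
                if job_id in line:
                    hits.append((log_file.name, line.rstrip("\n")))
    return hits


# Example call (path and job ID are assumptions, not taken from your setup):
# for name, line in find_job_lines("/var/spool/torque/mom_logs", "123.head"):
#     print(f"{name}: {line}")
```

Run it on the node where the job was supposedly running and look at what happens right before the job drops back to Q. Torque's `tracejob` utility collects similar information across the server and mom logs, if it is installed on your system.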

clusterdude