
We are running Son of Grid Engine 8.1.8 on a grid of Debian 7.8 machines, installed from deb packages. This worked well until today, when a user submitted a processing stream and every job in it got stuck in the queue. The only running jobs are a couple of QRLOGIN jobs. Plenty of cores are free, yet the jobs stay pending with status qw or hqw. The output of qstat -explain does not tell me anything I find useful, hence my post here.

Any advice on how to diagnose and fix this issue would be appreciated.

1 Answer


The problem was that Grid Engine mistakenly believed the CPUs on the nodes were railed (fully loaded), so the jobs were left pending. Bouncing (restarting) the nodes solved it.
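
For anyone who hits the same symptom, here is a rough sketch of one way to spot nodes that Grid Engine considers overloaded, by comparing the per-core load reported by qhost against a load threshold. It rests on several assumptions that are not from the original post: that the qhost header contains columns named NCPU and LOAD, that the queue uses the np_load_avg=1.75 threshold (the usual default, as far as I know), and the sample text is made up purely for illustration. The function name overloaded_hosts is hypothetical.

```python
# Hypothetical sample in the general shape of `qhost` output (not real data).
SAMPLE_QHOST = """\
HOSTNAME ARCH NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO SWAPUS
------------------------------------------------------------------
global - - - - - - - - - -
node01 lx-amd64 8 1 8 8 0.50 31.4G 1.2G 8.0G 0.0
node02 lx-amd64 8 1 8 8 16.00 31.4G 1.2G 8.0G 0.0
"""


def overloaded_hosts(qhost_text, threshold=1.75):
    """Return (hostname, load_per_core) for hosts at or above the threshold.

    Assumes the first non-blank line is a header containing NCPU and LOAD.
    The default threshold mirrors np_load_avg=1.75, which is an assumption
    about the queue configuration, not something read from the cluster.
    """
    lines = [ln for ln in qhost_text.splitlines() if ln.strip()]
    header = lines[0].split()
    ncpu_col = header.index("NCPU")
    load_col = header.index("LOAD")

    flagged = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(ncpu_col, load_col):
            continue  # separator line or otherwise incomplete row
        try:
            ncpu = int(fields[ncpu_col])
            load = float(fields[load_col])
        except ValueError:
            continue  # rows such as "global" that report "-" for these values
        if ncpu > 0 and load / ncpu >= threshold:
            flagged.append((fields[0], load / ncpu))
    return flagged


if __name__ == "__main__":
    for host, per_core in overloaded_hosts(SAMPLE_QHOST):
        print(host, round(per_core, 2))
```

On a live cluster the text could come from `subprocess.run(["qhost"], capture_output=True, text=True).stdout` instead of the sample. If a node is flagged while it is actually idle, that matches the misreported load described above.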