How does SGE (Sun Grid Engine) Monitor VMEM (Virtual Memory) Usage for Jobs?

Question

SGE enables users to set limits on virtual memory/vmem usage (e.g. the h_vmem argument for a job submission).

But how exactly does SGE monitor VMEM usage and send a kill signal if the limit is exceeded? Does it poll at some frequency? Does it add up some kernel-provided value across a process tree? How does this work mechanistically? Even an incomplete explanation or a simple pointer to the source code would be greatly appreciated.

score 1 · Answer 1 · answered Dec 16 '18 at 02:31

I am not familiar with how SGE works in detail; I simply used to administer a small cluster that ran it a while ago. However, your question reminds me of the following script, which I often use to report the memory usage of a process:

https://github.com/jhclark/memusg

Basically, a command run within the qsub script is a child process of that script and, in turn, of the SGE daemons on the execution host (sge_execd and its sge_shepherd; qmon is only the graphical front end). Therefore, there is likely a routine somewhere in SGE that monitors memory usage in a similar manner to the Python code linked above. The relevant section of the code is:

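from subprocess import Popen
from time import sleep

# child_command, sid, get_vsize(), log() and out are defined elsewhere in memusg.
proc = Popen(child_command, stdin=None, stdout=None, stderr=None, env=None, shell=True)

vmpeak = -1
while proc.returncode is None:
    # Record the largest virtual memory size reported so far.
    vmpeak = max(get_vsize(sid), vmpeak)
    log("Waiting for child to exit. vmpeak={}".format(vmpeak))
    proc.poll()  # Sets proc.returncode once the child has exited
    sleep(0.1)  # Polling interval in seconds (float)

out.write("memusg: vmpeak: {} kb\n".format(vmpeak))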

Here child_command is the actual command we want to run. The code starts a process with this command and checks it at regular intervals (every 0.1 seconds here), reporting the peak virtual memory once the process completes. It would be trivial to change this code to break out of the loop and kill the child process if memory exceeds some maximum.
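
As a rough illustration of that last point, here is a minimal sketch of such a limit-enforcing loop. It is not SGE's implementation and not part of memusg: the names run_with_vmem_limit and read_vsize_kb are made up for this example, it assumes Linux (it reads VmSize from /proc/<pid>/status), and it watches only the directly launched process rather than a whole process tree or session.

```python
import subprocess
import sys
import time


def read_vsize_kb(pid):
    """Return the VmSize of one process in kB (Linux /proc only), or 0 if unavailable."""
    try:
        with open("/proc/{}/status".format(pid)) as fh:
            for line in fh:
                if line.startswith("VmSize:"):
                    return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return 0


def run_with_vmem_limit(argv, limit_kb, interval=0.1, get_vsize=read_vsize_kb):
    """Run argv, poll its virtual memory, and kill it once limit_kb is exceeded.

    Hypothetical helper for illustration only. Returns a tuple
    (returncode, vmpeak_kb, killed).
    """
    proc = subprocess.Popen(argv)
    vmpeak = -1
    killed = False
    while proc.poll() is None:
        vmpeak = max(get_vsize(proc.pid), vmpeak)
        if vmpeak > limit_kb:
            proc.kill()   # SIGKILL on POSIX
            proc.wait()   # reap the child so returncode is set
            killed = True
            break
        time.sleep(interval)
    return proc.returncode, vmpeak, killed


if __name__ == "__main__":
    # Example: run a trivial child under a 4 GB virtual memory limit.
    rc, peak, killed = run_with_vmem_limit(
        [sys.executable, "-c", "pass"], limit_kb=4 * 1024 * 1024
    )
    print("returncode={} vmpeak={} kb killed={}".format(rc, peak, killed))
```

A real scheduler would also have to add up the usage of all of the job's descendants and decide how often to poll; this sketch deliberately leaves both out.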

Hope this helps.

Hi Vince, thanks for your explanation. In this case I was very curious about how exactly SGE did it so I could replicate the examination of the process tree and polling interval, but appreciate the note here. — evolvedmicrobe, Dec 18 '18 at 06:04
Maybe try to search for code in the source for open grid scheduler? https://sourceforge.net/p/gridscheduler/code/HEAD/tree/trunk/source/. — Vince, Dec 18 '18 at 15:56
