I have been trying to write a script that monitors the resource usage of jobs I run on shared HPC clusters (i.e., nodes can be shared by many users based on the requested CPUs and memory). The motivation is that it would help me make decisions such as whether I should split a job into smaller jobs, whether I am requesting the appropriate number of CPUs, memory, etc. So far I have managed to do it using psutil; below is my script:

#!/usr/bin/env python
from __future__ import division, print_function
import psutil
from argparse import ArgumentParser
import time

def getUsage():
    # cpu_percent() measures usage since the previous call,
    # so the very first sample will read 0.0
    cpu = psutil.cpu_percent()
    memory = psutil.virtual_memory().percent
    io = psutil.disk_io_counters()  # call once and reuse
    return cpu, memory, io.read_bytes, io.write_bytes

def main(args):
    with open(args.name + ".txt", 'a') as the_file:
        the_file.write("cpu,memory,read,write\n")
        while True:
            cpu, memory, read, write = getUsage()
            the_file.write("{},{},{},{}\n".format(cpu, memory, read, write))
            the_file.flush()  # flush each row so the log survives if the job is killed
            time.sleep(args.sleepTime * 60)


if __name__ == "__main__":
    parser = ArgumentParser("monitor", description="monitors cpu and memory usage")
    parser.add_argument("--name", required=True, help="name of the file that contains the usage info")
    parser.add_argument("--sleepTime", default=1, type=float, help="the sleep time between every entry in minutes")
    args = parser.parse_args()

    main(args)

An example plot produced from the script's output is shown below (plot image omitted).

However, my problem is that the plot shows the usage of the node as a whole, and not just of the resources I have requested on the shared cluster. Does anyone know a way around this?
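For context, here is a rough sketch of the kind of per-job measurement I am after: instead of node-wide counters, sum usage over one process tree with `psutil.Process`. The `job_usage` function and the idea of passing the job's root PID are my own assumptions, not something the script above already does:

```python
import os
import psutil

def job_usage(root_pid):
    """Sum CPU percent and resident memory over a process and its children
    (i.e., one job's process tree), ignoring other users on the node."""
    root = psutil.Process(root_pid)
    procs = [root] + root.children(recursive=True)
    cpu = 0.0
    rss = 0
    for p in procs:
        try:
            # a short interval gives a real per-process CPU sample
            cpu += p.cpu_percent(interval=0.1)
            rss += p.memory_info().rss
        except psutil.NoSuchProcess:
            pass  # a child exited between listing and sampling
    return cpu, rss

# e.g., sample this script's own process tree
print(job_usage(os.getpid()))
```

This only works if the monitor can see the job's root PID, and it misses per-job disk I/O, which is why I am asking whether there is a better approach.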
