I am trying to figure out how best to monitor usage of our HPC resources. Specifically, I want to track CPU usage, disk space consumed, and the number of jobs run, broken down by group.

PBS job scripts accept the "-W group_list" option to identify the group a job belongs to. I want to use this to monitor cluster usage, but I can't find documentation on how to track it over time.

gmond and gmetric offer some functionality: I can see the parameters I'm interested in, but I can't figure out how to break them down by the -W group_list value, by user, or by some other attribute.

Any advice?

  • What scheduler are you using? Moab or Maui? – Vince Apr 19 '17 at 14:24
  • Torque. One of the other free versions of MOAB. There is a file in /var/spool/torque that shows some resource usage, but it needs to be parsed and summarized. – nietzschemouse Apr 20 '17 at 15:11
  • Free Moab? You mean Maui? – Vince Apr 20 '17 at 15:15
  • Is there a reason you don't want to parse the accounting file? Are you looking for a turnkey solution? There is a reporting framework available for purchase that works with Moab. I guess I'm just not exactly sure what kind of answer you're seeking. – dbeer Apr 20 '17 at 19:05
  • The turnkey solution would be ideal. I can parse out the accounting file and then summarize it with a script, but I'm checking to see if that's more effort than necessary. I'm looking into purchasing MOAB or Univa for this reason. – nietzschemouse Apr 21 '17 at 15:51
  • In case you are using Maui (free alternative to Moab), have you looked at: http://docs.adaptivecomputing.com/maui/9.2accounting.php. Specifically, section 9.2.3: Profiling Historical Usage. – Vince Apr 26 '17 at 14:09
  • Thanks Vince. I'm using Torque, though, so that's not quite applicable. – nietzschemouse May 03 '17 at 15:34
  • Torque does have accounting files: http://docs.adaptivecomputing.com/torque/4-1-7/Content/topics/9-accounting/accountingRecords.htm. I did not find any way to monitor usage. You may need to write your own service... – Vince May 09 '17 at 15:17
  • Closest match I found was: https://github.com/tabaer/pbstools. – Vince May 09 '17 at 15:24
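
Since the comments above keep coming back to parsing the accounting file and summarizing it with a script, here is a minimal sketch of that approach in Python. It assumes the usual Torque record layout of `date time;record_type;job_id;key=value key=value ...`, that job-end records have type `E`, and that they carry `group` and `resources_used.cput` fields; check these against the accounting records on your own system. The function names are hypothetical, and disk space is not covered because it is not part of these records.

```python
from collections import defaultdict


def parse_duration(text):
    """Convert HH:MM:SS (or a plain number of seconds) to seconds."""
    total = 0
    for part in text.split(":"):
        total = total * 60 + int(part)
    return total


def summarize_by_group(lines):
    """Count finished jobs and sum CPU time per group.

    Assumes each record looks like:
    date time;record_type;job_id;key=value key=value ...
    Only job-end records (type "E") are counted.
    """
    totals = defaultdict(lambda: {"jobs": 0, "cput_seconds": 0})
    for line in lines:
        parts = line.strip().split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # skip malformed lines and non-end records
        # Assumption: values contain no spaces, so splitting on
        # whitespace separates the key=value pairs.
        fields = dict(
            item.split("=", 1) for item in parts[3].split() if "=" in item
        )
        group = fields.get("group", "unknown")
        totals[group]["jobs"] += 1
        cput = fields.get("resources_used.cput")
        if cput:
            totals[group]["cput_seconds"] += parse_duration(cput)
    return dict(totals)
```

To use it, open each daily accounting file (on many Torque installs these live under the server's `server_priv/accounting` directory, but that location is an assumption here) and pass the file object to `summarize_by_group`. Running it from cron and appending the result to a CSV would give a simple history over time.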

0 Answers