I am working on my school diploma thesis. The main goal is to create a web-based application where logged-in users can see free and busy nodes, turn them on and off, see what processes they are running, etc. I figured I could do something like this: write a cron job that runs every 30 seconds or so and pings each node to find out whether it is on or off, then writes the results to a file. My web app (which I will write in PHP) could then read that file. Would this be a good solution? How would you suggest I do it? And finally, are there any existing solutions (not necessarily web-based) for managing cluster nodes?
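For concreteness, here is a minimal sketch of the polling idea described above, written in Python purely for illustration (the web front end could still be PHP and just read the resulting JSON file). The node names, the output path, and the `ping` flags (`-c 1 -W 1`, as on Linux iputils) are assumptions, not part of any existing setup. Note that standard cron schedules at one-minute granularity, so a 30-second interval would need a small loop or a long-running service rather than a plain crontab entry.

```python
import json
import os
import subprocess
import time


def ping(host, timeout_s=1):
    """Return True if a single ICMP echo to `host` succeeds.

    Assumes a Linux-style `ping` binary (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def poll_nodes(hosts, probe=ping):
    """Map each host name to "up" or "down" using the given probe function."""
    return {host: ("up" if probe(host) else "down") for host in hosts}


def write_status(path, status):
    """Write the status file atomically so a reader never sees a partial file."""
    payload = {"checked_at": int(time.time()), "nodes": status}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(payload, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
```

A poller would then call something like `write_status("/var/run/cluster/status.json", poll_nodes(["node01", "node02"]))` on each cycle (both the path and the host names are hypothetical). Writing to a temporary file and renaming it means the web app never reads a half-written result.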
2 Answers
In the past I have used Ganglia for node availability and load monitoring. It won't tell you what jobs are running but it will show the health of your cluster.
Nagios is something else that I have used with my clusters; however, it does a bit more than just cluster monitoring. It can monitor processes, disk space, memory, and anything else you can script or find a script for. It is also web based.
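To illustrate the "anything you can script" part: a Nagios check is typically just a program that prints a one-line status and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Below is a minimal sketch of such a check for the 1-minute load average. The function names and the thresholds (4.0 and 8.0) are made up for illustration and would need tuning per node.

```python
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_load(load1, warn=4.0, crit=8.0):
    """Classify a 1-minute load average; thresholds are illustrative only."""
    if load1 >= crit:
        return CRITICAL, f"CRITICAL - load average {load1:.2f}"
    if load1 >= warn:
        return WARNING, f"WARNING - load average {load1:.2f}"
    return OK, f"OK - load average {load1:.2f}"


def main():
    """Read the local load average, print the status line, return the exit code."""
    try:
        load1 = os.getloadavg()[0]  # Unix only
    except OSError:
        print("UNKNOWN - load average unavailable")
        return UNKNOWN
    code, message = check_load(load1)
    print(message)
    return code
```

When installed as an actual plugin, the script would end with `import sys; sys.exit(main())` so that the monitoring server receives the exit code.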
As for job schedulers, there are a few options depending on how you would like to configure things. Options include, but are not limited to: OpenPBS, TORQUE, PBSPro, Maui Cluster Scheduler, SLURM, and Sun Grid Engine. These are all ones that I know HPC centers are currently using for scheduling. Wikipedia has a list, but I don't believe everything listed there is meant for HPC scheduling: http://en.wikipedia.org/wiki/Job_scheduler
Sites:
Ganglia http://ganglia.sourceforge.net/
Nagios http://www.nagios.org/

Check out DRMAA. It's a general API for job submission and control that's becoming the standard among workload managers. As far as controlling the nodes themselves, that's highly dependent on the system you are using. Most have some sort of API you can use to interface with them and perform the same operations as you would with command-line tools.
Your project does sound interesting; I wish you luck.
