I'm designing a head node whose primary function is to submit jobs to the Torque/Maui scheduler and whose secondary function is to run test jobs. Unfortunately, most hardware selection guides for clusters were written between 2000 and 2004 and are mostly irrelevant nowadays. I've been able to decide most parts of the hardware configuration easily (e.g. NICs based on the interconnect), but I don't understand how to choose the HDD/memory/processors.
HDDs: Since I'm using network storage, am I correct that the size and type (SSD vs. spinning disk) of the local drive hardly matter, as it only needs to meet the requirements of a typical boot drive?
Memory: Assuming the test jobs are not memory-intensive, is there any performance advantage to having a large amount of memory on the head node? Job scheduling doesn't seem memory-intensive. If there is no advantage, what's a rule of thumb for deciding how much memory I need?
Processor: Taking the test jobs out of the equation, are there any advantages to having more cores or a higher clock frequency on the processor? I'd imagine that job scheduling is not computationally intensive and hardly benefits from a faster processor or from parallelism.
Redundancy: How do you keep the head node from being a SPOF? By having two or more head nodes? Should I leave the redundant head nodes completely passive (unused)? Otherwise, I imagine it would be extremely messy trying to recover from a dead head node. Is heterogeneity (different hardware specs) acceptable across head nodes? Is there any need for RAID mirroring of the boot drives on the head nodes?