I've got a number of servers used for HPC / cluster computing, and because some of the computations they run read huge files over NFS, I'm seeing significant bottlenecks. I'm wondering how to address the issue.
The setup:
- 34 servers with Debian Squeeze (42 GB RAM each)
- 12 physical cores per machine + HT
- 2 "head" machines (head1 and head2) with 500 Gb drives each
- 32 "slave" machines which do PXE boot from head1
- head1 exports the NFS file system used by the 32 PXE-booted slaves
- head2 exports a "data" directory via NFS, which contains the data files for all the other machines (see the sketch after this list)
- the "data" directory contains very large files (5+ Gb)
- connectivity between the machines: Gigabit Ethernet
- most machines are not in the same physical rack
- the cluster uses the Open Grid Scheduler (aka Grid Engine) for batch job processing
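For reference, the export side looks roughly like this; the paths, subnet and export options are placeholders, not my exact values:

    # /etc/exports on head2 (path and subnet are placeholders)
    /data         10.0.0.0/24(rw,async,no_subtree_check)

    # /etc/exports on head1 (file system for the PXE-booted slaves)
    /srv/nfsroot  10.0.0.0/24(ro,no_root_squash,no_subtree_check)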
One of the computations this cluster runs involves each of the "slaves" reading a very large set of files (3 GB + 3 GB + 1.5 GB + 750 MB) before starting the various calculations. I've noticed that when this happens, most of the slaves spend significant time (several minutes) just reading these files, while the actual computation is much faster.
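To put rough numbers on it (back-of-the-envelope, assuming nothing is cached on the slaves yet): each slave pulls roughly 8.25 GB, and all of it has to come out of head2's single gigabit link, which delivers around 110 MB/s in practice:

    per-slave input:   3 GB + 3 GB + 1.5 GB + 0.75 GB  ≈ 8.25 GB
    aggregate input:   32 slaves * 8.25 GB             ≈ 264 GB
    GigE, effective:   ~110 MB/s out of head2
    wire time needed:  264,000 MB / 110 MB/s           ≈ 2400 s ≈ 40 min

However the jobs end up staggered, that one link seems to be the hard limit, which would match the slaves each spending minutes in I/O.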
Currently, I've raised the number of threads of the NFS daemon on head2 and set rsize and wsize to 32k in the slaves' mount options, but it's still a significant bottleneck.
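Concretely, the two changes were along these lines (the thread count and path shown are illustrative, not necessarily my exact values):

    # /etc/default/nfs-kernel-server on head2 -- raised from the Debian default of 8
    RPCNFSDCOUNT=16

    # mount options on the slaves (/data is a placeholder path)
    head2:/data  /data  nfs  rsize=32768,wsize=32768,hard,intr  0 0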
What can I do to improve performance? Should I have the slaves host these files on their local hard disks instead, or should I go with an entirely different file system for storage?
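By "host these files on their hard disks" I mean something like caching the inputs in local scratch space once and having the jobs read them from there, roughly along these lines (paths are placeholders; /data is the NFS mount from head2):

    # sketch of a per-slave staging step, e.g. at the top of the job script
    CACHE=/local/data-cache
    mkdir -p "$CACHE"
    # rsync skips files already present with the same size/mtime,
    # so only the first job on a slave pays the network cost
    rsync -a /data/bigfile1 /data/bigfile2 "$CACHE"/
    # ... the computation then reads from $CACHE instead of /data ...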