
I have set up a small cluster with 1 head node and 3 compute nodes. My client machine runs Windows Server 2016, and I use it to submit workbook offloading jobs. My problem: the HPC cluster is extremely slow. If I run the job on my local machine, it finishes about 10 times faster than on the cluster! The configuration of my nodes is as follows:

Head node: 2 vCPUs, 8 GB RAM

Compute nodes: 1 vCPU, 4 GB RAM each

I suspect the issue could be the communication between the nodes and the network, or it could be something entirely different. Can someone please help?

Thanks in advance!

KMLN

1 Answer


In my own HPC work, I have seen large performance drops caused by the interconnect (network switch) within my clusters. Your interconnect may simply not be fast enough to take advantage of the hardware: because data has to be sent out to the other nodes, a slow interconnect limits your speed and performance. Most current HPC systems use a specialized, very fast network interconnect (usually InfiniBand) that lets nodes exchange data quickly. I would recommend checking your network switch and making sure it runs at no less than 1 Gigabit Ethernet speed.
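
One way to check what bandwidth you actually get between the client (or head node) and a compute node is to push a known amount of data over TCP and time it. Below is a minimal sketch using only the Python standard library, assuming Python 3 is installed on both machines. The script name `nettest.py`, port 5001 and the 64 KiB chunk size are hypothetical placeholders, not anything from HPC Pack. A dedicated tool such as iperf3 does this more thoroughly.

```python
import socket
import sys
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def start_receiver(host="0.0.0.0", port=5001):
    """Listen on (host, port) and drain one incoming connection in a background thread.

    Returns (thread, bound_port, result); result["bytes"] holds the byte count
    once the thread has finished.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    bound_port = srv.getsockname()[1]  # useful when port=0 picks a free port
    result = {}

    def serve():
        with srv:
            conn, _ = srv.accept()
            total = 0
            with conn:
                while True:
                    data = conn.recv(CHUNK)
                    if not data:  # sender has shut down its side
                        break
                    total += len(data)
            result["bytes"] = total

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread, bound_port, result


def measure_throughput(host, port, total_bytes=256 * 1024 * 1024):
    """Stream total_bytes of zeros to a receiver and return the rate in Mbit/s."""
    payload = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as sock:
        start = time.perf_counter()
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
        sock.shutdown(socket.SHUT_WR)  # tell the receiver no more data is coming
        sock.recv(1)  # returns b"" once the receiver has read everything and closed
        elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical usage (script name and port are placeholders):
    #   on a compute node:        python nettest.py recv 5001
    #   on the client/head node:  python nettest.py send <compute-node-name> 5001
    if len(sys.argv) == 3 and sys.argv[1] == "recv":
        thread, _, result = start_receiver("0.0.0.0", int(sys.argv[2]))
        thread.join()
        print(f"received {result['bytes']} bytes")
    elif len(sys.argv) == 4 and sys.argv[1] == "send":
        rate = measure_throughput(sys.argv[2], int(sys.argv[3]))
        print(f"{rate:.0f} Mbit/s")
    else:
        print("usage: nettest.py recv PORT | nettest.py send HOST PORT")
```

The sender waits for the receiver to close the connection before stopping the clock, so the figure reflects data actually delivered rather than data sitting in local send buffers. The receiving node's firewall has to allow the chosen port. If the reported rate is far below 1000 Mbit/s, the link is slower than the 1 Gigabit Ethernet recommended above.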

Here is a link to one of my publications: https://www.raspberrypi.org/magpi/benchmarking-raspberry-pi-cluster/

Toward the end, you can see how low Ethernet bandwidth hinders the performance of my cluster.

rmdcoding