Supercomputing: fewer nodes with more CPUs per node vs. more nodes with fewer CPUs per node

Question

On a supercomputer, you have a set of nodes, and each node has some number of CPUs. Is it generally better to use, say, 20 CPUs on 1 node, as opposed to 2 nodes with 10 CPUs each? In both cases, there are 20 CPUs in total.

Is communication between CPUs on the same node a lot faster than communication between CPUs on 2 different nodes?

In the end, it depends on your requirements. But intra-node communication is always faster than inter-node communication. — Poshi, May 21 '20 at 09:46
@Poshi I don't have any requirements per se. It was more of a general question: should I ever use 2 nodes with 10 CPUs each instead of 1 node with 20 CPUs (assuming a node can provide up to 20 CPUs)? — 24n8, May 21 '20 at 17:24
As a general question, you got a general answer. Anyway, you should never use any number of CPUs unless you have something to do. And when you have something to do, you have requirements specific to that computation. Are there memory needs? Disk space needs? Network needs? Is there communication between threads? Is it frequent? Is hyperthreading activated? If so, are you reusing functional units? A good answer to this question has to be tied to a specific problem. — Poshi, May 21 '20 at 19:23

score 1 · Answer 1 · answered Jun 26 '20 at 12:05

As a general rule of thumb, it is better to use 20 CPUs on 1 node, since intra-node communication is faster than inter-node communication.
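To make the rule of thumb concrete, here is a minimal sketch of the common latency-plus-bandwidth estimate for the time to send one message. The two parameter sets are hypothetical placeholders chosen only for illustration, not measurements of any real machine; substitute numbers measured on your own system.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate transfer time as a fixed startup cost plus size over bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical placeholder values, NOT measurements of any real system.
INTRA_NODE = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 1e10}
INTER_NODE = {"latency_s": 1e-5, "bandwidth_bytes_per_s": 1e9}


def compare(message_bytes):
    """Return the (intra-node, inter-node) estimates for one message size."""
    intra = transfer_time(message_bytes, **INTRA_NODE)
    inter = transfer_time(message_bytes, **INTER_NODE)
    return intra, inter


if __name__ == "__main__":
    for size in (8, 1_000, 1_000_000):
        intra, inter = compare(size)
        print(f"{size:>9} bytes: intra={intra:.2e} s, inter={inter:.2e} s")
```

For small messages the latency term dominates and for large ones the bandwidth term does, so the size of the gap depends on the message sizes your application actually sends.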

This generally depends on the problem definition. If you want to use a shared-memory programming model (threads, tasks, etc.), then 1 node with 20 CPUs will be better: you can take advantage of shared memory and caching, with lower communication overhead. But if your application requires both shared and distributed memory (processes spread across nodes), then using multiple nodes may be beneficial.

But if your problem (shared or distributed) needs only the resources of a single node, then as a general rule don't take extra nodes, because you get no benefit from them. Even if your application uses a distributed-memory paradigm, use a single node, because intra-node communication is very fast and optimised.
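As a rough illustration of why extra nodes can hurt, the sketch below counts how many pairs of processes sit on the same node and how many are split across nodes for the two layouts in the question. It assumes block placement (consecutive ranks fill one node before the next), which is a simplification; real schedulers and launchers may place processes differently. The function names are made up for this example.

```python
from itertools import combinations


def node_of(rank, ranks_per_node):
    """Node index of a rank under the assumed block placement."""
    return rank // ranks_per_node


def count_pairs(total_ranks, ranks_per_node):
    """Return (same-node pairs, cross-node pairs) over all pairs of ranks."""
    intra = inter = 0
    for a, b in combinations(range(total_ranks), 2):
        if node_of(a, ranks_per_node) == node_of(b, ranks_per_node):
            intra += 1
        else:
            inter += 1
    return intra, inter


if __name__ == "__main__":
    print(count_pairs(20, 20))  # 1 node x 20 CPUs  -> (190, 0)
    print(count_pairs(20, 10))  # 2 nodes x 10 CPUs -> (90, 100)
```

Counting every pair overstates the effect for applications that only talk to a few neighbours, but it shows that with 2 nodes of 10 more than half of the possible pairs have to cross the slower inter-node link, while with 1 node of 20 none do.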

As @Poshi's comment says, a more concrete answer is problem-specific. It requires understanding the problem and profiling the application to come up with a specific solution.
