Questions tagged [hpc]

High Performance Computing encompasses the use of "supercomputers", systems with large numbers of CPUs, large parallel storage systems and advanced networks, to perform time-consuming calculations. Parallel algorithms and parallel storage are essential to the field, as are the challenges of complex, fast networking fabrics such as InfiniBand.

High Performance Computing (HPC) encompasses many aspects of traditional computing and is used by a variety of fields, including but not limited to particle physics, computer animation/CGI for major films, cancer and genomics research, and climate modeling. HPC systems, sometimes called 'supercomputers', are typically large numbers of high-performance servers with many CPUs and cores, interconnected by a high-speed fabric or network.

The TOP500 project maintains a list of the 500 fastest computers on the planet, as well as a list of the 500 most energy-efficient systems (the Green500). The performance of these systems is measured using the LINPACK benchmark, though a newer benchmark based on a conjugate gradient method (HPCG), which is more representative of modern HPC workloads, has also been introduced. IBM, Cray and SGI are major manufacturers of HPC systems and software, though it should be noted that over 70% of the systems on the TOP500 list are based on Intel platforms.

Interconnect fabric technology is also crucial to HPC systems, many of which rely on internal high-speed networks built from InfiniBand or similar low-latency, high-bandwidth interconnects. In addition to interconnect technology, GPUs and coprocessors have recently been gaining popularity for their ability to accelerate certain types of workloads.
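As a concrete illustration, the state of an InfiniBand fabric can be inspected from any node with the standard diagnostic tools; a minimal sketch, assuming the infiniband-diags and libibverbs utilities are installed:

```
ibstat          # HCA state, link width/speed and LID of the local ports
ibv_devinfo     # verbs-level view of the same adapter
iblinkinfo      # walk the whole fabric and report the state of every link
```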

Software is an additional concern for HPC systems, as typical programs are not written to run at such a scale. Many hardware manufacturers also produce their own software stacks for HPC systems, which include compilers, drivers, parallelization and math libraries, system-management interfaces and profiling tools designed specifically for the hardware they produce.
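In practice, users reach these stacks through environment modules and compiler/MPI wrappers; a minimal sketch of a typical build-and-run cycle, where the module names, node counts and the Slurm launcher are assumptions about the site:

```
module load gcc openmpi            # select a compiler and MPI library from the site's stack
mpicc -O2 -o hello_mpi hello_mpi.c # the MPI wrapper adds the right include and library paths
srun -N 2 -n 8 ./hello_mpi         # launch 8 ranks across 2 nodes under the scheduler
```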

Most HPC systems use a heavily modified Linux kernel that is stripped down to only the essential components required to run the software on the supplied hardware. Many modern HPC systems are set up in a 'stateless' manner, meaning that no OS data is stored locally on compute nodes; instead an OS image is loaded into RAM, typically over the network using PXE boot. This allows the nodes to be rebooted into a clean, known-good working state, which is desirable because it is sometimes difficult to cleanly terminate processes that were running across several nodes in parallel.
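As a sketch of what "rebooting into a clean state" looks like operationally (hostnames and credentials are placeholders, and it assumes the nodes' BMCs speak IPMI and a PXE/TFTP server already serves the stateless image):

```
# Force node01 to network-boot its clean image on the next power cycle
ipmitool -I lanplus -H node01-bmc -U admin -P secret chassis bootdev pxe
ipmitool -I lanplus -H node01-bmc -U admin -P secret power cycle
# The node PXE-boots, loads the OS image into RAM and rejoins the scheduler
# with no state left over from previously running jobs.
```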

116 questions
2
votes
1 answer

Is Ondemand Governor enabled in current HPC clusters?

Will enabling the ondemand governor on an HPC cluster help save power? Are sleep states (C-states) enabled on HPC platforms? If not, what is the reason behind this?
kashyapa
  • 337
  • 4
  • 17
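For the governor and C-state question above, the current settings can be read straight from sysfs; a minimal sketch using the standard cpufreq/cpuidle interfaces (the kernel parameter at the end is one common way sites cap C-states, not a universal recommendation):

```
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # e.g. ondemand, performance
cpupower frequency-info                                     # available governors and frequency range
cpupower idle-info                                          # which C-states the kernel exposes
# Many HPC sites pin the performance governor and limit deep C-states to avoid
# latency jitter, for example by booting with intel_idle.max_cstate=1.
```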
2
votes
2 answers

Web-based HPC cluster node management

I am working on my school diploma thesis. The main goal is to create a web-based application where logged-in users can see free and busy nodes, turn them on and off, see what processes they are running, etc. I figured out that I could do something like this…
Skuja
  • 25
  • 2
2
votes
2 answers

Windows HPC Server 2008: private network across VMs?

Windows HPC Server 2008 provides the option to automatically deploy OS images to new cluster nodes, using Windows Deployment Services. However, this requires the HPC cluster to be set up with a "private network" network topology. From HPC Cluster…
Max
  • 365
  • 2
  • 5
  • 17
2
votes
5 answers

setting up a cluster

I have 5 PCs connected over a LAN through a switch. I want to connect them to form an HPC cluster. The OS may be any Linux version (currently I have installed Ubuntu 8.10, 9.10 and Fedora 10). Purpose of the cluster: 1. To execute my C code developed…
Vaibhav
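A minimal sketch of running a C/MPI code across the five machines in the question above, assuming Open MPI is installed everywhere and passwordless SSH is configured (hostnames and slot counts are placeholders):

```
cat > hosts <<'EOF'
pc1 slots=4
pc2 slots=4
pc3 slots=4
pc4 slots=4
pc5 slots=4
EOF
mpicc -O2 -o mycode mycode.c
mpirun --hostfile hosts -np 20 ./mycode
```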
2
votes
0 answers

Infiniband fabric with 3 nodes - newbie

I am trying to connect 3 HP Z840 workstations using Mellanox ConnectX-3 VPI 40/56GbE Dual-Port QSFP Adapters (MCX354A-FCBT) and a Mellanox SX6005 12-port non-blocking unmanaged 56Gb/s switch. Description of machines to be connected: oak-rd0-linux (main node…
theenemy
  • 121
  • 2
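One detail worth noting for the setup above: the SX6005 is an unmanaged switch, so it has no built-in subnet manager and the ports will not become active until one of the hosts runs opensm. A minimal sketch (package and service names vary by distribution):

```
systemctl enable --now opensm     # start a subnet manager on exactly one node
ibstat | grep -E 'State|Rate'     # ports should move from Initializing to Active
ibhosts                           # list the hosts the subnet manager has discovered
```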
2
votes
1 answer

How can I set up interactive-job-only or batch-job-only partition on a SLURM cluster?

I'm managing a PBS/torque HPC cluster, and now I'm setting up another cluster with SLURM. On the PBS cluster, I can set a queue to accept only interactive jobs by qmgr -c "set queue interactive_q disallowed_types = batch" and to accept only batch…
wdg
  • 153
  • 1
  • 5
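For reference, a hedged sketch of the Slurm side of this: partitions are defined in slurm.conf (node names and limits below are placeholders), but Slurm has no direct counterpart to PBS's disallowed_types, so restricting a partition to interactive-only or batch-only jobs is usually done with a job_submit/lua plugin or a QOS layered on top of the partitions:

```
# slurm.conf fragment (sketch)
PartitionName=interactive Nodes=node[01-04] MaxTime=08:00:00   State=UP
PartitionName=batch       Nodes=node[05-32] MaxTime=7-00:00:00 Default=YES State=UP
```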
2
votes
0 answers

Lustre glitch: latency of minutes

Using an HPC Lustre filesystem, we occasionally experience glitchiness where even simply opening a terminal and typing "ls" can take minutes to return. That is, any process that involves the filesystem has random massive latency (but generally…
benjimin
  • 121
  • 3
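When chasing this kind of stall, the usual first-pass client-side checks are worth recording; a minimal sketch, assuming a standard Lustre client install:

```
lfs df -h                      # queries every OST/MDT; a hang here points at a sick target
lfs check servers              # reports whether the client's connections to the servers are healthy
dmesg | grep -i lustre | tail  # client evictions and reconnects show up here
```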
2
votes
0 answers

Current single system image solutions

I'm designing a cluster for a small research institute. Since our computations require a large amount of memory, I'm looking for a solution that will allow our applications access to the whole memory distributed across different nodes. The access…
Piotr M
  • 33
  • 3
2
votes
1 answer

Considerations using consumer class (high-end) GPU in server?

Motivation: First of all, even though I have some knowledge of computer science, software development and Linux server administration, I have never looked into server hardware and I am a total "newbie" to it. Sorry if this question is trivial to most of…
Adrian Maire
  • 145
  • 1
  • 10
2
votes
2 answers

InfiniBand drivers: OFED or distro-included?

I'm setting up a Linux cluster with an InfiniBand network, and I'm quite a newbie in the InfiniBand world, so any advice is more than welcome! We are currently using Mellanox OFED drivers, but our InfiniBand cards are old and not recognized by the latest…
nirnaeth
  • 33
  • 6
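A quick way to see which stack is actually in use on a node before deciding; a minimal sketch (ConnectX-3 cards use the mlx4 driver family, and ofed_info only exists when a vendor OFED is installed):

```
ofed_info -s 2>/dev/null || echo "no vendor OFED installed"   # MLNX_OFED prints its version here
modinfo mlx4_core | grep -E '^(filename|version)'             # distro module vs. OFED-provided module
ibstat | grep -E 'CA type|Firmware'                           # adapter model and firmware level
```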
2
votes
1 answer

SLURM with "partial" head node

I am trying to install SLURM with NFS on a small Ubuntu 18.04 HPC cluster in a typical fashion, e.g. configure the controller (slurmctld) and clients (slurmd) and a shared directory, etc. What I am curious about is: is there a way to set it up such that…
rage_man
  • 123
  • 3
2
votes
1 answer

HTCondor high availability

I am currently trying to make the job queue and submission mechanism of a local, isolated HTCondor cluster highly available. The cluster consists of 2 master servers (previously 1) and several compute nodes and a central storage system. DNS, LDAP…
1
vote
1 answer

ifconfig apparently showing wrong RX/TX values for InfiniBand HCA

Recently, I executed a watch -n 1 ifconfig on one of our Linux cluster compute nodes while it was running a 48-process MPI run, distributed over several nodes. Oddly, while Ethernet packets seem to be counted correctly (a few kb/s due to the SSH…
andreee
  • 133
  • 1
  • 6
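The likely explanation for the observation above is that MPI traffic goes over native verbs/RDMA, which bypasses the IPoIB network interface entirely, so ifconfig/ip only sees IP traffic. The HCA's own port counters do include RDMA traffic; a minimal sketch (device name mlx4_0 and port 1 are placeholders):

```
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data   # includes RDMA, unlike the ib0 interface counters
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data
perfquery                                                         # same counters read via the fabric (infiniband-diags)
```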
1
vote
2 answers

Containers for HPC batch processing

We are facing the problem that a lot of people want to run different scientific software on our high performance computing cluster. Every user requires a different set of libraries and library versions and we do not want the administrator to deal…
J. Doe
  • 13
  • 3
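The approach most HPC sites take for this is an unprivileged container runtime such as Apptainer (formerly Singularity), so that each user ships their own library stack; a minimal sketch (image and script names are placeholders):

```
apptainer build mytool.sif docker://ubuntu:22.04              # users build images elsewhere, e.g. on a workstation
sbatch --wrap "apptainer exec mytool.sif ./run_analysis.sh"   # and run them as ordinary batch jobs
```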
1
vote
1 answer

Slurm: Have two separate queues for GPU and CPU-only jobs

At the moment, we have set up Slurm to manage a small cluster of six nodes with four GPUs each. That has been working great so far, but now we want to utilize the Intel Core i7-5820K CPUs for jobs which only require CPU processing power. Each CPU…
Micha
  • 121
  • 4
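A hedged sketch of one common way to do this in slurm.conf and gres.conf: declare the GPUs as generic resources and overlay a GPU partition and a CPU-only partition on the same nodes (all names, counts and limits below are placeholders):

```
# slurm.conf fragment
GresTypes=gpu
NodeName=node[1-6] CPUs=12 RealMemory=64000 Gres=gpu:4
PartitionName=gpu Nodes=node[1-6] MaxTime=2-00:00:00 State=UP
PartitionName=cpu Nodes=node[1-6] MaxTime=2-00:00:00 Default=YES State=UP

# gres.conf on each node
Name=gpu File=/dev/nvidia[0-3]

# users then request GPUs explicitly, e.g.:  sbatch -p gpu --gres=gpu:1 job.sh
```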