Questions tagged [hpc]

High Performance Computing (HPC) encompasses the use of "supercomputers" with large numbers of CPUs, large parallel storage systems, and advanced networks to perform time-consuming calculations. Parallel algorithms and parallel storage are essential to this field, as are the issues surrounding complex, fast networking fabrics such as InfiniBand.

High Performance Computing (HPC) encompasses many aspects of traditional computing and is used by a variety of fields, including but not limited to particle physics, computer animation/CGI for major films, cancer/genomics research, and climate modeling. HPC systems, sometimes called 'supercomputers', are typically large numbers of high-performance servers with many CPUs and cores, interconnected by a high-speed fabric or network.

A list of the 500 fastest computers on the planet (the TOP500) is maintained, as is a list of the 500 most energy-efficient systems. The performance of these systems is measured using the LINPACK benchmark, though a newer benchmark, HPCG, which uses a conjugate gradient method, is more representative of modern HPC workloads. IBM, Cray, and SGI are major manufacturers of HPC systems and software, though it should be noted that over 70% of the systems on the TOP500 list are based on Intel platforms.

Interconnect fabric technology is also crucial to HPC systems, many of which rely on internal high-speed networks built on InfiniBand or similar low-latency, high-bandwidth fabrics. In addition to interconnect technology, GPUs and coprocessors have been gaining popularity for their ability to accelerate certain types of workloads.

Software is an additional concern for HPC systems, as typical programs are not equipped to run at such a large scale. Many hardware manufacturers also produce their own software stacks for HPC systems, which include compilers, drivers, parallelization and math libraries, system management interfaces, and profiling tools specifically designed to work with the hardware they produce.
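Workload managers such as Slurm (which features in many of the questions below) are a typical part of these software stacks: users describe a job's resource needs in a batch script and the scheduler allocates nodes for it. A minimal sketch of such a script — the partition name, node counts, and program path here are hypothetical:

```shell
#!/bin/bash
#SBATCH --job-name=example       # job name shown in the queue
#SBATCH --partition=compute      # hypothetical partition name
#SBATCH --nodes=2                # number of nodes requested
#SBATCH --ntasks-per-node=20     # tasks (e.g. MPI ranks) per node
#SBATCH --time=01:00:00          # wall-clock limit (hh:mm:ss)

# Launch the program across all allocated nodes
srun ./my_parallel_app
```

Submitted with `sbatch job.sh`, the job waits in the queue until the requested nodes are free, then runs without further user interaction.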

Most HPC systems use a highly modified Linux kernel that is stripped down to only the essential components required to run the software on the supplied hardware. Many modern HPC systems are set up in a 'stateless' manner, meaning no OS data is stored locally on compute nodes; instead, an OS image is loaded into RAM, typically over the network using PXE boot. This allows nodes to be rebooted into a clean, known-good working state, which is desirable because it is sometimes difficult to effectively clean up processes that were running in parallel across several nodes.
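As a sketch of that stateless boot flow, an iPXE script on a compute node might look like the following (the head-node address and image paths are hypothetical):

```
#!ipxe
dhcp                                     # obtain an address on the provisioning network
kernel http://10.0.0.1/boot/vmlinuz      # fetch the kernel from the head node
initrd http://10.0.0.1/boot/rootfs.img   # fetch the in-RAM OS image
boot                                     # boot entirely from RAM; no local OS state
```

Because the root filesystem lives only in RAM, a reboot discards all local state and returns the node to the known-good image served by the head node.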

116 questions
1
vote
1 answer

ssh port forwarding (tunneling in HPC)

I have an application server that runs on a compute node. The server opens a port (9000) and I then run a command for tunneling between my local machine and the server: ssh -N -f -L 9000:compute-node:9000 user@myhpc Once this is done I can…
moth
  • 111
  • 4
1
vote
1 answer

HPC cluster master node as virtual machine

For a given small HPC cluster (~16 nodes) a master node is used as a front-end for users to login and interact with SLURM, and not as a computing node. The master node is currently a bare-metal server. Since the cluster is so small, the idea came up…
1
vote
1 answer

Wrong LDAP user ID is mapped into Slurm account management service

I configured a Slurm head node as follows: sssd to contact openLDAP slurmctld/slurmdbd/slurmd/munged to act as the Slurm controller and compute node ...where ray.williams is an LDAP user. Its UID can be mapped on the node. SSH login works…
Nicolas De Jay
  • 209
  • 2
  • 11
1
vote
1 answer

Single-node SLURM server: restrict interactive CPU usage

I have SLURM set up on a single node, which is also a 'login node'. I would like to restrict interactive CPU usage, e.g. outside the scheduling system. I found the following article which suggests using cgroups for this:…
Compizfox
  • 384
  • 1
  • 6
  • 18
0
votes
1 answer

What does "CPU Minutes" mean exactly?

I'm actually trying to report cluster utilization in Slurm but I don't understand the metric CPU Minutes. [root@XXXX]# sreport cluster Utilization Start=2018-12-01…
m4hmud
  • 3
  • 3
0
votes
1 answer

Exascale Power Consumption

I have read a lot of articles about exascale and found that it may consume a power envelope of approximately 20MW. Is that on a daily basis, a yearly basis, or every second? Please enlighten me. Here are the papers I have…
alyssaeliyah
  • 81
  • 1
  • 8
0
votes
1 answer

Configure Singularity to do headless rendering / use OpenGL / glxgears / glxinfo

I want to do headless rendering on a server where I do not have root permissions. Therefore, I created a Singularity container like this: Bootstrap: docker From: nvidia/cuda:9.0-runtime-ubuntu16.04 %post apt-get update && apt-get -y install \ …
thigi
  • 101
  • 4
0
votes
0 answers

How to handle mpi head node failure?

There is an app which starts with mpirun. If a compute node fails then all processes crash, but if only the head node fails (for example, a reboot) then processes get stuck on the compute nodes. How do I get rid of these zombie processes automatically?
Severgun
  • 163
  • 2
  • 8
0
votes
0 answers

Ideal configuration for a head node?

Which hardware should I concentrate on, when assembling a head node for an HPC cluster? The main task for the head node is to relay instructions to the compute nodes which will be running artificial intelligence algorithms. Ubuntu 14.04 LTS will be…
Rushat Rai
  • 111
  • 4
0
votes
1 answer

SSH vs qlogin to use all processors of a computing node

I have an SGE cluster consisting of four computing nodes, each with 20 processors. I do not mind giving one particular user the full capabilities of one specific node, i.e. I do not mind if he/she uses all 20 processors. My question then is, should…
Paco el Cuqui
  • 199
  • 1
  • 1
  • 8
0
votes
0 answers

Deployment of Base-Node via iSCSI in Server 2012R2 HPC cluster fails (can not join domain)

We are currently evaluating Server 2012R2 with HPC Pack for an upcoming project. Sadly we are stuck at deploying the base node. The node boots via PXE (iPXE) and connects to iSCSI, installs Windows but then seems unable to join the domain. Once the…
0
votes
2 answers

Numerous pbs_server errors in /var/log/messages

On supercomputer's management node we receive numerous errors such as: pbs_server: LOG_ERROR::is_request, bad attempt to connect from 10.10.0.254:1023 (address not trusted - check entry in server_priv/nodes) And after them nearly every minute…
0
votes
1 answer

Running jobs in a HPC cluster

I'm quite new to the HPC environment. Is there any difference between running a job on one node utilizing 8 cores and running the same job on 8 nodes utilizing 1 core each, in terms of performance or walltime used? PS: I'm working on a project which involves…
Ashwin
  • 1
0
votes
0 answers

Microsoft HPC: mixing windows and linux blades

I have a working Windows HPC cluster with 32 blades, all of them are using Windows HPC. My question is: can I install Linux on 16 blades and keep the other 16 on Windows? Is there a specific version of Linux that I can use? update What would I like…
Delta
  • 189
  • 3
  • 9
0
votes
2 answers

Running ScaleMP on top of OpenStack

Looking for feedback from anyone who has already played with running ScaleMP Linux appliances in OpenStack (KVM). A short description of the setup (w/ or w/o InfiniBand, total amount of RAM, etc) and its performance for matrix vector multiplication…