Questions tagged [hpc]

High Performance Computing encompasses using "supercomputers" with high numbers of CPUs, large parallel storage systems and advanced networks to perform time-consuming calculations. Parallel algorithms and parallelization of storage are essential to this field, as are the issues that come with complex, fast networking fabrics such as InfiniBand.

High Performance Computing (HPC) encompasses many aspects of traditional computing and is utilized by a variety of fields including, but not limited to, particle physics, computer animation/CGI for major films, cancer/genomics research and climate modeling. HPC systems, sometimes called 'supercomputers', typically consist of large numbers of high-performance servers with large numbers of CPUs and cores, interconnected by a high-speed fabric or network.

A list of the 500 fastest computers on the planet, the TOP500, is maintained, as well as a list of the 500 most energy-efficient computers. The performance of these systems is measured using the LINPACK benchmark, though a newer benchmark based on a conjugate gradient method, which is more representative of modern HPC workloads, has also been introduced. IBM, Cray and SGI are major manufacturers of HPC systems and software, though it should be noted that over 70% of the systems on the TOP500 list are based on Intel platforms.
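As a rough illustration of the conjugate gradient method mentioned above, here is a minimal pure-Python sketch that solves a small symmetric positive-definite system. It is not the benchmark itself: real benchmark codes work on very large sparse systems distributed across many nodes, and the function names and the tiny dense test matrix here are assumptions chosen only for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive-definite matrix a.

    Illustrative only: dense storage, single process, no preconditioner.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - a x, with x starting at zero
    p = list(r)            # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # Next direction: new residual plus a multiple of the old direction
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
    print(solution)
```

Each iteration is dominated by one matrix-vector product and a few dot products. Operations of this kind tend to stress memory bandwidth and the interconnect rather than raw floating-point throughput, which is one reason such a method is considered closer to real workloads than a dense factorization.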

Interconnect fabric technology is also crucial to HPC systems, many of which rely on internal high-speed networks built on InfiniBand or similar low-latency, high-bandwidth technologies. In addition to interconnect technology, GPUs and coprocessors have recently been gaining in popularity for their ability to accelerate certain types of workloads.

Software is an additional concern for HPC systems, as typical programs are not designed to run at such a large scale. Many hardware manufacturers also produce their own software stacks for HPC systems, which include compilers, drivers, parallelization and math libraries, system management interfaces and profiling tools specifically designed to work with the hardware they produce.

Most HPC systems use a highly modified Linux kernel that is stripped down to only the essential components required to run the software on the supplied hardware. Many modern HPC systems are set up in a 'stateless' manner, which means that no OS data is stored locally on compute nodes and an OS image is loaded into RAM, typically over the network using PXE boot. This functionality allows the nodes to be rebooted into a clean, known-good working state. This is desirable in HPC systems because it is sometimes difficult to cleanly terminate processes that were running across several nodes in parallel.

116 questions
0
votes
1 answer

OpenLDAP implementation allows only root user to set passwords of accounts

I'm working on an application that requires the use of AWS ParallelCluster assets for some high performance processing. After the initial setup, we need to be able to add/remove user accounts and I am trying to set that up according to these instructions…
0
votes
0 answers

What can be a reason for different clock speeds between sockets on 2 x Xeon Scalable 6148?

I have a server with dual Xeon Scalable 6148 CPUs running an HPC application. Base clock: 2.4 GHz. All-core Turbo: 3.1 GHz. Some processing threads are not scaling well and are sensitive to CPU clock. I was playing a little with setting affinity and…
terion
  • 1
0
votes
1 answer

Ubuntu server vs Ubuntu desktop for Beowulf cluster

I want to create a Beowulf cluster using Ubuntu 18. Looking at some guides, they all seem to use Ubuntu Server for this, and my question is why? Is it not possible to use Ubuntu Desktop for the client nodes, or is it more for performance reasons? The…
-1
votes
1 answer

How is it that Summit at Oak Ridge National Lab has 2,414,592 cores?

Top500 says that Summit has 2,414,592 cores: https://www.top500.org/system/179397. But they have 4608 nodes, 9216 chips (each node has 2 chips), and 22 cores per chip. This is 202,752 cores. Where exactly does the number 2,414,592 come…
user1271772
  • 101
  • 4
-1
votes
2 answers

What is the OSPM Power management feature in current HPC clusters?

What does the operating system do in order to manage power in current HPC clusters? What functionality is built into current HPC clusters in order to save power?
kashyapa
  • 337
  • 4
  • 17
-1
votes
1 answer

Speeding up SAS data rate from 12Gbps

I'm curious about SAS data transfer speed. The maximum is 12 Gbps for the whole bus (not per drive) as far as I understand, but I have a scenario where I would like to have a faster data rate (hopefully around 40 to 80 Gbps), stored into RAID-10 (thinking…
zRISC
  • 13
  • 2
-1
votes
1 answer

Using HPC managers like Slurm on multiple servers in LAN

I have access to a group of servers connected with a 1Gb LAN, and each of them has 40+ cores and Ubuntu OS. They all have a common NAS. I installed SLURM on a few of them and configured it so that each server is both a control and a compute node,…
-2
votes
2 answers

memory usage per user in SGE cluster

I would like to automate the estimation of the monthly memory usage of all jobs run by a given user in my cluster (SGE, Ubuntu). I have seen there are many tools to compute the current memory usage for a particular user, but I want to calculate…
-3
votes
1 answer

Solutions for monetizing excess CPU cycles

My company has a (relatively) big computer farm, say, 100 physical servers (dual-CPU hexacore E5 Xeons with 160 GB RAM) leased from some hardware provider (say Leaseweb or OVM) on a monthly basis, which means that on 1st January I pay for all 100 servers to use…
rlib
  • 195
  • 1
  • 1
  • 8
-3
votes
1 answer

Why do supercomputers take up a whole room?

There are some pictures of the Texas Advanced Computing Center (one I am currently interested in) here: http://www.tacc.utexas.edu/resources/hpc/ If you look at the two supercomputers they have, Lonestar and Ranger, they are not like the normal computers…
-5
votes
1 answer

Why must all hard drives in an HPC cluster be of the same size and part number?

One of the hard drives of my HPCC is broken and I have to buy another one. I've heard it must be of exactly the same size and part number. Could anyone explain why? Thanks