Questions tagged [hpc]

High Performance Computing encompasses the use of "supercomputers" with large numbers of CPUs, large parallel storage systems, and advanced networks to perform time-consuming calculations. Parallel algorithms and parallel storage are essential to this field, as are the complexities of fast, low-latency networking fabrics such as InfiniBand.

High Performance Computing (HPC) encompasses many aspects of traditional computing and is used by a variety of fields including, but not limited to, particle physics, computer animation/CGI for major films, cancer and genomics research, and climate modeling. HPC systems, sometimes called 'supercomputers', typically consist of large numbers of high-performance servers with many CPUs and cores, interconnected by a high-speed fabric or network.

The TOP500 list ranks the 500 fastest computers on the planet, and the companion Green500 list ranks the 500 most energy-efficient. The performance of these systems is measured using the LINPACK benchmark (HPL), though a newer benchmark based on a conjugate gradient method, HPCG, is more representative of modern HPC workloads and is gaining adoption. IBM, Cray and SGI are major manufacturers of HPC systems and software, though it should be noted that over 70% of the systems on the TOP500 list are based on Intel platforms.
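As a rough illustration of how the headline numbers are derived (the node configuration below is hypothetical), a system's theoretical peak is computed from its hardware, and HPL measures how much of that peak is actually sustained on a dense linear solve:

    R_{peak} = N_{sockets} \times N_{cores} \times f_{clock} \times \text{FLOPs/cycle}
             = 2 \times 18 \times 3.0\,\text{GHz} \times 32 \approx 3.5\ \text{TFLOPS per node}

The Rmax figure submitted to the list is the measured HPL rate, typically a substantial fraction of Rpeak; HPCG scores are usually far lower, since memory and network bandwidth dominate that workload.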

Interconnect fabric technology is also crucial to HPC systems, many of which rely on internal high-speed networks built from InfiniBand or similar low-latency, high-bandwidth fabrics. In addition to interconnect technology, GPUs and coprocessors have been gaining popularity for their ability to accelerate certain types of workloads.
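A minimal sketch of why fabric latency matters: the classic "ping-pong" microbenchmark below (plain MPI; the iteration count and message size are arbitrary choices for illustration) times small-message round trips between two ranks, a figure dominated by the interconnect when the ranks sit on different nodes.

    #include <mpi.h>
    #include <stdio.h>

    /* Ping-pong latency sketch: ranks 0 and 1 bounce a small message
     * back and forth; the average round trip approximates twice the
     * one-way fabric latency. Run with: mpirun -np 2 ./pingpong */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 1000;
        char buf[8] = {0};                /* 8-byte payload */
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
            printf("avg round trip: %.2f us\n", dt / iters * 1e6);

        MPI_Finalize();
        return 0;
    }

On an InfiniBand fabric this round trip is typically a few microseconds; over commodity Ethernet it can be an order of magnitude higher, which is why tightly coupled parallel codes care so much about the interconnect.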

Software is an additional concern for HPC systems, as typical programs are not equipped to run at such scale. Many hardware manufacturers produce their own software stacks for HPC systems, which include compilers, drivers, parallelization and math libraries, system-management interfaces, and profiling tools designed specifically for the hardware they produce.
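For example, a program might call the vendor-tuned BLAS shipped with such a stack. This sketch assumes a CBLAS-style interface (as provided by OpenBLAS or Intel MKL) and simply multiplies two 2x2 matrices; the library name in the build line is an assumption.

    #include <stdio.h>
    #include <cblas.h>   /* CBLAS header from the installed BLAS */

    /* Sketch: C = alpha*A*B + beta*C via the tuned dgemm routine.
     * Build (OpenBLAS example): cc dgemm_demo.c -lopenblas */
    int main(void) {
        const int n = 2;
        double A[] = {1, 2,
                      3, 4};
        double B[] = {5, 6,
                      7, 8};
        double C[] = {0, 0,
                      0, 0};

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n,          /* m, n, k */
                    1.0, A, n,        /* alpha, A, lda */
                    B, n,             /* B, ldb */
                    0.0, C, n);       /* beta, C, ldc */

        printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 19 22 / 43 50 */
        return 0;
    }

The same source code can then be linked against whichever tuned BLAS the vendor stack provides, which is the practical payoff of these standardized math-library interfaces.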

Most HPC systems use a modified Linux kernel that is stripped down to only the essential components required to run the software on the supplied hardware. Many modern HPC systems are set up in a 'stateless' manner, meaning that no OS data is stored locally on compute nodes; instead, an OS image is loaded into RAM, typically over the network using PXE boot. This allows the nodes to be rebooted into a clean, known-good working state, which is desirable in HPC systems because it is sometimes difficult to effectively clean up processes that were running in parallel across several nodes.
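A minimal sketch of the network-boot side, assuming dnsmasq provides DHCP and TFTP for the cluster network; the addresses and file names here are hypothetical:

    # /etc/dnsmasq.d/pxe.conf -- lease addresses to compute nodes
    dhcp-range=10.0.0.100,10.0.0.200,12h
    # point PXE clients at the bootloader on the built-in TFTP server
    dhcp-boot=pxelinux.0
    enable-tftp
    tftp-root=/srv/tftp

    # /srv/tftp/pxelinux.cfg/default -- kernel plus full OS image in RAM
    DEFAULT compute
    LABEL compute
      KERNEL vmlinuz-compute
      APPEND initrd=compute-initramfs.img

Because the entire root filesystem lives in the initramfs in RAM, a reboot discards all local state and returns the node to the known-good image.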

116 questions
3
votes
1 answer

Display or install Windows product key on command line

We are currently in the process of setting up several Windows HPC clusters using Windows HPC Server 2008 R2. We would like to be able to perform Windows licensing commands across the clusters via the command line, i.e. display the current license status,…
ajdecon
  • 1,301
  • 4
  • 14
  • 21
3
votes
2 answers

Alternative to ScaleMP?

Anyone know of an alternative to ScaleMP? They let several x86 boxes boot as one large box. Theoretically, AMD's HyperTransport should allow the same thing. Any other companies or OSS projects doing this?
Brian Makin
  • 133
  • 1
  • 4
3
votes
1 answer

How to get the best LINPACK result and conquer the Top500?

Given a large Linux HPC cluster with hundreds or thousands of nodes, what are your best practices for getting the best possible LINPACK benchmark (HPL) result to submit to the Top500 supercomputer list? To give you an idea what kind of answers I would…
knweiss
  • 4,015
  • 24
  • 20
3
votes
1 answer

What applications can be used in a Red Hat/CentOS cluster?

When I look at the Red Hat cluster manuals 1 2, they only explain how to install it, but not what applications can use it. I am new to clusters, so I don't know these things =) Let's say I want a 3-node high-performance cluster; what applications…
Sandra
  • 10,303
  • 38
  • 112
  • 165
3
votes
0 answers

Bad multicore performance on DL360 Gen10 with 2x Xeon 6154

I have an issue with the multi-core performance of some servers. The servers are HPE DL360 Gen10, mounting 2x Xeon Gold 6154 (18 cores). In terms of performance, they are slower than some older counterparts on HPC computation (CFD, in…
vimax87
  • 31
  • 2
3
votes
2 answers

What is the overhead of ZFS RAIDz1/2 in HPC SSD Environment?

Example hardware/host: modern 64-core CPU, 128 GB memory, 8x Micron Pro 15.36 TB U.2 SSDs, each connected by dedicated OCuLink (no backplane or PCIe sharing), Ubuntu 20.04. Use case: a backup server for hundreds of hosts. Backup is…
epea
  • 406
  • 1
  • 9
  • 19
2
votes
1 answer

Are processors more efficient at lower temperatures

I couldn't find a stackexchange site that's more suited for this question; I apologize. This doesn't necessarily have a lot to do with servers and stability… I do not have problems with stability and I don't have problems with overheating, but I…
xyious
  • 343
  • 3
  • 12
2
votes
1 answer

How many infiniband adapters should be used in multi socket servers?

Should dual-socket motherboards have an InfiniBand adapter for each CPU? That is, should there be two InfiniBand adapters, one in each CPU's PCIe slot? Would this eliminate the signal going through QPI, or is the time for the signal to travel…
Darthtrader
  • 311
  • 1
  • 6
  • 12
2
votes
2 answers

Why configure cluster nodes to reboot when out of memory?

I have access to a research HPC cluster which is configured so that if your job tries to use more memory than the node has available, the node crashes and automatically reboots. This appears to be common practice, e.g. see…
lost
  • 123
  • 3
2
votes
1 answer

Parallel Processing and Disk IO for performance. More cores or more servers?

I have a large analysis job on an AWS EC2 instance (c3.8xlarge) on Ubuntu 12.04. The objective is to load the server at 100% CPU, running as many jobs as memory allows (varying amounts but generally 1-3gb per job). My initial thought was to…
monkeymatrix
  • 167
  • 1
  • 2
  • 7
2
votes
1 answer

Building Windows server cluster hpc

I'm trying to set up a Windows Server cluster with a head node and 2x compute nodes. So far, I've managed to install (thanks to this tutorial http://msdn.microsoft.com/en-us/library/jj884142.aspx): -Head node with Windows Server 2012 Datacenter R2…
user236173
2
votes
1 answer

HPC Cluster (SLURM): recommended ways to set up a secure and stable system

I'm working with a SLURM-driven HPC cluster consisting of 1 control node and 34 compute nodes, and since the current system is not exactly stable, I'm looking for guidelines or best practices on how to build such a cluster in a way that it…
basilikum
  • 217
  • 3
  • 11
2
votes
1 answer

Microsoft HPC cluster - AD or AD LDS?

The instructions for deploying an HPC cluster (e.g. step 1.5 on this page in TechNet) are very clear that HPC cluster nodes "must be members of an Active Directory domain". Does Active Directory Lightweight Directory Services (AD LDS) provide this? That…
Glen Little
  • 455
  • 3
  • 7
  • 17
2
votes
2 answers

Simulate HPC application data to test WAN filesystem performance over a large link

So here is the setup: we've got temporary access to a very large TCP WAN connection and we want to use this pipe to do WAN filesystem testing. We would like to generate massive amounts of data on the fly, writing it to the filesystem on the other…
2
votes
1 answer

Management of available file descriptors within a Hadoop cluster

I'm currently in charge of a rapidly-growing Hadoop cluster for my employer, currently built upon release 0.21.0 with CentOS as the OS for each worker and master node. I've worked through most of the standard configuration issues (load-balancing, IO…
MrGomez
  • 163
  • 6