Questions tagged [hpc]

High Performance Computing (HPC) refers to the use of supercomputers and computer clusters to solve a wide range of computationally intensive problems.

Systems with benchmark performance in the hundreds of teraflops are usually considered supercomputers. A typical feature of these machines is their large number of compute nodes, typically in the range of O(10^3) to O(10^6). This distinguishes them from small-to-midsize computing clusters, which usually have O(10) to O(10^2) nodes.

When writing software that aims to make effective use of these resources, a number of challenges arise that are usually not present when working on single-core systems or even small clusters:


Higher degree of parallelization required

According to the classical formulation of Amdahl's Law (Gene Amdahl, 1967), also known as the law of diminishing returns, the maximum speedup achievable on a parallel computer is limited by the fraction of serial work in your code (i.e. the parts that cannot be parallelized). The more processors you have, the better your parallelization concept therefore has to be. An overhead-strict reformulation of the law additionally accounts for the add-on costs of going parallel: process-spawning overhead, serialization/deserialization (SER/DES) of parameters and results, inter-process communication, and the atomicity of work units, which limits how finely work can be split. Speedup estimates adjusted for these costs reflect the actual net benefit of true parallel execution far more closely than the classical formula does.
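As a minimal sketch of the difference (the serial fraction and the linear per-process overhead below are illustrative assumptions, not measured values), the following C program compares the classical formula S(N) = 1 / (s + (1 - s)/N) with an overhead-adjusted variant:

    /* Classical vs. overhead-adjusted Amdahl speedup (illustrative numbers). */
    #include <stdio.h>

    /* Classical Amdahl: S(N) = 1 / (s + (1 - s) / N) */
    static double amdahl(double s, double n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    /* Overhead-adjusted variant: adds a cost term for process spawning,
     * parameter/result SER/DES and communication, modeled here (as an
     * assumption) as a cost growing linearly with the process count. */
    static double amdahl_overhead(double s, double n, double o_per_proc) {
        return 1.0 / (s + (1.0 - s) / n + o_per_proc * n);
    }

    int main(void) {
        const double s = 0.05;  /* assumed: 5% of the work is strictly serial */
        const double o = 1e-6;  /* assumed: overhead added per extra process  */
        for (double n = 1.0; n <= 1.0e6; n *= 10.0)
            printf("N=%8.0f  classical=%8.2f  with-overhead=%8.2f\n",
                   n, amdahl(s, n), amdahl_overhead(s, n, o));
        return 0;
    }

With these example numbers the classical speedup saturates near 1/s = 20, while the overhead-adjusted speedup peaks at a moderate process count and eventually drops below 1, i.e. the parallel run becomes slower than the serial one.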


Specialized hardware and software

Most supercomputers are custom-built and use specialized hardware and/or software components, which means you have to learn a lot about new types of architectures if you want to get maximum performance. Typical examples are the network hardware, the file system, and the available compilers (including their optimization options).


Parallel file I/O becomes a serious bottleneck

Good parallel file systems handle multiple requests in parallel rather well. However, there is a limit to this, and most file systems do not support simultaneous access by thousands of processes. Reading from or writing to a single file then becomes internally serialized again, even if you are using parallel I/O concepts such as MPI I/O.
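As a minimal sketch of collective parallel I/O (the file name and per-rank block size are made up for illustration), each rank writes its own disjoint block of one shared file:

    /* Each rank writes its own block of a shared file collectively. */
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { N = 1024 };              /* elements per rank (assumed) */
        int buf[N];
        for (int i = 0; i < N; i++) buf[i] = rank;

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        /* Collective write: rank r writes at offset r * N * sizeof(int).
         * Collective calls let the MPI library aggregate requests, but
         * with tens of thousands of ranks even this can serialize. */
        MPI_Offset off = (MPI_Offset)rank * N * sizeof(int);
        MPI_File_write_at_all(fh, off, buf, N, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }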


Debugging massively parallel applications is a pain

If a problem in your code only appears when you run it with a certain number of processes, debugging can become very cumbersome, especially if you are not sure where exactly the problem arises. Typical examples of process-count-dependent problems are domain decomposition and the establishment of communication patterns.
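One common low-tech workaround is a spin-wait that lets you attach a debugger (e.g. gdb) to one specific process; the pattern below is a sketch, with rank 0 chosen arbitrarily:

    /* Pause one rank until a debugger attaches and releases it. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        volatile int attached = 0;  /* from gdb: `set var attached = 1` */
        if (rank == 0) {
            fprintf(stderr, "rank %d waiting for debugger, pid %d\n",
                    rank, (int)getpid());
            while (!attached)
                sleep(1);           /* spin until released by the debugger */
        }
        MPI_Barrier(MPI_COMM_WORLD);  /* the other ranks wait here */

        /* ... the code you actually want to debug ... */
        MPI_Finalize();
        return 0;
    }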


Load balancing and communication patterns matter (even more)

This is related to the first point. Assume that one of your compute nodes takes a little longer (say, one millisecond) to reach a point where all processes have to synchronize. With 101 nodes, the other 100 each idle for that millisecond, so you waste 100 * 1 ms = 0.1 s of aggregate compute time. With 100,001 nodes you already waste 100 s. If this happens repeatedly (e.g. in every iteration of a big loop with many iterations), using more processors quickly becomes uneconomical.
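A simple way to quantify this on a real machine is to time how long each rank idles at the synchronization point; in the sketch below, do_work() is a hypothetical placeholder for your per-rank workload:

    /* Measure per-rank wait time at a barrier to expose load imbalance. */
    #include <mpi.h>
    #include <stdio.h>

    static void do_work(int rank) { (void)rank; /* placeholder */ }

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        do_work(rank);                       /* possibly imbalanced work */
        double t_work = MPI_Wtime() - t0;

        double t1 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);         /* the synchronization point */
        double t_wait = MPI_Wtime() - t1;

        double max_wait, sum_wait;
        MPI_Reduce(&t_wait, &max_wait, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        MPI_Reduce(&t_wait, &sum_wait, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("work %.3f s, max wait %.3f s, aggregate wasted %.3f s\n",
                   t_work, max_wait, sum_wait);

        MPI_Finalize();
        return 0;
    }

The aggregate wasted time (sum_wait) is exactly the quantity estimated above: with 100,001 ranks and 1 ms of imbalance it comes out to roughly 100 s per synchronization point.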


Last but not least, power constraints

Thermal ceilings and power-capping strategies are another dimension of the tuning arena, because end-to-end performance is what ultimately counts. Thermal and electric-power constraints pose an additional set of parameters that decide how efficiently HPC workloads can be computed within a time-constrained and power-capped computing infrastructure. Because many factors interact, the optimal thermal and power-capping configuration for distributing a workload across the machine is rarely obvious and is often counter-intuitive. Repeated workloads (as in weather modelling) therefore typically adapt these settings over time as experience is gathered, since sufficiently extensive prior testing is rarely possible.

1502 questions
-1
votes
1 answer

"jaxDecomp installation error" Run setup.py,command execution error #3

error message: CMake Error at CMakeLists.txt:5 (find_package): By not providing "FindNVHPC.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "NVHPC", but CMake did not find one. Could not find…
-1
votes
1 answer

Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcuda in java.library.path

I am working with DeepLearning4j library. I am running everything on HPC and I generate a jar file to submit with spark-submit. I am using the version M1.1. Everything was fine with the CPU but when I switched to GPU, I got this error: Exception in…
-1
votes
1 answer

UserWarning: CUDA initialization:

I have installed Pytorch 1.8.1+cu102 using a virtual environment on an HPC cluster. torch.cuda.is_available() is giving me the below output UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please…
sai prasanna
  • 11
  • 1
  • 1
-1
votes
1 answer

How do I compare elements in a list without nested loops?

So I have a ginormous list of 466,550 words and I need to compare the words with each other to check the similarity between them. Similarity between two words is defined as the number of common characters between them in my case. But if I try: for…
-1
votes
1 answer

How to use multiple hostnames in single bash variable?

Working script for single host I sourced the following bash scripts inside .bashrc and is working fine with single hostname host1. I can do scp, rsync and other remote commands without any problem. But I want to use it for multiple hostnames eg.…
sai
  • 87
  • 8
-1
votes
1 answer

How can I find hpcviewer in order to visualize trace data generated by hpcrun?

My question is about hpcviewer which is a tool to visualize trace data generated by hpcrun. I succeeded to install hpctoolkit but I have a problem finding hpcviewer. To test the toolkit, I created a simple hello_world program in C (with OpenMP) and…
hakimo2
  • 143
  • 12
-1
votes
1 answer

Using version control across production and test environments

We have two physically separated management servers for two separate computer clusters (all systems run CentOS 8), and both management servers run xcat. One is a test environment (call it test) and the other is the production (call it prod)…
irritable_phd_syndrome
  • 4,631
  • 3
  • 32
  • 60
-1
votes
3 answers

Undergraduate project related to High Performance Computing or similar fields

I am looking for ideas for my undergraduate project and I quite like the area of High Performance Computing, which has a lot of scope for research. Are there any ideas / already existing open source projects worth looking at?
Yeswantth
  • 81
  • 1
  • 7
-1
votes
1 answer

Packing Arrays using MPI_Pack

I am trying to pack an array and send it from one process to another. I am doing the operation on only 2 processes. I have written the following code. #include #include #include #include "mpi.h" int main( int argc,…
Turing101
  • 347
  • 3
  • 15
-1
votes
1 answer

How does the scratch space differ from the normal disk space in the home node disk space?

I am new to HPC and I am struggling with setting up scratch space. In the cluster I am working with, I need to set up scratch space using the SLURM workload manager. I am struggling with the following questions: How does the scratch space differ…
-1
votes
1 answer

Excel VBA script powered by supercomputers?

Is there a software/service/hardware available to run simple VBA scripts on a supercomputer (a remote cluster of CPUs)? I mean without installing the complex official HPC Excel extensions and spending time learning it... I mean something as easy to…
6diegodiego9
  • 503
  • 3
  • 14
-1
votes
1 answer

How to submit jobs when certain jobs have finished?

I submit jobs to a cluster (high-performance computer) using file1.sh and file2.sh. The content of file1.sh is qsub job1.sh qsub job2.sh qsub job3.sh ... qsub job999.sh qsub job1000.sh The content of file2.sh is qsub job1001.sh qsub job1002.sh qsub…
lanselibai
  • 1,203
  • 2
  • 19
  • 35
-1
votes
2 answers

Select slurm jobs based on sacct data

On a cluster using slurm I am trying to create a list of jobs that were submitted in a certain time interval so that I can cancel them. By hand I can do this using: sacct --format="JobID,Submit" which will give me a list of JobIDs and the corresponding…
Kvothe
  • 233
  • 1
  • 8
-1
votes
2 answers

open mpi not enough slots available

I'm running a simple hello world program written in C on MPI and the problem I'm having is that I can't seem to execute 10 processes for this simple program. #include #include "mpi.h" int main(int argc, char *argv[]) { int rank;…
Maxxx
  • 3,688
  • 6
  • 28
  • 55
-1
votes
1 answer

Collecting an MPI Trace

How can I collect an MPI communication trace on supercomputers? I need text files with details of each message (say sender, receiver, size, etc.) that I can parse. I was using the following command for Intel MPI and do not see any text files. mpirun…
SummonersRift
  • 51
  • 1
  • 1
  • 7