Questions tagged [hpc]

High Performance Computing (HPC) refers to the use of supercomputers and computer clusters to solve a wide range of computationally intensive problems.

Systems with benchmark performance in the hundreds of teraflops are usually considered supercomputers. A typical feature of these machines is their large number of compute nodes, usually in the range of O(10^3) to O(10^6). This distinguishes them from small-to-midsize computing clusters, which usually have O(10) to O(10^2) nodes.

When writing software that aims to make effective use of these resources, a number of challenges arise that are usually not present when working on single-core systems or even small clusters:


Higher degree of parallelization required

According to the original 1967 formulation of the "classical" law of diminishing returns, better known as Amdahl's Law, the maximum speedup one can achieve on a parallel computer is limited by the fraction of the code that remains serial (i.e. the parts that cannot be parallelized). In other words, the more processors you have, the better your parallelization concept has to be. Contemporary, overhead-strict re-formulations of the law additionally account for the add-on costs of parallel execution: process-spawning overheads, serialization/deserialization (SER/DES) of parameters and results, communication costs, and resource-contention and atomicity-of-work effects. Such cost-adjusted comparisons reflect the actual net speedup of truly parallel code execution far more closely than the classical formula, because they include the costs of preparing and executing the parallel sections themselves.
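As a rough, illustrative sketch (not the exact revised formula), the following Python snippet contrasts the classical Amdahl speedup S(N) = 1 / ((1 - p) + p/N), for a parallelizable fraction p on N processors, with a variant where an assumed lump-sum `overhead` term stands in for the add-on costs described above:

```python
# Illustrative sketch only: `overhead` is an assumed, aggregate add-on cost
# (spawning, SER/DES, communication) expressed as a fraction of the serial
# runtime, not the exact term structure of the revised law.

def amdahl(p, n):
    """Classical Amdahl's Law: S(N) = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - p) + p / n)

def amdahl_with_overhead(p, n, overhead):
    """Overhead-aware variant: add-on costs enlarge the effective serial part."""
    return 1.0 / ((1.0 - p) + overhead + p / n)

for n in (10, 100, 1_000, 10_000):
    print(f"N={n:>6}: classical {amdahl(0.95, n):5.1f}x, "
          f"with 1% overhead {amdahl_with_overhead(0.95, n, 0.01):5.1f}x")
```

Even a 1% add-on cost caps the speedup well below the classical asymptotic limit of 20x for p = 0.95, which is why the cost-adjusted comparison matters at scale.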


Specialized hardware and software

Most supercomputers are custom-built and use specialized hardware and/or software components, so you have to learn a lot about new types of architectures if you want to get maximum performance. Typical examples are the network hardware, the file system, and the available compilers (including their optimization options).


Parallel file I/O becomes a serious bottleneck

Good parallel file systems handle multiple requests in parallel rather well. However, there is a limit, and most file systems do not support simultaneous access by thousands of processes. Reading from or writing to a single file therefore becomes internally serialized again, even if you are using parallel I/O concepts such as MPI I/O.
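For illustration, here is a minimal collective MPI-IO sketch using mpi4py (assuming mpi4py, NumPy, and an MPI launcher are available; the file name output.dat is made up for this example). Each rank writes its own disjoint block of one shared file, yet the file system still bounds how many of these requests actually proceed in parallel:

```python
# Minimal mpi4py MPI-IO sketch; run with e.g.: mpirun -n 4 python write_blocks.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns a contiguous, non-overlapping block of the shared file.
block = np.full(1024, rank, dtype=np.int32)
offset = rank * block.nbytes

# "output.dat" is a hypothetical file name for this example.
fh = MPI.File.Open(comm, "output.dat", MPI.MODE_WRONLY | MPI.MODE_CREATE)
fh.Write_at_all(offset, block)  # collective write: every rank participates
fh.Close()
```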


Debugging massively parallel applications is a pain

If you have a problem in your code that only appears when you run it with a certain number of processes, debugging can become very cumbersome, especially if you are not sure where exactly the problem arises. Typical examples of process-count-dependent problems are domain decomposition and the establishment of communication patterns.
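As a hypothetical illustration of such a bug, the following Python sketch shows a naive 1-D domain decomposition that silently drops cells whenever the grid size is not divisible by the number of ranks, so the error only surfaces for certain process counts:

```python
# Hypothetical example: the naive split works for some process counts and
# silently loses cells for others, a classic process-count-dependent bug.

def naive_partition(n_cells, n_ranks, rank):
    chunk = n_cells // n_ranks              # BUG: remainder cells are dropped
    return rank * chunk, (rank + 1) * chunk

def balanced_partition(n_cells, n_ranks, rank):
    chunk, extra = divmod(n_cells, n_ranks) # spread the remainder over ranks
    start = rank * chunk + min(rank, extra)
    return start, start + chunk + (1 if rank < extra else 0)

n_cells = 1000
for n_ranks in (4, 7):  # 1000 % 4 == 0 is fine; 1000 % 7 != 0 loses 6 cells
    covered = sum(end - start
                  for start, end in (naive_partition(n_cells, n_ranks, r)
                                     for r in range(n_ranks)))
    print(f"{n_ranks} ranks: naive split covers {covered} of {n_cells} cells")
```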


Load balancing and communication patterns matter (even more)

This is similar to the first point. Assume that one of your compute nodes takes a little longer (e.g. one millisecond) to reach a synchronization point where all processes have to wait. With 101 nodes, you only waste 100 * 1 millisecond = 0.1 s of computational time. With 100,001 nodes, however, you already waste 100 s. If this happens repeatedly (e.g. in every iteration of a big loop) and you have many iterations, using more processors soon becomes uneconomical.
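The arithmetic of this example as a tiny Python sketch (one straggler arriving `delay_s` seconds late at a barrier idles every other rank for that long):

```python
# Back-of-the-envelope model of the synchronization waste described above.
def wasted_core_seconds(n_ranks, delay_s):
    # One late rank keeps the remaining (n_ranks - 1) ranks waiting.
    return (n_ranks - 1) * delay_s

for n_ranks in (101, 100_001):
    print(f"{n_ranks:>7} ranks: {wasted_core_seconds(n_ranks, 0.001):6.1f} "
          f"core-seconds wasted per barrier")
```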


Last but not least, power consumption matters

Thermal ceilings and power-capping strategies are another dimension of fine-tuning, and end-to-end performance is what ultimately rules. Thermal and electrical-power constraints impose an additional set of parameters that determine how efficiently HPC workloads can be computed within a time-constrained and power-capped computing infrastructure. Because systems and workloads differ in many ways, the optimal thermal and power-capping configuration for distributing an HPC workload over the infrastructure is rarely obvious and is often counter-intuitive. Repeated workloads (as in weather modelling) therefore typically adapt these settings as operational experience is gathered, since sufficiently extensive prior testing to decide them up front is seldom possible.

1502 questions
-1
votes
1 answer

How to add an additional rule for C objects in a mixed C/C++ makefile?

I added some C code into a C++ code base on a Windows machine; it was working fine in Visual Studio on Windows, but I'm having a hard time trying to get it to run on Linux. The file below is a subdir.mk that is run with a makefile. I'm editing this file…
MadHatter
-1
votes
1 answer

screen command in Unix

I have logged onto the HPC and then used: screen -list. It showed the following. > There are screens on: > 40032.pts-45.willow (16/06/17 13:59:42) (Detached) > 37414.pts-45.willow (15/06/17 15:01:30) (Detached) > …
user2669497
-1
votes
1 answer

AVX slower than SSE multimedia extensions

I am programming a perfect program to parallelize with multimedia extensions. The program transforms an image, so I go over a matrix and modify each pixel inside it. To go over it faster, I use multimedia extensions: at first I used…
-1
votes
2 answers

How come when I import two functions from the same module, the import works only for one of the two?

Intro: I am running a Python script on a cluster. I run everything in a virtualenv, and in the code I am importing two functions from the same module (written in SC_module.py): e.g. SC_module.py def funA(): def funB(): In the script script.py I have…
s1mc0d3
-1
votes
1 answer

Figuring out the number of processors when using OpenMPI

I have compiled weather forecasting software with OpenMPI in double precision on Ubuntu 14.04 with the Intel ifort compiler. However, I am not able to figure out a few issues. I need to figure out the number of processors I need to pass to mpirun. This…
gansub
-1
votes
1 answer

How to detect which HPC scheduler (Torque, Sun Grid Engine, etc.) I am using?

I need to run a different script depending on the type of scheduler, which necessitates a reliable way to detect whether the scheduler is Torque, SGE or something else. Something like $SHELL telling me which shell I am using, or something like name. I am…
-1
votes
2 answers

Test if the program uses MPI (distributed) correctly?

How do I check that a program is using MPI when it runs? Specifically, how can I verify the program is running on multiple processors? Also, how can I figure out if my program is correctly running across multiple nodes?
Coheen
-1
votes
1 answer

Calculating time of a job in HPC

I'm starting to use cloud resources. In my project I need to run a job and then calculate the time between the start of the job's execution in the queue and the end of the job. To put the job in the queue, I used the command: qsub myjob How…
-1
votes
2 answers

Running NetLogo headless on HPC, how to increase CPU usage?

I was running NetLogo headless on HPC using BehaviorSpace. Another (non-NetLogo) user on the HPC complained that I am utilizing the CPU cores only to a very small extent and should increase usage. I don't know exactly how to do so, please…
Abhishek Bhatia
-1
votes
1 answer

I am unable to ping the manual proxy from a node

I am using an HPC system with a master node and 8 other compute nodes. I have to download the LibXML Perl module using CPAN on each of the compute nodes. I am able to ping the proxy from the master node, but I am unable to do it from the compute…
-1
votes
3 answers

Ways to accelerate a reduce operation on Xeon CPU, GPU and Xeon Phi

I have an application where reduce operations (like sum, max) on a large matrix are the bottleneck. I need to make this as fast as possible. Are there vector instructions in MKL to do that? Is there a special hardware unit to deal with it on a Xeon CPU,…
hrs
-1
votes
1 answer

How to solve getting an error percentage of 199% using Fortran on HPC?

program EConstant include 'mpif.h' INTEGER n,ierr,lcv,rank,size,i DOUBLE PRECISION INTEGER factor, reduc DOUBLE PRECISION INTEGER redat, redrl, repnt DOUBLE PRECISION actval, actdiff,erpnt DOUBLE PRECISION este, reldiff …
Sun
-2
votes
1 answer

How to fix a Perl locale setting error while running scripts on Slurm clusters?

I wanted to run a program called Trinity, which is written partly in Perl, using the high performance cluster at my institute. I used conda to install Trinity and tried to run it by submitting the job .sh file in Slurm. But the job would abort…
-2
votes
2 answers

Applications based on (that use) OpenMPI

Please help me find a working application that uses OpenMPI. I need the name of any application with widespread/worldwide usage that is based on OpenMPI. Just the name of such an application will be enough. Thanks
Davit Siradeghyan
-2
votes
1 answer

If I have a GPU server with 1 PFlop/s of single-precision floating-point power, can I do double-precision calculations?

If I have a GPU server with 1 PFlop/s of single-precision floating-point power, can I do double-precision calculations? If so, how much double-precision computing power is that equivalent to?