Questions tagged [parallelism-amdahl]

Amdahl's law, also known as Amdahl's argument, is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors. The law is named after computer architect Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967.

The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours on a single processor core, and a particular one-hour portion of the program cannot be parallelized, while the remaining 19 hours (95%) of execution time can be parallelized, then regardless of how many processors are devoted to the parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence the speedup is limited to at most 20×.
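The bound in the example above can be checked numerically. A minimal sketch (the function name `amdahl_speedup` is illustrative, not from any of the questions below):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup with parallelizable fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# The 20-hour example: 95% of the work (19 of 20 hours) is parallelizable.
print(round(amdahl_speedup(0.95, 8), 2))     # speedup on 8 processors, ~5.93
print(round(amdahl_speedup(0.95, 10**9), 2)) # approaches the 20x ceiling
```

Even with effectively unlimited processors, the speedup never exceeds 1 / (1 − P) = 1 / 0.05 = 20.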

106 questions
1
vote
1 answer

If my 8 core CPU supports 16 threads, would 16 be a better number than 8 for number of processes in a Pool?

I am using multiprocessing in Python 3.7. Some articles say that a good value for the number of processes in a Pool is the number of CPU cores. My AMD Ryzen CPU has 8 cores and can run 16 threads. So, should the number of processes be 8 or…
1
vote
1 answer

Convert for-loops to parallel computing

I do a numerical simulation. I have 50,000 particles that crystallize in a box, in which there are several crystals (with periodic boundary condition). I try to find the biggest crystal in the box. The code works well, but it takes about 100 [s] on…
1
vote
1 answer

Joblib doesn't call a custom function when n_jobs > 1

I have an example with data. As you can see from the code, every call of the function fit_by_idx() should print 'here', but it doesn't. All is OK when n_jobs=1, but if n_jobs is greater, joblib does not call the function. Code: import…
1
vote
1 answer

Why a task run in OpenMP threads actually takes longer than in serial?

I have written this code to estimate the value of an integral: a straightforward for loop parallelized with OpenMP. Whatever I do, I cannot get the parallel running time below the serial one. What is the problem? lenmuta,…
1
vote
1 answer

Calculating the maximum speedup with parallelization

Amdahl's Law lets us calculate the maximum theoretical speedup of a programme when adding more and more processing capacity to our hardware. This is stated by T = 1 / ((1-P) + (P/N)) where (1-P) is the part of the programme that is sequential and…
1
vote
1 answer

How to use Amdahl's Law (overall speedup vs speedup)

Recall Amdahl’s law on estimating the best possible speedup. Answer the following questions. You have a program that has 40% of its code parallelized on three processors, and just for this fraction of code, a speedup of 2.3 is achieved. What is the…
1
vote
1 answer

How to find the optimal number of workers for parfor?

How do I find the optimal number of workers for parfor on Amazon's virtual machines? In which cases should I use the number of physical cores, and in which the number of logical cores? Is there any rule of thumb for this? I run a compiled code (…
1
vote
1 answer

High Memory Usage when python multiprocessing run in Windows

The code below is a contrived example that simulates an actual problem I have, using multiprocessing to speed up the code. It runs on Windows 10 64-bit, Python 3.7.5, and IPython 7.9.0. The transformation functions (these functions…
1
vote
1 answer

joblib.Parallel() slower than single for skimage

I have to apply a 2D filter for every slice of a stack of images and I would like to parallelize the analysis. However, the code below runs slower than a normal for loop. Also, increasing n_jobs also increase the processing time, which is faster for…
1
vote
2 answers

OpenMP benchmarking parallel computations

I'm trying to benchmark the computation of f(x) while varying the number of threads on every iteration. f(x) = c * ln(x) * cos(x), n = 10000000. for (int pp = 2; pp < 17; pp++) { p = pp; int chunk = n/p; // acts like floor …
1
vote
1 answer

Optimal way to prepare data for Dask distributed client

I've got a function that generates an image and stores it to disk. The function has no arguments: def generate_and_save(): pass # generate and store image. I need to generate a large number of images (say 100k), so I opt for…
1
vote
1 answer

Parallelize process of generating combinations

The Problem I need to create a list of combinations of items from a master list of length n. With a small number of items in the master list, this can be done without parallelization and happen quickly. However, when I try to use the multiprocessing…
1
vote
0 answers

Parallel version for DSP kernels

I developed an auto-parallelizer for compiler-generated serial code (see www.dalsoft.com) and am looking for ways to apply this technology (any suggestions?). One possibility is to generate parallel code for DSP filters. As an example I took…
1
vote
1 answer

Running a parallelised recursive python program on GPU while still maintaining functionality of global variables

My problem essentially boils down to: "How do I get this bit of threaded Python code to run on my GPU instead of my CPU?" I'm working on a program similar to a Travelling Salesman problem, where I recursively check each possible move (with…
1
vote
0 answers

Need help in optimising moving window entropy calculations

I am trying to calculate the entropy of 3D patches with sliding windows from a larger 3D array. I can't seem to find a way of optimising the code to run with any reasonable speed. My current working approach uses nested for loops taking each coord…