Determining the Parallel and Serial Region of Code and Calculating Speedup using Amdahl's Law

Question

I was trying to understand the working of Amdahl's law but got confused in the process. Consider the following problem:

Suppose
a program has a part at the beginning that is sequential in nature (must be executed by only one processor) and takes 3 ms. Also, there is a part at the end of the program that is sequential (must be executed by only one processor) and takes 4 ms. The rest of the code is divided into 5 equal parts that are executed in parallel on 5 processes and each of these parts takes 16 ms. Calculate speedup using Amdahl's law.

Here is how I approached this problem. I first calculated the serial and parallel fraction, where 0.3 is the serial part and 0.7 is the parallel part calculated from the following logic:

Serial Part = 3 ms + 4 ms = 7 ms

Parallel Part = 16 ms (Only taking once as the code executes parallel on 5 processors)

Total = 7 ms + 16 ms = 23 ms

Serial Fraction = 7 ms / 23 ms = 0.3 (approx)

Parallel Fraction = 16 ms / 23 ms = 0.7 (approx)

Now putting values in Amdahl's law:

Speedup = 1 / (S + P/N) (where N = Processors, S = Serial Fraction, P = Parallel Fraction)

Speedup = 1 / (0.3 + 0.7/5) = 2.27 (approx)

So is my approach correct or is there any other value of speedup for this problem?

score 0 · Answer 1 · edited Jun 06 '23 at 23:10

Let's start with a basic Flow-of-Work schedule, as if there were no additional resources, but to allow for a single ( a pure-[SERIAL] ) stream of running the whole amount of work.

This baseline schedule, not using any sort of concurrent or parallel orchestration, shows, that an initial 3 [ms]-sprint ( SSS ) is followed by a consecutive execution of five independent 16 [ms]-sprints ( marked by blocks of 16-P-s ) and the whole workflow terminates after a final 4 [ms]-sprint completes the baseline computing topology in about 87 [ms].

+-------+                                                                              +-------+
|       |                                                                              |       |
| START |                                                                              | EoJOB |
|       |                                                                              |       |
+-------+                                                                              +-------+
:         1         2         3         4         5         6         7         8      :  9
0....5....0....5....0....5....0....5....0....5....0....5....0....5....0....5....0....5....0....5
|                                                                                      ^
v                                                                                      |
=SSS                                                                                SSSS
   |                                                                                |
   |PPPPPPPPPPPPPPPP                                                                |
                   |PPPPPPPPPPPPPPPP                                                |
                                   |PPPPPPPPPPPPPPPP                                |
                                                   |PPPPPPPPPPPPPPPP                |
                                                                   |PPPPPPPPPPPPPPPP|

Amdahl's law defines a maximum speedup that is fair to be expected, if all [PARALLEL]-is-able units-of-work can & do run on sufficient enough & free in time additional processing resources ( five CPU-s as given in O/P ).

Schedule, now using at least those 5 free CPU resources on otherwise non-blocking processing fabric, running the computing topology in resources optimal orchestration, completes the same amount of work, yet in about only 27 [ms].

+-------+              +-------+
|       |              |       |
| START |              | EoJOB |
|       |              |       |
+-------+              +-------+
:         1         2  :      3     
0....5....0....5....0....5....0....5....
|                      ^               [ms]
v                      |
=SSS                SSSS
   |                |
   | CPU[A]         |
   |PPPPPPPPPPPPPPPP|
   |                |
   | CPU[B]         |
   |PPPPPPPPPPPPPPPP|
   |                |
   | CPU[C]         |
   |PPPPPPPPPPPPPPPP|
   |                |
   | CPU[D]         |
   |PPPPPPPPPPPPPPPP|
   |                |
   | CPU[E]         |
   |PPPPPPPPPPPPPPPP|

This is due to an advantage of running all the P-able blocks in true-[PARALLEL] fashion ( having in due time free & non-blocking access to 5+ CPU resources ).

Further we can see, that no matter how many additional CPU-resources were made available, beyond those very 5 CPUs for the very said 5 P-able sections, no further speedup would ever appear, as the P-able sections were already mapped onto CPU-resources [A:E] and any other CPU will not help them do anything faster or complete the whole computing topology any time sooner.

              1
S = -------------------------------- ~ 3.782 x if using 5+ CPU-resources
     ( 3 + 4 )       ( 5 x 16 )
     _________   +  ___________
        87               87
                   --------------
                         5                  <--- using 5+ CPU-resources to operate them in parallel

Q.E.D.

_{For more details
on Amdahl's law of diminishing returns ( adding more CPUs will make zero additional speedups ), on effects of atomicity of P-able work-units execution, on effects of setup/termination add-on overheads, you might want to read this}

Determining the Parallel and Serial Region of Code and Calculating Speedup using Amdahl's Law

1 Answers1