Let's start with a basic flow-of-work schedule that assumes no additional resources, so the whole amount of work runs as a single, pure-[SERIAL] stream.
This baseline schedule, which uses no concurrent or parallel orchestration, shows an initial 3 [ms] sprint ( SSS ), followed by five independent 16 [ms] sprints executed one after another ( the blocks of sixteen P-s ), and a final 4 [ms] sprint ( SSSS ). The whole baseline computing topology thus completes in 3 + 5 x 16 + 4 = 87 [ms].
+-------+                                                                              +-------+
|       |                                                                              |       |
| START |                                                                              | EoJOB |
|       |                                                                              |       |
+-------+                                                                              +-------+
    :         1         2         3         4         5         6         7         8      :  9
    0....5....0....5....0....5....0....5....0....5....0....5....0....5....0....5....0....5....0....5
    |                                                                                      ^
    v                                                                                      |
    =SSS                                                                                SSSS
       |                                                                                |
       |PPPPPPPPPPPPPPPP                                                                |
                       |PPPPPPPPPPPPPPPP                                                |
                                       |PPPPPPPPPPPPPPPP                                |
                                                       |PPPPPPPPPPPPPPPP                |
                                                                       |PPPPPPPPPPPPPPPP|
Amdahl's law defines the maximum speedup one can fairly expect if all [PARALLEL]-is-able units of work can, and do, run on enough additional processing resources that are free at the right time ( five CPUs, as given in the O/P ).
The schedule that uses at least those 5 free CPU resources on an otherwise non-blocking processing fabric, running the computing topology in a resource-optimal orchestration, completes the same amount of work in only about 23 [ms] ( 3 + 16 + 4 ).
+-------+              +-------+
|       |              |       |
| START |              | EoJOB |
|       |              |       |
+-------+              +-------+
    :         1         2  :      3
    0....5....0....5....0....5....0....5....
    |                      ^                 [ms]
    v                      |
    =SSS                SSSS
       |                |
       |     CPU[A]     |
       |PPPPPPPPPPPPPPPP|
       |                |
       |     CPU[B]     |
       |PPPPPPPPPPPPPPPP|
       |                |
       |     CPU[C]     |
       |PPPPPPPPPPPPPPPP|
       |                |
       |     CPU[D]     |
       |PPPPPPPPPPPPPPPP|
       |                |
       |     CPU[E]     |
       |PPPPPPPPPPPPPPPP|
This is due to the advantage of running all the P-able blocks in a true-[PARALLEL] fashion ( having free, non-blocking access to 5+ CPU resources in due time ).
Further, we can see that no matter how many additional CPU resources are made available beyond those 5 CPUs for the 5 P-able sections, no further speedup will ever appear: the P-able sections are already mapped onto CPU resources [A:E], and any other CPU will not help them run any faster or complete the whole computing topology any sooner.
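The saturation argument above can be checked with a short Python sketch. It assumes that each P-able block is atomic ( it cannot be split across CPUs ) and that scheduling itself costs nothing; the function name `makespan_ms` and its parameters are hypothetical, introduced here only for illustration.

```python
import math

def makespan_ms(n_cpus, serial_head=3, serial_tail=4, block_ms=16, n_blocks=5):
    """Wall-clock time [ms] when atomic P-able blocks are spread over n_cpus.

    Assumption: blocks are indivisible, so they run in 'rounds' of at most
    n_cpus blocks each; setup/termination overheads are ignored.
    """
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    rounds = math.ceil(n_blocks / n_cpus)
    return serial_head + rounds * block_ms + serial_tail

baseline = makespan_ms(1)  # pure-[SERIAL] schedule
for n in (1, 2, 3, 5, 8, 64):
    t = makespan_ms(n)
    print(f"{n:>2} CPU(s): {t:>2} [ms], speedup {baseline / t:.3f} x")
```

Under these assumptions the duration stops improving at 5 CPUs: 8 or 64 CPUs yield the same 23 [ms] as 5 do.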
                       1
S =  ------------------------------------  ~ 3.782 x   if using 5+ CPU-resources
      ( 3 + 4 )       ( 5 x 16 )
      _________   +   __________
         87               87
                      ----------
                           5       <--- using 5+ CPU-resources to operate them in parallel
Q.E.D.
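The same figure can be reproduced numerically. This is a minimal sketch, not a general-purpose implementation: the name `amdahl_speedup` and its parameters are hypothetical, and capping the effective parallelism at the number of P-able blocks is an assumption that mirrors the atomic-block argument above; it is not part of the classic formula.

```python
def amdahl_speedup(serial_ms, block_ms, n_blocks, n_cpus):
    """Amdahl's-law speedup for n_blocks equal, independent P-able blocks."""
    parallel_ms = block_ms * n_blocks
    total_ms = serial_ms + parallel_ms
    serial_fraction = serial_ms / total_ms      # ( 3 + 4 ) / 87
    parallel_fraction = parallel_ms / total_ms  # ( 5 x 16 ) / 87
    # Assumption: CPUs beyond the number of atomic blocks stay idle.
    n_effective = min(n_cpus, n_blocks)
    return 1.0 / (serial_fraction + parallel_fraction / n_effective)

speedup = amdahl_speedup(serial_ms=3 + 4, block_ms=16, n_blocks=5, n_cpus=5)
print(f"{speedup:.4f}")  # 87 / 23, about 3.78
```

For CPU counts that do not divide the five blocks evenly ( 2, 3 or 4 ), this formula is optimistic compared with a schedule of atomic blocks, because it treats the parallel work as perfectly divisible.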
For more details on Amdahl's law of diminishing returns ( adding more CPUs yields zero additional speedup ), on the effects of atomicity of P-able work-unit execution, and on the effects of setup/termination add-on overheads, you might want to read this