Suppose you want to perform two sums: one is a sum of 10 scalar variables, and one is a matrix sum of a pair of two-dimensional arrays, with dimensions 10 by 10. For now let’s assume only the matrix sum is parallelizable; What speed-up do you get with 10 versus 40 processors?
My Understanding:
10x10 matrix + 10 scalar variables = 110t
With 10 processors, (100/10)t + 10t = 20t
Speed-up=110/20=5.5;
With 40 processors, (100/40)t + 10t = 12.5t
Speed-up=110/12.5=8.8;
It is given in the solution book that we get about 55% of the potential speed-up with 10 processors, but only 22% with 40.
I understand the 55% but how does that 22% come?