Time Complexity for Spark/Distributed Algorithms

Question

If we have the time complexity below

for some sequential algorithm, how can we express this time complexity for the same algorithm implemented in Spark (the distributed version), assuming that we have 1 master node and 3 worker nodes in the cluster?

Similarly, how can we express an O(n^2) time complexity for a Spark algorithm?

Moreover, how can we express space complexity in HDFS with a replication factor of 3?

Thanks in advance!

score 1 · Answer 1 · answered Feb 26 '22 at 17:37

Let's ignore orchestration and communication time (this is often unrealistic: for example, when sorting the whole dataset, the operation cannot simply be "split" across partitions).

Let's make another convenient assumption: the data is perfectly partitioned into 3 partitions, so every worker node holds n/3 of the data.
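A minimal sketch of the "perfect partitioning" assumption in plain Python (not Spark code; the function name `partition_evenly` is hypothetical). When n is not divisible by 3, the best we can do is partitions whose sizes differ by at most one record.

```python
def partition_evenly(data, k=3):
    """Split `data` into k contiguous partitions whose sizes differ by at most 1.

    Hypothetical helper illustrating the 'every node holds n/k' assumption."""
    n = len(data)
    base, extra = divmod(n, k)
    parts, start = [], 0
    for i in range(k):
        # The first `extra` partitions take one additional record each
        size = base + (1 if i < extra else 0)
        parts.append(data[start:start + size])
        start += size
    return parts

parts = partition_evenly(list(range(10)), k=3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Real Spark partitioning (for example hash partitioning) does not guarantee this balance, which is why it is only an assumption here.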

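That said, I think we can view an O(n^2) algorithm as the sum of three O((n/3)^2) partial computations running in parallel, hence an overall (wall-clock) time of O((n/3)^2). The same goes for any other complexity: O(n^2 log n) becomes O((n/3)^2 log(n/3)). Note that asymptotically O((n/3)^2) is still O(n^2), because big-O drops the constant factor 1/9; parallelism gives a constant-factor speedup, not a different complexity class.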
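To make the arithmetic concrete, here is a minimal operation-count model in plain Python (not Spark code; `sequential_cost`, `parallel_cost`, `quadratic` and `quadratic_log` are hypothetical names for illustration). It assumes, as above, that each worker processes its n/3 records independently and that orchestration and communication are free. Because the partitions run in parallel, the wall-clock cost is that of the slowest partition, while the total work is the sum over all partitions.

```python
import math

def sequential_cost(n, f):
    """Operations performed by the sequential algorithm on all n records."""
    return f(n)

def parallel_cost(n, f, workers=3):
    """Idealised (wall_clock, total_work) when each worker runs the same
    algorithm on its own share of the n records.

    Assumption: partitions are independent and orchestration/communication
    time is ignored, so wall-clock time is the slowest partition's cost."""
    sizes = [n // workers + (1 if i < n % workers else 0) for i in range(workers)]
    costs = [f(size) for size in sizes]
    return max(costs), sum(costs)

def quadratic(m):
    return m ** 2

def quadratic_log(m):
    # m^2 log m; defined as 0 for m <= 1 to avoid log2(0)
    return m ** 2 * math.log2(m) if m > 1 else 0

n = 3000
print(sequential_cost(n, quadratic))  # 9000000
print(parallel_cost(n, quadratic))    # (1000000, 3000000)
print(sequential_cost(n, quadratic_log) / parallel_cost(n, quadratic_log)[0])
```

For the quadratic cost, the wall-clock time drops by a factor of 9 and the total work by a factor of 3 compared with the sequential run. Both are constant factors, so the asymptotic class stays the same.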

As for the replication factor in Hadoop (HDFS): given the assumptions above, since the operations are executed in parallel across replicas (which are not the same thing as partitions), the complexity will be the same as for an execution on a single "replica".
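On the space side of the question, a minimal sketch (the helper name `hdfs_storage_bytes` is hypothetical, and it ignores metadata and checksum overhead): with a replication factor of 3, each HDFS block is stored three times, so n bytes of data occupy about 3n bytes of raw disk. That is a constant factor, so the space complexity is still O(n).

```python
def hdfs_storage_bytes(data_bytes, replication=3):
    """Approximate raw disk space when every HDFS block is stored
    `replication` times (simplified: no metadata/checksum overhead)."""
    return data_bytes * replication

one_gib = 1024 ** 3
print(hdfs_storage_bytes(one_gib) / one_gib)  # 3.0
```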

