I was going through the MMDS (Mining of Massive Datasets) book, which has an online MOOC of the same name. I'm having trouble understanding the communication cost model and the join operation calculations in Topic 2.5, and I'm surprised by how poorly organized the book is: the MOOC covers the same topic under "Advanced Topics/Computation Complexity of MapReduce" at the end of the course.
There's an exercise question (the example did not help at all) that goes like this:
We wish to take the join R(A,B) ⋈ S(B,C) ⋈ T(A,C) as a single MapReduce process, in a way that minimizes the communication cost. We shall use 512 Reduce tasks, and the sizes of relations R, S, and T are 2^20 = 1,048,576, 2^17 = 131,072, and 2^14 = 16,384, respectively. Compute the number of buckets into which each of the attributes A, B, and C are to be hashed. Then, determine the number of times each tuple of R, S, and T is replicated by the Map function.
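To check my reading of the problem, here is a minimal brute-force sketch in Python. It rests on assumptions of mine rather than anything I can derive: that A, B, and C are hashed into a, b, and c buckets with a*b*c = 512, and that each tuple is replicated once per bucket of the attribute it lacks (R tuples c times, S tuples a times, T tuples b times), so the cost to minimize is r*c + s*a + t*b. The function name best_shares is just something I made up for illustration.

```python
def best_shares(r, s, t, k):
    """Search all integer bucket counts (a, b, c) with a * b * c == k and
    return (cost, a, b, c) for the cheapest one.

    Assumed cost model: R(A,B) has no C, so each R tuple goes to c reducers;
    S(B,C) has no A, so each S tuple goes to a reducers;
    T(A,C) has no B, so each T tuple goes to b reducers.
    """
    best = None
    for a in range(1, k + 1):
        if k % a:
            continue  # a must divide k
        for b in range(1, k // a + 1):
            if (k // a) % b:
                continue  # b must divide k / a
            c = k // (a * b)
            cost = r * c + s * a + t * b
            if best is None or cost < best[0]:
                best = (cost, a, b, c)
    return best


if __name__ == "__main__":
    cost, a, b, c = best_shares(2**20, 2**17, 2**14, 512)
    print(f"a={a} b={b} c={c} cost={cost}")
    print(f"replication: R x{c}, S x{a}, T x{b}")
```

This only searches the possible bucket counts numerically; it does not explain where the closed-form solution in the book comes from.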
Could you walk me through it? I don't see how the author jumps from the simple R+S+T cost to Lagrange multipliers without working through the intermediate steps.