Normally, when multiplying a matrix A[size][size] by a vector b[size] with nump processes (nump = the number of processors), which of the following two approaches is better?

Case 1: Divide matrix A into nump parts of size/nump rows each, and let each process handle all the rows in one part.

Case 2: Use dynamic row distribution: whenever a process is idle, send it an unprocessed row of matrix A to compute.

I'm bothered by the cost of send/recv/broadcast. Is there a proportional relation between size and the communication cost? Is there a way to predict the time complexity in advance, or can I only measure it with tools like VTune?
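One common way to predict communication cost before profiling is the latency-bandwidth (Hockney, or alpha-beta) model: sending one message of w words costs roughly alpha + beta*w. The sketch below applies it to both cases. Everything in it is an assumption: ALPHA, BETA and GAMMA are hypothetical placeholder values, not measurements (you would measure them yourself, e.g. with a ping-pong test); the collective costs assume binomial-tree broadcast/scatter/gather, while real MPI libraries choose algorithms by message size; the case 2 model is deliberately crude. It does show the proportional relation being asked about: broadcasting b moves O(size) words, distributing A from one rank moves O(size^2) words, and each process computes only O(size^2/nump) flops. A dense matrix-vector product does 2 flops per matrix element, so shipping A can cost as much as, or more than, the arithmetic itself.

```python
import math

# Hypothetical machine parameters: placeholders, NOT measurements.
# Replace them with values measured on your own cluster.
ALPHA = 1e-6   # latency per message, seconds
BETA = 1e-9    # transfer time per 8-byte word, seconds (inverse bandwidth)
GAMMA = 1e-10  # time per floating-point operation, seconds


def msg(words, alpha=ALPHA, beta=BETA):
    """Hockney (alpha-beta) model: cost of one message carrying `words` words."""
    return alpha + beta * words


def case1_time(n, p, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Case 1: static block rows, each of the p ranks gets about n/p rows.
    Assumes rank 0 owns A and b and the MPI library uses binomial-tree collectives."""
    lg = math.ceil(math.log2(p))
    bcast_b = lg * msg(n, alpha, beta)                   # broadcast b (n words)
    scatter_a = lg * alpha + beta * n * n * (p - 1) / p  # scatter the row blocks of A
    compute = gamma * 2 * n * math.ceil(n / p)           # 2n flops per row, busiest rank
    gather_y = lg * alpha + beta * n * (p - 1) / p       # gather the pieces of y
    return bcast_b + scatter_a + compute + gather_y


def case2_time(n, p, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Case 2: rank 0 acts as master and hands out one row per message to p - 1 workers.
    Crude model: every row and every result passes through the master one message
    at a time; the total is taken as the larger of that serialized traffic and
    the workers' parallel compute time."""
    if p < 2:
        raise ValueError("case 2 needs at least one worker besides the master")
    lg = math.ceil(math.log2(p))
    bcast_b = lg * msg(n, alpha, beta)
    # per row: send n words out, receive (row index, result) back
    master_comm = n * (msg(n, alpha, beta) + msg(2, alpha, beta))
    compute = gamma * 2 * n * math.ceil(n / (p - 1))
    return bcast_b + max(master_comm, compute)


if __name__ == "__main__":
    p = 8
    for n in (1000, 4000, 16000):
        print(f"n={n:6d}  case 1: {case1_time(n, p):.4f} s  case 2: {case2_time(n, p):.4f} s")
```

Under these placeholder numbers both cases are dominated by the beta*size^2 term for moving A, and case 2 additionally pays about 2*size message latencies serialized on the master, so case 1 tends to come out ahead. Dynamic assignment mainly helps when rows take very different amounts of time; in a dense matrix-vector product on identical nodes every row costs the same 2*size flops. If each rank can already hold its own rows of A (for example by generating or reading them locally), only b and the result vector need to move, and the communication drops to O(size). A profiler such as VTune is then useful for checking the estimate against real runs rather than being the only way to reason about it.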