Imagine I have M independent jobs, each job has N steps. Jobs are independent from each other but steps of each job should be serial. In other words J(i,j) should be started only after J(i,j-1) is finished (i indicates the job index and j indicates the step). This is isomorphic to building a wall with width of M and hight of N blocks.
Each block of job should be executed only once. The time that it takes to do one block of work using one CPU (also the same order) is different for different blocks and is not known in advance.
The simple way of doing this using MPI is to assign blocks of work to processors and wait until all of them finish their blocks before the next assignment. This way we can make ensure that priorities are enforced, but there will be a lot of waiting time.
Is there a more efficient way of doing this? I mean when a processor finishes its job, using some kind of environmental variables or shared memory, could decide which block of job it should do next, without waiting for other processors to finish their jobs and make a collective decision using communications.