I have a number of objects (roughly 530,000). These objects are randomly assigned to a set of lists (not actually at random, but let's assume they are). The lists are indexed consecutively and stored in a dictionary, called `groups`, keyed by that index. I know the total number of objects, but I do not know the length of each list ahead of time (in this particular case it varies between 1 and 36,000).
Next I have to process each object contained in the lists. To speed this up I am using MPI to send the lists to different processes. The naive way is to give each process `len(groups)/size` lists (where `size` is the number of processes), distribute any remainder, have each process work through its objects, return the results, and wait. Obviously, though, if one process happens to get a lot of very short lists and another gets all the very long ones, the first process will sit idle most of the time and the performance gain will not be very large.
What would be the most efficient way to assign the lists? One approach I can think of is to assign the lists so that the sum of the lengths assigned to each process is as equal as possible, but I am not sure how best to implement this. Does anybody have any suggestions?
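What I have in mind for the length-balancing idea is something like the following greedy sketch (function name and data are mine; it assigns the longest lists first, each to whichever rank currently has the smallest total, but I don't know whether this is the right approach):

```python
import heapq

def balance_by_length(groups, size):
    """Greedy sketch: hand out lists longest-first, always to the rank
    whose assigned total length is currently smallest.

    groups: dict mapping index -> list of objects
    size:   number of MPI processes
    Returns a dict mapping rank -> list of group indices.
    """
    # Min-heap of (total_length_so_far, rank, assigned_indices).
    # Ranks are unique, so ties on total never compare the lists.
    heap = [(0, rank, []) for rank in range(size)]
    heapq.heapify(heap)
    for idx in sorted(groups, key=lambda i: len(groups[i]), reverse=True):
        total, rank, indices = heapq.heappop(heap)
        indices.append(idx)
        heapq.heappush(heap, (total + len(groups[idx]), rank, indices))
    return {rank: indices for _, rank, indices in heap}
```

For example, with list lengths 5, 3, 3, 2, 1 and two processes, this gives each process a total of 7 objects, which is what I am after.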