I have a dictionary where the keys are table names and values are the number of rows in those tables. So table 'a' has 1 row, 'c' has 3 rows and so on..
data = {
'a': 1,
'b': 1,
'c': 3,
'd': 1,
'e': 5
}
I have to perform some complex operation on these tables, I'm using multiprocessing to run an operation on these tables in a parallel manner. For that, I need to shuffle this dictionary in such a way that the number of rows is uniform across multiple processes (kind of load balancing so that all processes share equal weight). Note that the number of processes to spawn will be a variable (provided by the user from cmd line).
So let's say if we want to create 2 processes for the above data then it should split in below manner. data1
will process 6 rows and data2
will process 5 rows, so the time these 2 processes will take will be almost equal.
data1 = {
'a': 1,
'b': 1,
'c': 3,
'd': 1,
}
data2 = {
'e': 5
}
Similarly, if we were to create 3 processes for the above data then
data1 = {
'a': 1,
'b': 1,
'd': 1,
}
data2 = {
'c': 3
}
data3 = {
'e': 5
}
Note: The dictionary may contain 100,000 keys so a solution which has less time complexity will be appreciated. Thanks!