From the documentation here https://pythonhosted.org/joblib/parallel.html#parallel-reference-documentation it's not clear to me what exactly batch_size and pre_dispatch mean.
Let's consider the case where we use the 'multiprocessing' backend with 2 jobs (2 processes) and 10 tasks to compute.
As I understand it:
batch_size controls how many tasks are pickled and dispatched at a time. So if you set batch_size=5, joblib pickles and sends 5 tasks at once to each process, and after they arrive, the process solves them sequentially, one after another. With batch_size=1, joblib pickles and sends one task at a time, and only after the process has completed the previous one.
To show what I mean:
def solve_one_task(task):
    # Solves one task at a time
    ...
    return result

def solve_list(list_of_tasks):
    # Solves a batch of tasks sequentially
    return [solve_one_task(task) for task in list_of_tasks]
So this code:
Parallel(n_jobs=2, backend='multiprocessing', batch_size=5)(
    delayed(solve_one_task)(task) for task in tasks)
is equivalent (in performance) to this code:
slices = [(0, 5), (5, 10)]
Parallel(n_jobs=2, backend='multiprocessing', batch_size=1)(
    delayed(solve_list)(tasks[s[0]:s[1]]) for s in slices)
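To make the batching idea concrete, here is a small pure-Python sketch (no joblib, just list slicing; make_batches is my own helper, not a joblib function) of how I picture batch_size chopping the task list into chunks, each of which would be pickled and shipped to a worker as one unit:

```python
def make_batches(tasks, batch_size):
    # Group tasks into consecutive chunks of at most batch_size items;
    # my assumption: each chunk is pickled and sent to a worker as one unit.
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]

tasks = list(range(10))
print(make_batches(tasks, 5))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] -- one batch per process
print(make_batches(tasks, 1))  # ten single-task batches, dispatched one by one
```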
Am I right? And what does pre_dispatch mean then?
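If I had to guess, pre_dispatch limits how many tasks are consumed from the (possibly lazy) task generator and queued up before the workers need them; the docs show expression values like '2*n_jobs'. A plain-Python sketch of that guess, which I'd like confirmed (the names here are mine, not joblib's):

```python
from itertools import islice

def lazy_tasks():
    # Stand-in for the `delayed(...)` generator passed to Parallel.
    for i in range(10):
        yield i

gen = lazy_tasks()
n_jobs = 2
pre_dispatch = 2 * n_jobs  # my reading of the '2*n_jobs' expression string

# Only this many tasks are pulled from the generator and queued up front;
# the rest stay unconsumed until workers free up.
queued = list(islice(gen, pre_dispatch))
print(queued)  # [0, 1, 2, 3]
```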