I could not summarize this problem accurately in the title.
I want to use a list, func(*args), and Pool.map together without errors.
Please see below.
▼Code
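from multiprocessing import Pool

import psutil
from pandas import DataFrame


def df_parallelize_run(func, arguments):
    # One worker process per CPU core.
    p = Pool(psutil.cpu_count())
    # Pool.map calls func once for each item of `arguments`.
    df = p.map(func, arguments)
    p.close()
    p.join()
    return df


def make_lag(df: DataFrame, LAG_DAY: list):
    # Add one lagged copy of `target` per lag, shifted within each `id` group.
    for l in LAG_DAY:
        df[f'lag{l}d'] = df.groupby(['id'])['target'].transform(lambda x: x.shift(l))
    return df


def wrap_make_lag(args):
    return make_lag(*args)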
Given the three functions above, I want to do the following:
# df: DataFrame
arguments = (df, [1, 3, 7, 13, 16])
df = df_parallelize_run(wrap_make_lag, arguments)
▼ Error
in df_parallelize_run(func, arguments)
----> 7 df = pool.map(func, arguments)
in ..../python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
in ..../python3.7/multiprocessing/pool.py in get(self, timeout)
--> 657 raise self._value
TypeError: make_lag() takes 2 positional arguments but 5 were given
I know the cause of this mismatch: the list [1, 3, 7, 13, 16] gets unpacked into 5 positional arguments.
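To make the mismatch concrete, here is a minimal sketch that reproduces it without a pool or pandas. It assumes a stand-in make_lag that only keeps the two-parameter signature, and a placeholder object() in place of the real DataFrame; both are hypothetical. Pool.map treats the (df, list) tuple as two separate tasks, so the wrapper receives the bare list as one of them.

```python
def make_lag(df, lag_days):
    # Stand-in for the real function; only the two-parameter signature matters.
    return df, lag_days


def wrap_make_lag(args):
    return make_lag(*args)


# object() is a placeholder for the DataFrame.
arguments = (object(), [1, 3, 7, 13, 16])

# Pool.map iterates over `arguments`, so it sees two tasks:
# the DataFrame alone, and the list alone.
tasks = list(arguments)

# The second task is the bare list, which *args unpacks into 5 positionals.
try:
    wrap_make_lag(tasks[1])
except TypeError as exc:
    message = str(exc)  # same TypeError text as in the traceback
```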
How can I do this properly? If possible, I want to pass this list as a single positional argument. If that is nearly impossible with a list or Pool.map, what would be a more appropriate, easy, and flexible approach?
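One possible direction, as a sketch rather than a definitive answer: give the pool a list of (data, lag list) tuples, one tuple per task, so each tuple is unpacked into exactly two arguments. The example below makes several assumptions. It uses ThreadPool (which exposes the same map and starmap methods as multiprocessing.Pool) only so it runs anywhere, plain lists as hypothetical stand-ins for per-id slices of the DataFrame, and a simplified make_lag that shifts a list instead of a grouped column.

```python
from multiprocessing.pool import ThreadPool  # same map/starmap methods as Pool


def make_lag(rows, lag_days):
    # Hypothetical stand-in for the DataFrame version: for each lag,
    # shift the values right and pad the front with None.
    n = len(rows)
    return {
        f"lag{l}d": [None] * min(l, n) + rows[: max(n - l, 0)]
        for l in lag_days
    }


def wrap_make_lag(args):
    return make_lag(*args)


lag_days = [1, 3, 7, 13, 16]
chunks = [[10, 20, 30], [40, 50, 60]]  # stand-ins for per-id slices of df

# One (chunk, lag_days) tuple per task; the list stays a single argument.
tasks = [(chunk, lag_days) for chunk in chunks]

with ThreadPool(2) as pool:
    # Option 1: keep the wrapper and let it unpack each tuple.
    via_map = pool.map(wrap_make_lag, tasks)
    # Option 2: drop the wrapper; starmap unpacks each tuple itself.
    via_starmap = pool.starmap(make_lag, tasks)
```

With the original single call, the equivalent change would be arguments = [(df, [1, 3, 7, 13, 16])], which yields a one-element result list. That is a single task, so only one worker does any work; splitting df by id into several chunks, as sketched above, is what would let the pool run tasks side by side.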