I have a function that I want to parallelize so that it returns a DataFrame with multiple columns, one for each element of an array. How can I use multiprocessing to do this? Here is a simplified example of my code:
import multiprocessing

def f(df, x):
    # Add one column, named after x, holding the values from somefunc
    df[x] = somefunc(x)

def run_parallel():
    df = *existing dataframe*
    values = ['a', 'b', 'c', 'd', 'e']
    jobs = []
    for s in values:
        j = multiprocessing.Process(target=f, args=(df, s))
        jobs.append(j)
    for j in jobs:
        j.start()
    return df
Here somefunc(x) returns a list of values that depends on x, and df is the DataFrame I want to return. I'm not sure how to get the DataFrame back with these new columns when the work runs through multiprocessing.
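
For reference, here is a minimal sketch of one way this could work. It assumes that each value should become its own column and that somefunc returns one entry per row. Each process started with multiprocessing.Process works on its own copy of df, so assignments made in a child never reach the parent. A common workaround is to have the workers return their results and build the columns in the parent, for example with multiprocessing.Pool.map. The somefunc body, compute_column, add_columns, and the sample "id" column below are hypothetical stand-ins, not part of the original code.

```python
import multiprocessing

import pandas as pd


def somefunc(x):
    # Hypothetical stand-in for the real somefunc: one value per row.
    return [f"{x}{i}" for i in range(3)]


def compute_column(x):
    # Runs in a worker process. Returns the column name together with
    # its values so the parent knows where to put them.
    return x, somefunc(x)


def add_columns(df, results):
    # Runs in the parent process: copy df and attach each computed column.
    out = df.copy()
    for name, column in results:
        out[name] = column
    return out


def run_parallel(df, values):
    # Pool.map returns results in the same order as `values`.
    with multiprocessing.Pool() as pool:
        results = pool.map(compute_column, values)
    return add_columns(df, results)


if __name__ == "__main__":
    # Assumed sample data; replace with the existing DataFrame.
    df = pd.DataFrame({"id": [0, 1, 2]})
    result = run_parallel(df, ["a", "b", "c", "d", "e"])
    print(result)
```

The workers only receive the value x rather than the whole DataFrame, which keeps the data sent between processes small. Whether this is faster than a plain loop depends on how expensive somefunc is compared with the cost of starting the processes and pickling the results.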