
I have a function that I want to parallelize so that it returns a dataframe with multiple columns based on an array. How can I use multiprocessing to do this? Here is an example of what my code looks like:

import multiprocessing

def f(df, x):
    # note: each child process works on its own copy of df, so this write
    # never reaches the dataframe in the parent process
    df['x'] = somefunc(x)

def run_parallel():
    df = ...  # existing dataframe
    values = ['a', 'b', 'c', 'd', 'e']
    jobs = []
    for i, s in enumerate(values):
        j = multiprocessing.Process(target=f, args=(df, s))
        jobs.append(j)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return df

Here, somefunc(x) returns a list of values based on what x is, and df is the dataframe I want to return. I'm not sure how to get the dataframe back with these columns when I run it through multiprocessing.
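
Each multiprocessing.Process gets its own copy of df, so the assignment inside f never reaches the dataframe in the parent process. One way around this is to have the workers return the computed lists and assign the columns in the parent, for example with multiprocessing.Pool. The following is a minimal sketch rather than a drop-in fix: it assumes somefunc is defined at module level (so it can be pickled), that it returns a list with one entry per row of df, and that one column per value (named after the value) is what is wanted rather than a single 'x' column.

import multiprocessing

def somefunc(x):
    ...  # placeholder for the real function from the question

def run_parallel(df):
    values = ['a', 'b', 'c', 'd', 'e']
    # each worker computes one column and returns it to the parent
    with multiprocessing.Pool() as pool:
        columns = pool.map(somefunc, values)
    # assign the returned lists as columns in the parent process
    for value, column in zip(values, columns):
        df[value] = column
    return df

On Windows (and on macOS with the default spawn start method), the call to run_parallel also needs to sit under an if __name__ == '__main__': guard.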

Roxanne
  • Scary - check this out: https://stackoverflow.com/questions/13592618/python-pandas-dataframe-thread-safe – jch Jul 29 '22 at 16:17
  • @jch is there a different way to write to a df safely with parallel processing? Without it my code runs really slowly, so I'd like to find a way to speed this up. – Roxanne Jul 29 '22 at 16:36
  • Would it work for your use case to partition the main DF into separate DFs? Then put it all back together at the end? (a sketch of this approach follows these comments) – jch Jul 29 '22 at 16:47
  • @jch yes, how could I do that? – Roxanne Jul 29 '22 at 17:39
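
A minimal sketch of the split-and-reassemble idea from the comment above, assuming the expensive per-row work can be wrapped in a function (process_chunk here is a hypothetical name) that takes a slice of the dataframe and returns the processed slice:

import multiprocessing

import numpy as np
import pandas as pd

def process_chunk(chunk):
    # hypothetical worker: do the expensive work on a row-wise slice
    # of the dataframe and return the modified slice
    return chunk

def run_partitioned(df, n_workers=4):
    # split the dataframe into one row-wise chunk per worker
    chunks = np.array_split(df, n_workers)
    with multiprocessing.Pool(n_workers) as pool:
        results = pool.map(process_chunk, chunks)
    # stitch the processed chunks back together in their original order
    return pd.concat(results)

Splitting by rows only helps if the work is per-row; for the per-value columns described in the question, splitting by value (as in the Pool sketch above) is the more natural partition.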

1 Answer


See pandarallel~

from pandarallel import pandarallel
pandarallel.initialize()

df['x'] = df['x'].parallel_apply(somefunc, args=(x,))
BeRT2me
  • This probably isn't set up quite right for your use case, but with some more details on `somefunc()`, I bet it could be adapted. – BeRT2me Jul 29 '22 at 17:28
  • Is there a way to loop this so I can get back multiple columns for df? The way you did it seems like only the 'x' column would come back; is there a way I can get back a column for each value in the x array? (see the sketch after these comments) – Roxanne Jul 29 '22 at 21:01
  • It only modifies the x column, but the whole dataframe still exists... are you modifying more than the x column? – BeRT2me Jul 29 '22 at 21:22
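
For the follow-up about getting one column per value: if somefunc really works row-wise, taking a cell value plus an extra argument as in the answer's snippet (an assumption here, since the question's somefunc(x) only takes the value itself), one way to build a column per value is to loop over the values and let pandarallel spread each apply across the CPU cores. This continues with the df and somefunc from the answer above.

from pandarallel import pandarallel

pandarallel.initialize()

values = ['a', 'b', 'c', 'd', 'e']
for v in values:
    # one new column per value; each apply is parallelized across rows
    df[v] = df['x'].parallel_apply(somefunc, args=(v,))

If somefunc instead computes the whole column from v alone, as described in the question, there is nothing row-wise to parallelize, and the Pool-over-values sketch earlier on this page is a better fit.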