I am trying to write multiple pandas DataFrames to CSV files using the IPython parallel module, since doing so serially is very slow.
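For reference, the serial version I am trying to speed up is essentially just a loop over DataFrame/path pairs (using the same df1, df2 and file names as in the example below):

# serial baseline, illustrative only: write each frame one after the other
for df, filepath in [(df1, 'df1.csv'), (df2, 'df2.csv')]:
    df.to_csv(filepath)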
Here is a small example of what I am trying to do:
from IPython.parallel import Client
import pandas as pd
import numpy as np
rc = Client(profile='small_cluster')
dview = rc[:]
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))
df2 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('xyz'))
def df_to_file(df, filepath):
    df.to_csv(filepath)
h = dview.map_sync(df_to_file, [df1, df2], ['df1.csv', 'df2.csv'])
This runs without errors, and since the function has no return statement, h is just a list of None values; however, nothing is written to disk. This is clearly not the correct way to go about it. I have successfully manipulated DataFrames in memory in parallel, but I cannot figure out whether it is possible to write them to disk in parallel as well. Any help is much appreciated.
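For what it's worth, this is roughly how I am checking that nothing was written (same file names as above):

import os
print(h)                                                    # [None, None]
print([os.path.exists(p) for p in ['df1.csv', 'df2.csv']])  # [False, False] for me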