I am trying to write multiple pandas DataFrames to CSV files using the IPython parallel module, since doing so serially is very slow.
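For reference, the serial version I am trying to speed up is essentially just a loop over DataFrame/path pairs (using the same df1, df2 and file names as in the example below):

# serial baseline, illustrative only: write each frame one after the other
for df, filepath in [(df1, 'df1.csv'), (df2, 'df2.csv')]:
    df.to_csv(filepath)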
Here is a small example of what I am trying to do:
from IPython.parallel import Client
import pandas as pd
import numpy as np
rc = Client(profile='small_cluster')
dview = rc[:]
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))
df2 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('xyz'))
def df_to_file(df, filepath):
    df.to_csv(filepath)
h = dview.map_sync(df_to_file, [df1, df2], ['df1.csv', 'df2.csv'])
This runs without errors, and since the function has no return statement, h is just a list of None values; however, nothing is written to disk. This is clearly not the correct way to go about it. I have successfully manipulated DataFrames in memory in parallel, but I cannot figure out whether it is possible to write them to disk in parallel as well. Any help is much appreciated.
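For what it's worth, this is roughly how I am checking that nothing was written (same file names as above):

import os
print(h)                                                    # [None, None]
print([os.path.exists(p) for p in ['df1.csv', 'df2.csv']])  # [False, False] for me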