12

I expected this to drop the column:

raw_data.drop('some_great_column', axis=1).compute()

But the column is not dropped. In pandas I use:

raw_data.drop(['some_great_column'], axis=1, inplace=True)

But `inplace` is not supported in Dask. Any ideas?

jpp
cs0815

2 Answers

26

You can separate into two operations:

# dask operation
raw_data = raw_data.drop('some_great_column', axis=1)

# conversion to pandas
df = raw_data.compute()

Then export the pandas DataFrame to a CSV file:

df.to_csv(r'out.csv', index=False)
jpp
  • 2
    I guess the transformation to pandas will potentially fail due to memory issues ... the reason why I started to use Dask ... – cs0815 Aug 09 '18 at 14:54
  • 1
    I see, but that's going to happen *anyway* when you use `compute`, even in your original code. You can try filtering and exporting in groups if that's the case. – jpp Aug 09 '18 at 14:54
2

I assume you want to keep `raw_data` as a Dask DataFrame. In that case the following will do the trick:

new_raw_df = raw_data.drop('some_great_column', axis=1).copy()

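Here `type(new_raw_df)` is still a Dask DataFrame (`dask.dataframe.core.DataFrame`; the exact module path may differ between Dask versions), so you can delete the original DataFrame. The `.copy()` is optional, since `drop` already returns a new DataFrame.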

Kai