Drop column using Dask dataframe

Question

I expected this to drop the column:

raw_data.drop('some_great_column', axis=1).compute()

But the column is not dropped. In pandas I use:

raw_data.drop(['some_great_column'], axis=1, inplace=True)

But `inplace` does not exist in Dask. Any ideas?

score 26 · Accepted Answer · answered Aug 09 '18 at 14:53

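`drop` returns a new dataframe rather than modifying `raw_data`, so the result has to be assigned back. You can separate this into two operations: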

# dask operation
raw_data = raw_data.drop('some_great_column', axis=1)

# conversion to pandas
df = raw_data.compute()

Then export the Pandas dataframe to a CSV file:

df.to_csv(r'out.csv', index=False)

jpp

I guess the transformation to pandas will potentially fail due to memory issues ... the reason why I started to use Dask ... – cs0815 Aug 09 '18 at 14:54

I see, but that's going to happen *anyway* when you use `compute`, even in your original code. You can try filtering and exporting in groups if that's the case. – jpp Aug 09 '18 at 14:54

score 2 · Answer 2 · answered Oct 24 '20 at 23:31

I assume you want to keep `raw_data` in a Dask DF. In that case the following will do the trick:

new_raw_df = raw_data.drop('some_great_column', axis=1).copy()

Here `type(new_raw_df)` is `dask.dataframe.core.DataFrame`, so the data stays in Dask, and you can delete the original DF.
