How to copy a dask dataframe?

Question

Given a pandas df one can copy it before doing anything via:

df.copy()

How can I do this with a dask dataframe object?

score 9 · Accepted Answer · answered Aug 03 '16 at 12:14

Mutation on dask.dataframe objects is rare, so this is rarely necessary.

That being said, you can safely just copy the object:

from copy import copy
df2 = copy(df)

No dask.dataframe operation mutates any of the fields of the dataframe, so this is sufficient.

score 1 · Answer 2 · answered Mar 13 '19 at 16:21

Dask creates internal pipelines of lazy computations. Every version of your dataframe is another layer of computations which are not computed until later.

You can branch from these computations in two ways. Copying the dataframe, as @MRocklin suggests, gives you a brand new stack of computations to work on. Alternatively, you can continue on the same stack by doing:

df = df[df.columns]

score 1 · Answer 3 · answered Apr 11 '21 at 01:19

It is possible you want to have two versions of your data, one of them after a mutation. There is a copy method on dask dataframes you can use; it likely does the same as Python's copy.copy, but it feels safer (to me) to use the library maintainer's version.

import pandas as pd
import dask.dataframe as dd
ddf = dd.from_pandas(pd.DataFrame({'z': [1, 2]}), npartitions=1)
ddf2 = ddf.copy()
ddf2['z'] -= 10

print(ddf.compute())
print()
print(ddf2.compute())

score -3 · Answer 4 · answered Aug 03 '16 at 12:21

Write it to a file and read again:

import os
import dask.dataframe as dd

df = <Initial Dask Dataframe to be copied>
file = 'sample.csv'
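# single_file=True writes one CSV rather than one file per partition
df.to_csv(file, single_file=True)
# read_csv lives in the dask.dataframe module; persist() loads the data now,
# since the lazy read would fail once the file is removed
df2 = dd.read_csv(file).persist()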
os.remove(file)

Gaurav Dhama
