4

What is the idiomatic way to add a pandas series to a dask dataframe?

Pandas is far more flexible for working with data, so I often bring parts of dask dataframes into memory, manipulate columns, and create new ones. I would then like to add these new columns to the original dask dataframe. How can this be accomplished?

Zelazny7
  • 1
    How is your dask dataframe partitioned? Does it have a known index? You would need to be able to partition your pandas dataframe along the same rows as your dask dataframe. If you have a well-defined index, or you know that they have the same shape, then this is doable, but in general it is a tricky problem. http://dask.pydata.org/en/latest/dataframe-design.html#partitions – MRocklin Jun 29 '17 at 21:55

1 Answer

0

In recent versions of dask.dataframe, you can simply add the pandas.Series directly!

# for dask_df and pandas_series with the same index...
dask_df['newcol'] = pandas_series

Dask will automatically partition the pandas series to match the index of the dask.dataframe.

Michael Delgado
  • This does not work for me. I get a "lengths do not match" type error because even though the series I'm adding is the same length as the dataframe, when dask is checking the length it thinks the length of the dask dataframe is 2. This may be because the dask dataframe has unknown divisions. – David R Jul 24 '22 at 22:16