
I have a numpy array that I want to add as a column to an existing dask dataframe.

from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
nparr = enc.fit_transform(X['url'])  # LabelEncoder expects 1-D input

ddf is an existing dask dataframe. I would like to do something like this:

ddf['nurl'] = nparr   ???

Is there an elegant way to achieve this?

The question "Python PANDAS: Converting from pandas/numpy to dask dataframe/array" does not solve my issue, as I want to add a numpy array to an existing dask dataframe.

Irshad Ali
  • It seems that you are not new to this site. Don't you think you can make your question better? Could you please add some more info? – J...S Aug 22 '19 at 10:24
  • Possible duplicate of [Python PANDAS: Converting from pandas/numpy to dask dataframe/array](https://stackoverflow.com/questions/48794621/python-pandas-converting-from-pandas-numpy-to-dask-dataframe-array) – Prathik Kini Aug 22 '19 at 10:24

1 Answer


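You can convert the numpy array to a dask Series object, then merge it into the dataframe. You will need to use the .to_frame() method of the Series object, since dask only supports merging dataframes with other dataframes. The two frames share no column names, so the merge here joins on the index (note the join-indexed task name in the output).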

import dask.dataframe as dd
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(30), 'y': range(0,300, 10)})
arr = np.random.randint(0, 100, size=30)

# create dask frame and series
ddf = dd.from_pandas(df, npartitions=5)
darr = dd.from_array(arr)
# give it a name to use as a column head
darr.name = 'z'

ddf2 = ddf.merge(darr.to_frame())

ddf2
# returns:
Dask DataFrame Structure:
                   x      y      z
npartitions=5
0              int64  int64  int32
6                ...    ...    ...
...              ...    ...    ...
24               ...    ...    ...
29               ...    ...    ...
Dask Name: join-indexed, 33 tasks
James
  • It throws an error, `AttributeError: 'DataFrame' object has no attribute 'to_frame'`, when I try it on my data. Could you help me with that? – Coder Sep 22 '22 at 20:29