Is there a way to easily convert a DataFrame of numeric values into an Array, similar to .values on a pandas DataFrame? I can't seem to find any way to do this with the provided API, but I'd assume it's a common operation.

Paul English
https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.to_dask_array.html – HappyFace Feb 18 '22 at 21:07
3 Answers
9
Edit: yes, this is now trivial.
You can use the .values property:
x = df.values
Older, now incorrect answer
At the moment there is no trivial way to do this. dask.array needs to know the length of every one of its chunks, and dask.dataframe doesn't know the lengths of its partitions, so this cannot be a completely lazy operation.
That being said, you can accomplish it using dask.delayed as follows:
import dask.array as da
from dask import compute

def to_dask_array(df):
    # One delayed pandas DataFrame per partition
    partitions = df.to_delayed()
    shapes = [part.values.shape for part in partitions]
    # A DataFrame has no .dtype, so take the dtype of its .values array
    dtype = partitions[0].values.dtype
    results = compute(dtype, *shapes)  # trigger computation to find shape
    dtype, shapes = results[0], results[1:]
    chunks = [da.from_delayed(part.values, shape, dtype)
              for part, shape in zip(partitions, shapes)]
    return da.concatenate(chunks, axis=0)

MRocklin
This makes sense, thanks for the response. I figured using `to_delayed` would be one way to do this, thanks for posting a method that can do this. – Paul English May 26 '16 at 16:31
2
I think there might be another, shorter way.
import dask.array as da
import dask.dataframe as dd

ruta = '...'
df = dd.read_csv(...)
x = df['column you want to transform in array']

def transf(x):
    xd = x.to_delayed()
    full = []
    for i in xd:
        part = i.compute()  # compute each partition once to get shape and dtype
        full.append(da.from_delayed(i, part.shape, part.dtype))
    return da.concatenate(full)

x_array = transf(x)
In addition, if you want to convert a Dask DataFrame with N columns, so that each array element is itself an array like this:
array((x1,x2,x3),(y1,y2,y3),....)
you have to change the attribute, because a pandas DataFrame has dtypes rather than dtype:
from:
i.compute().dtype
to:
i.compute().dtypes
Thanks

Julio CamPlaz