What is the most efficient way to create a dask.array
from a dask.Series of lists?
The series consists of 5 million lists of 300 elements each.
It is currently divided into 500 partitions.
Currently I am trying:
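    import numpy as np
    import dask.array
    from dask import delayed

    # Turn each partition into a list, then into a NumPy array
    pt = [delayed(np.array)(y)
          for y in [delayed(list)(x)
                    for x in series.to_delayed()]]
    # Wrap the whole concatenation in a single delayed call
    da = delayed(dask.array.concatenate)(pt, axis=1)
    da = dask.array.from_delayed(da, (vec.size.compute(), 300), dtype=float)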
The idea is to convert each partition into a numpy array and stitch
those together into a dask.array.
This code is taking forever to run though.
A numpy array can be built from this data sequentially quite quickly, as long as there is enough RAM.