Is there a way to convert a Dask DataFrame to a matrix or 2-D array? I know that Dask does not yet support multi-indexing, and I don't see how to use dask.delayed for this.

cel
Alger Remirata

1 Answer

Dask 0.13.0 (released January 2017) added DataFrame.values and DataFrame.to_records, both of which convert a Dask DataFrame to a Dask Array:

In [1]: import dask.dataframe as dd

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

In [4]: ddf = dd.from_pandas(df, npartitions=2)

In [5]: ddf
Out[5]: dd.DataFrame<from_pa..., npartitions=1, divisions=(0, 2)>

In [6]: ddf.values
Out[6]: dask.array<values-..., shape=(nan, 2), dtype=int64, chunksize=(nan, 2)>

In [7]: ddf.values.compute()
Out[7]: 
array([[1, 4],
       [2, 5],
       [3, 6]])

In [8]: ddf.to_records()
Out[8]: dask.array<to-reco..., shape=(nan,), dtype=(numpy.record, [('index', '<i8'), ('x', '<i8'), ('y', '<i8')]), chunksize=(nan,)>

In [9]: ddf.to_records().compute()
Out[9]: 
rec.array([(0, 1, 4), (1, 2, 5), (2, 3, 6)], 
          dtype=[('index', '<i8'), ('x', '<i8'), ('y', '<i8')])
MRocklin