
I'm reading a list of files with Dask's read_parquet, concatenating the resulting dataframes, and writing the result to a file. During the concatenation, does Dask read all the data into memory, or is it only working with the schemas? (I'm concatenating along axis 0.)
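
For reference, the workflow is roughly this (a minimal sketch; the file list, paths, and output location are placeholders, not my actual names):

import dask.dataframe as dd

# Hypothetical list of parquet files to combine
paths = ["data/part_0.parquet", "data/part_1.parquet", "data/part_2.parquet"]

# Read each file and stack them row-wise (axis=0)
frames = [dd.read_parquet(p) for p in paths]
combined = dd.concat(frames, axis=0)

# Write the concatenated result back out
combined.to_parquet("data/combined.parquet")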

Thanks in advance


1 Answer


"Dask DataFrame is lazy by default" see documentation so unless you fire compute it's just working with schemes.

import pandas as pd
import dask.dataframe as dd
import numpy as np

# Two pandas frames with a different number of columns
df1 = pd.DataFrame(np.random.randn(10, 2))
df2 = pd.DataFrame(np.random.randn(10, 3))

# Wrap them as Dask DataFrames
ddf1 = dd.from_pandas(df1, npartitions=2)
ddf2 = dd.from_pandas(df2, npartitions=2)

# Concatenate along axis 0; this only builds the task graph
ddf = dd.concat([ddf1, ddf2])
print(ddf)
Dask DataFrame Structure:
                     0        1        2
npartitions=4                           
               float64  float64  float64
                   ...      ...      ...
                   ...      ...      ...
                   ...      ...      ...
                   ...      ...      ...
Dask Name: concat, 8 tasks
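
Up to this point nothing has been read into memory; Dask has only built the task graph and the combined schema shown above. The data is loaded and concatenated only when you trigger execution, for example with .compute() or by writing the result out. Continuing the example above:

# Triggering computation materializes the concatenated frame in memory
result = ddf.compute()
print(type(result))   # <class 'pandas.core.frame.DataFrame'>
print(result.shape)   # (20, 3)

The same applies to your case: dd.read_parquet and dd.concat only set up the graph, and the parquet files are actually read (partition by partition) when the final write is executed.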
