0

I am new to the Dask library. I wanted to know: if we implement parallel computation using Dask across two systems, is the data frame on which we apply the computation stored on both systems? How does the parallel computation actually take place? It is not clear from the documentation.

Sweta
  • 63
  • 3
  • 13

1 Answer

0

Dask dataframes are chunked: one big logical dataframe is made up of smaller pandas dataframes (partitions) spread across your cluster. Computations are applied to each chunk individually, with shuffling of intermediate results where required (for example in groupby, sum, and other aggregations).

mvn
  • 118
  • 1
  • 5
  • Are these chunks created implicitly, or do we give certain specifications to create them? If so, how? – Sweta Jul 09 '18 at 05:41