0

I am new to the Dask library. I wanted to know: if we implement parallel computation using Dask across two systems, is the data frame on which we apply the computation stored on both systems? How does the parallel computation actually take place? It is not clear from the documentation.

Sweta
  • 63
  • 3
  • 13

1 Answer

0

Dask dataframes are chunked: one big logical dataframe is made up of smaller pandas dataframes (partitions) spread across your cluster. Computations are applied to each chunk individually, with shuffling of intermediate results where required (for example in groupby, sum, and other aggregations).

mvn
  • 118
  • 1
  • 5
  • Are these chunks created implicitly, or do we give certain specifications to create them? If so, how? – Sweta Jul 09 '18 at 05:41