I recently came across Dask, and I have some very basic questions about Dask DataFrame and its other data structures.
- Is Dask DataFrame an immutable data type?
- Are Dask Array and Dask DataFrame lazy data structures? (See the snippet after this list for what I mean.)
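For example, here is a minimal sketch of what I am asking about (the file name and the `key`/`value` columns are made up):

```python
import dask.dataframe as dd

# "data.csv" and its columns are hypothetical, just for illustration
df = dd.read_csv("data.csv")

# Does this only record the operation in a task graph...
result = df.groupby("key").value.mean()

# ...and run nothing until I explicitly call .compute()?
print(result.compute())
```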
I don't know whether to use Dask, Spark, or pandas for my situation. I have 200 GB of data to process. Computing the operations with a plain Python program took 9 hours, but the work could be done in much less time if it ran in parallel on a 16-core processor. If I split the DataFrame myself in pandas, I have to worry about the commutative and associative properties of my calculations. On the other hand, I could use a standalone Spark cluster to split up the data and run it in parallel.
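To make this concrete, here is roughly what I am hoping Dask would let me write (a sketch with hypothetical file and column names), where Dask handles the partitioning instead of me splitting the DataFrame by hand:

```python
import dask.dataframe as dd

# File pattern and column names are hypothetical
df = dd.read_csv("data/part-*.csv")   # ~200 GB split across many partitions

# A pandas-style aggregation; Dask splits the work across partitions,
# so I shouldn't have to reason about commutativity/associativity myself
summary = df.groupby("category").amount.sum()

# Run in parallel, ideally using all 16 cores
print(summary.compute())
```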
Do I need to set up a cluster for Dask, as I would for Spark?
How do I run Dask DataFrames on my own compute nodes?
Does Dask require a master-worker setup?
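In other words, is something like the following (using `dask.distributed`, with hypothetical file names) all that is needed on a single 16-core machine, or do I have to run separate scheduler and worker processes the way Spark needs a master and workers?

```python
from dask.distributed import Client, LocalCluster
import dask.dataframe as dd

# One worker process per core on this single machine
cluster = LocalCluster(n_workers=16, threads_per_worker=1)
client = Client(cluster)

df = dd.read_csv("data/part-*.csv")  # hypothetical file pattern
print(df.amount.sum().compute())     # executes on the local workers
```

For multiple machines, my understanding is that I would run `dask-scheduler` on one node and `dask-worker tcp://<scheduler-host>:8786` on the others, then point `Client` at the scheduler address. Is that the whole setup, or is there more to it?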
I am a fan of pandas, so I am looking for a solution that feels similar to pandas.