As I understand it, a Dask DataFrame is the proper way to handle tabular data like this. I have a table in PostgreSQL, and I know how to load it into a pandas.DataFrame. I also know that odo can be used to convert a pandas.DataFrame to a dask.dataframe. But this is not a lazy operation: the conversion forces the whole PostgreSQL table into memory, which is bad. I would prefer to read rows one by one or in chunks. How can I do this?
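One lazy approach is to build the Dask DataFrame out of dask.delayed pieces, where each piece issues its own bounded SQL query, so only one chunk is materialized per partition at a time. Below is a minimal sketch; the connection URI, the table name my_table, the row count, and the integer id column used for range partitioning are all placeholders for your schema. (Depending on your Dask version, dask.dataframe.read_sql_table may also be available and does similar index-based partitioning for you.)

```python
import pandas as pd
import dask
import dask.dataframe as dd

# Placeholder connection string; adjust to your setup.
URI = "postgresql://user:password@localhost:5432/mydb"
CHUNK = 100_000

def load_chunk(lo, hi):
    # Each partition runs its own bounded query, so a worker only
    # ever holds one chunk of the table in memory.
    query = f"SELECT * FROM my_table WHERE id >= {lo} AND id < {hi}"
    return pd.read_sql_query(query, URI)

# Assume ids run from 0 to 1_000_000; split into CHUNK-sized slices.
parts = [
    dask.delayed(load_chunk)(lo, lo + CHUNK)
    for lo in range(0, 1_000_000, CHUNK)
]

# meta: a small sample frame that tells Dask the column names/dtypes
# without loading everything.
meta = load_chunk(0, 1)

df = dd.from_delayed(parts, meta=meta)  # lazy; no chunk is read yet
print(df.head())  # only the first partition's query actually runs
```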
- There is a similar issue with Cassandra. Since Cassandra is a distributed store, access to it can be parallelized across the cluster. How can I do this with Dask? See the sketch after this point.
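For Cassandra, the same dask.delayed pattern can be pointed at the token ring: each partition scans one slice of token ranges, which is the usual way Cassandra tools parallelize full-table scans. This is a sketch assuming the DataStax cassandra-driver package; my_keyspace, my_table, and the partition key pk are placeholders, and the contact point is assumed to be localhost.

```python
import pandas as pd
import dask
import dask.dataframe as dd
from cassandra.cluster import Cluster  # DataStax cassandra-driver

def load_token_range(lo, hi):
    # Each partition opens its own session and scans one slice of the
    # token ring, so the full table never sits in a single process.
    # (A real job would reuse sessions rather than reconnect per slice.)
    session = Cluster(["127.0.0.1"]).connect("my_keyspace")
    rows = session.execute(
        f"SELECT * FROM my_table "
        f"WHERE token(pk) >= {lo} AND token(pk) < {hi}"
    )
    # Driver rows are namedtuples, so pandas picks up column names.
    return pd.DataFrame(list(rows))

# With Murmur3Partitioner the token ring spans -2**63 .. 2**63 - 1;
# split it into n contiguous slices.
n = 8
step = 2**64 // n
edges = [-2**63 + i * step for i in range(n)]
edges.append(2**63 - 1)  # last slice; a real scan would use <= here
                         # so the single maximal token is not dropped

parts = [
    dask.delayed(load_token_range)(lo, hi)
    for lo, hi in zip(edges[:-1], edges[1:])
]
df = dd.from_delayed(parts)  # lazy until you call .compute()
```

With a distributed scheduler, the slices land on different workers, so reads against the cluster happen in parallel rather than funneling through one client.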