
As I understand it, a Dask DataFrame is the proper way to handle tabular data like this. I have a table in PostgreSQL, and I know how to load it into a pandas.DataFrame.

I know that odo can be used to convert a pandas.DataFrame to a dask.dataframe. But this is not a lazy operation: the conversion forces the whole PostgreSQL table to be loaded into memory, which is bad. I would prefer to read items one by one or in chunks. How can I do this?
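
To make this concrete, here is a rough sketch of the kind of lazy, chunked load I am after, using `dask.dataframe.read_sql_table`; the table name, connection URI, and index column are made up:

```python
import dask.dataframe as dd

# Lazy, partitioned load of a PostgreSQL table (all names below are hypothetical).
df = dd.read_sql_table(
    "my_table",                               # hypothetical table name
    "postgresql://user:pass@localhost/mydb",  # hypothetical connection URI
    "id",              # an indexed column used to split the table into chunks
    npartitions=16,    # read in 16 chunks instead of one big pull
)

# Nothing is read yet; each partition issues its own bounded query on compute.
print(df.head())
```

As far as I understand, each partition then queries only its own index range when that piece is actually needed.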

A similar issue exists with Cassandra. Cassandra is a distributed store, so access to it could in principle be optimized for distribution, but how can this be done with Dask?
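
For Cassandra I imagine something along these lines: build one delayed task per token range and let each worker pull its own slice of the ring. A rough sketch, assuming the `cassandra-driver` package; the keyspace, table, and hard-coded token boundaries are made up (real code would derive the ranges from the cluster metadata):

```python
import pandas as pd
import dask
import dask.dataframe as dd
from cassandra.cluster import Cluster

@dask.delayed
def load_range(lo, hi):
    """Each task opens its own session and fetches one token range."""
    session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # hypothetical keyspace
    rows = session.execute(
        "SELECT * FROM my_table WHERE token(id) >= %s AND token(id) < %s",
        (lo, hi),
    )
    return pd.DataFrame(list(rows))

# Hypothetical token boundaries; real code would read them from the ring.
bounds = [-2**63, 0, 2**63 - 1]
parts = [load_range(lo, hi) for lo, hi in zip(bounds, bounds[1:])]
df = dd.from_delayed(parts)  # still lazy; work is distributed on compute
```
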
– Sklavit
  • could you provide a [mcve](https://stackoverflow.com/help/minimal-reproducible-example) to help understand and hopefully answer your question? – rrpelgrim Oct 04 '21 at 09:37

1 Answer


As for MongoDB, I created the following solution: https://gist.github.com/Sklavit/747e292fc17f6c9b400470006ff1c567

The main idea is to create a bag of target names and then pass those arguments to a loader, as in the sketch below.
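
A minimal sketch of that idea with `dask.bag` and `pymongo` (the gist has the full version; the database name, collection name, and id list below are made up):

```python
import dask.bag as db
from pymongo import MongoClient

def load_one(doc_id):
    """Loader: each call opens its own client and fetches a single document."""
    client = MongoClient("localhost", 27017)
    return client["my_db"]["my_collection"].find_one({"_id": doc_id})  # hypothetical names

ids = list(range(1000))  # hypothetical list of target ids
docs = db.from_sequence(ids, npartitions=10).map(load_one)

result = docs.compute()  # nothing is fetched until compute()
```

Opening a client inside the loader keeps each task self-contained, so the bag can be computed on a distributed scheduler without sharing connections between workers.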

– Sklavit