6

There are not enough examples in the documentation on how to read data from sqlAlchemy to a dask dataframe.

Some examples i see are in terms of :

   df = dd.read_sql_table(table='my_table_name', uri=my_sqlalchemy_con_url, index_col='id')

But my query is not to get the entire table at once and only get (select * from ....table A where .....) kind of query to be passed as parameter.. Also, how to pass connection params are not mentioned.

How to pass connection params from sqlAlchemy connection engine and to pass custom query instead of fetching the whole table.

I understand this is basic but could find no references on this in docs / web

EDIT :

     d100 = 'mysql+mysqlconnector://{0}:{1}@{2}:{3}/{4}'.format("xx", "xx", "xxxxx.com", "3306", "xxxx") 
     df = dd.read_sql_table(table='select * from ps_g9 limit 10;', uri=d100, index_col='uuid') 

and i get table doesnt exist error

Viv
  • 1,474
  • 5
  • 28
  • 47

2 Answers2

1

Here is the documentation: https://docs.dask.org/en/stable/dataframe-api.html#dask.dataframe.read_sql_table

It lists the following:

  • that the "table" parameter can indeed be a table name or any SQL expression, constructed from sqlalchemy instances (i.e., not text)
  • that the dictionary engine_kwargs, if given, will be passed to sqlalchemy when creating the database query engine
mdurant
  • 27,272
  • 5
  • 45
  • 74
  • 1
    Put into engine_kwargs all the things you would like to put into engine creation, which includes username/password. Exactly what you might need depend on the specific engine. – mdurant Jul 04 '19 at 12:51
  • also tried this : d100 = 'mysql+mysqlconnector://{0}:{1}@{2}:{3}/{4}'.format("xx", "xx", "xxxxx.com", "3306", "xxxx") df = dd.read_sql_table(table='select * from ps_g9 limit 10;', uri=d100, index_col='uuid') and i get table doesnt exist – Viv Jul 04 '19 at 13:01
  • edited the question to show the exact error , @mdurant – Viv Jul 04 '19 at 13:07
  • 1
    " constructed from sqlalchemy instances (i.e., not text) " - see https://docs.sqlalchemy.org/en/13/core/tutorial.html – mdurant Jul 04 '19 at 15:43
  • yea im trying to construct the sqlalecmy version of a complex query and try.,.. did not work so far...but Thanks fr your help – Viv Jul 04 '19 at 15:47
  • The reason for this is, that dask needs to partition your query, and how to do that with a text query would depend on the dialect of SQL being used. Better rely on sqlalchemy to do that. – mdurant Jul 04 '19 at 15:49
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/196007/discussion-between-viv-and-mdurant). – Viv Jul 04 '19 at 16:15
1

The default partition size for numeric indexes is 256 MB, unless you specify npartitions. For string indexes, you can use the divisions argument, e.g.

... division = sorted(['red', 'green', 'blue', 'yellow']) ...

if you have an index with color names.

See also https://github.com/dask/dask/issues/2604.

Futher parameters for sqlalchemy can be passed using the engine_kwargs keyword. See also the read_sql_table docs.

Here is a preview of the relevant chapter of Mannings Data Science at Scale with Python and Dask book.

Stef
  • 28,728
  • 2
  • 24
  • 52
  • the question is on how to fetch data from db using sqlalchemy to dask dataframe. and the examples in the offical docs talk about getting entire table using read_sql but example to get subset of data (select * from.... where....) is not given so was asking for examples of this. Updated teh question for clarity as well – Viv Jul 04 '19 at 13:15
  • besides table names, you can specify **sqlalchemy expressions** in the `table` argument, so a `select.where()` should work (untested though). Did you read the github link I gave in my answer, specifically the [6th comment](https://github.com/dask/dask/issues/2604#issuecomment-389227612)? – Stef Jul 04 '19 at 13:40
  • regarding your EDIT: the error is because there is of course no table with the name `'select * from ps_g9 limit 10;'`. See https://docs.sqlalchemy.org/en/13/core/tutorial.html for sqlalchemie expressions.See also [this SO question](https://stackoverflow.com/questions/54819271/dask-read-sql-table-errors-out-when-using-an-sqlalchemy-expression/54914331). – Stef Jul 04 '19 at 13:53