I'm connecting to an Oracle database with dask.dataframe.read_sql_table to bring across some large tables, some with over 100 million rows, and then write them to an S3 bucket in Parquet format. However, I keep running into memory errors, even when I specify the number of partitions Dask recommends. I've read a bit about dask.distributed, but I'm not sure how to use it with dask.dataframe.read_sql_table. I also keep hitting a KeyError with this message:
Only a column name can be used for the key in a dtype mappings argument
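Here's roughly what I'm running, simplified. The connection string, table name, index column, partition count, and bucket path below are placeholders, and the Client settings are just my guess at how dask.distributed is supposed to fit in:

```python
import dask.dataframe as dd
from dask.distributed import Client

# Guess at a local cluster setup; not sure this is the right way to
# combine dask.distributed with read_sql_table.
client = Client(n_workers=4, threads_per_worker=1, memory_limit="4GB")

# Placeholder Oracle connection string (SQLAlchemy URI via cx_Oracle).
uri = "oracle+cx_oracle://user:password@host:1521/?service_name=MYDB"

ddf = dd.read_sql_table(
    "big_table",        # table with 100+ million rows
    uri,
    index_col="id",     # numeric primary key used to split partitions
    npartitions=500,    # roughly the partition count Dask recommended
)

# Write straight out to S3 as Parquet.
ddf.to_parquet("s3://my-bucket/big_table/", engine="pyarrow", write_index=False)
```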
If anyone has any ideas on how to use dask.dataframe.read_sql_table to read 100-million-row tables, it would be greatly appreciated.
Thanks