Is it possible to get large datasets into a pandas DataFrame?
My dataset is approx. 1.5 GB uncompressed (input for clustering), but when I try to select the contents of the table using bq.Query(...), it throws an exception:
RequestException: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
Looking at https://cloud.google.com/bigquery/querying-data?hl=en, which states:

"You must specify a destination table."
It feels like the only place to send the results of a large query is another table (and then export it to GCS and download it from there).
There will also be a (possibly large) write back as the classified rows are written back to the database.
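For context, once an export has been downloaded from GCS, the file can at least be loaded incrementally so the full 1.5 GB is never parsed in one pass; a minimal sketch using pandas' chunked reader (the in-memory CSV below is just a stand-in for the downloaded export file):

```python
import io
import pandas as pd

# Stand-in for the CSV exported from BigQuery via GCS; in practice this
# would be the path to the downloaded ~1.5 GB export file.
csv_data = io.StringIO("id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(1000)))

# chunksize makes read_csv return an iterator of smaller DataFrames
# instead of parsing the whole file at once, keeping peak memory low.
chunks = pd.read_csv(csv_data, chunksize=250)

# Process each chunk independently, or concatenate if the result fits in RAM.
df = pd.concat(chunks, ignore_index=True)
```

Each chunk could instead be clustered or filtered and discarded, which is what would keep memory bounded on a machine smaller than the dataset.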
The same dataset runs fine on my 16 GB laptop (in a matter of minutes), but I am looking at migrating to Datalab as our data moves to the cloud.
Thank you very much; any help is appreciated.