I have a big dataset in a BigQuery table (~45M rows, 13 GB of data). I would like to process that data in my Google Datalab notebook to compute some basic statistics with pandas and visualize them later with matplotlib in a Datalab cell. I think it is not a good idea to try to load the whole dataset into a pandas DataFrame (at the very least, I will run into RAM issues).

Is it possible to read data from BigQuery in batches (say, 10K rows at a time) and consume it in Datalab?

Thanks in advance!

Anthonios Partheniou

2 Answers

If your purpose is to visualize the data, would sampling be better than loading a small batch?

You can sample your data, for example:

import google.datalab.bigquery as bq

# Keep roughly 1% of the rows at random and load them into a DataFrame
df = bq.Query(sql='SELECT image_url, label FROM coast.train WHERE rand() < 0.01').execute().result().to_dataframe()

Or use a convenient class:

from google.datalab.ml import BigQueryDataSet

# Sample 1000 random rows from the table into a DataFrame
sampled_df = BigQueryDataSet(table='myds.mytable').sample(1000)
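
For the question's stated goal (basic statistics with pandas, then matplotlib plots), a minimal usage sketch on the sampled DataFrame might look like this; the 'label' column is only illustrative and not from the original answer:

import matplotlib.pyplot as plt

print(sampled_df.describe())  # basic statistics with pandas
sampled_df['label'].value_counts().plot(kind='bar')  # 'label' is an illustrative column name
plt.show()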
Bradley Jiang
  • Thanks for the sampling idea @BradleyJiang, it could be helpful! But here I'm working with chronological data, and subsampling would destroy the time ordering of the rows. If I can't find a way to read from BigQuery in batches, I guess I will have to go back to the original CSV file and read it in batches. – Alexander Usoltsev Apr 11 '17 at 08:34
  • Then how about sorting them by time and then querying in pages? import google.datalab.bigquery as bq; q = bq.Query('select * from myds.mytable order by time'); t = q.execute().result(); t.to_dataframe(start_row=page_start, max_rows=page_size) (see the sketch below) – Bradley Jiang Apr 12 '17 at 20:28
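
Expanded into a self-contained sketch of that paging idea, assuming a placeholder table myds.mytable with a time column, a page size of 10K rows, and a hypothetical process() helper for the per-batch statistics:

import google.datalab.bigquery as bq

def process(df):
    # Hypothetical per-batch work, e.g. accumulate basic statistics for this chunk
    print(df.describe())

# Placeholder table and column names
q = bq.Query('SELECT * FROM myds.mytable ORDER BY time')
t = q.execute().result()

page_size = 10000
page_start = 0
while True:
    # Fetch one page of the ordered result as a DataFrame
    df = t.to_dataframe(start_row=page_start, max_rows=page_size)
    if len(df) == 0:
        break
    process(df)
    page_start += page_size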

Have you tried just iterating over the table? The Table object is an iterable that uses a paged fetcher to get data from the BigQuery table, so it is effectively streaming. The page size is 1024 rows by default.
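
A minimal sketch of that approach, assuming iteration yields one row at a time as a dict of column values (the table name and batch size below are placeholders):

import pandas as pd
import google.datalab.bigquery as bq

table = bq.Table('myds.mytable')  # placeholder table name

batch = []
batch_size = 10000
for row in table:  # rows are fetched page by page under the hood (1024 rows per page by default)
    batch.append(row)
    if len(batch) == batch_size:
        chunk = pd.DataFrame(batch)  # one 10K-row chunk, ready for pandas statistics
        # ... compute statistics on the chunk here ...
        batch = []
if batch:
    chunk = pd.DataFrame(batch)  # final partial chunk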

yelsayed