2

I am trying to fetch data from BigQuery. Everything is working fine when i fetch small data but when i try to fetch big data then its taking forever. any efficient way?

So far i am using this:

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'cred.json'
import google.auth
from google.cloud import bigquery

%load_ext google.cloud.bigquery

import google.datalab.bigquery as bq
from google.cloud.bigquery import Client

client = bigquery.Client()

Here is my SQL command:

sql = """
   SELECT bla, bla1, bla2
FROM table
"""
df = client.query(sql)
df.to_dataframe()
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
s_khan92
  • 969
  • 8
  • 21

1 Answers1

11

You can get BigQuery data into a dataframe magnitudes faster by changing the method.

Check how these options are reflected in the chart:

  • A: to_dataframe() - Uses BigQuery tabledata.list API.
  • B: to_dataframe(bqstorage_client=bqstorage_client), package version 1.16.0 - Uses BigQuery Storage API with Avro data format.
  • C: to_dataframe(bqstorage_client=bqstorage_client), package version 1.17.0 - Uses BigQuery Storage API with Arrow data format.
  • D: to_arrow(bqstorage_client=bqstorage_client).to_pandas(), package version 1.17.0 - Uses BigQuery Storage API with Arrow data format.

enter image description here

Note how you can go from >500 seconds to ~20 by using to_arrow(bqstorage_client=bqstorage_client).to_pandas().

See https://medium.com/google-cloud/announcing-google-cloud-bigquery-version-1-17-0-1fc428512171

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • Hi @Felipe . First of all really thanks for your answer and its seems really good but i tried also before using pandas_gbq but whenever i try to import this then i am getting this error... `ImportError: cannot import name 'lzip'` – s_khan92 Jan 16 '20 at 08:36
  • Can you post a new question for that? Perhaps Tim can answer it – Felipe Hoffa Jan 16 '20 at 08:42
  • I just posted the new question/. https://stackoverflow.com/questions/59766605/importing-error-of-panda-gbq-in-vmvirtual-machine-in-google-cloud-platform – s_khan92 Jan 16 '20 at 09:35
  • Hi Felipe. This answer saved me so much time. Thank you so much. How is 'to_arrow()' so much faster? – Nguai al May 07 '22 at 22:24