
First of all, is it possible to do a batch load into Google BigQuery through its API? If yes, how much does it cost?

I don't want to use streaming inserts because they cost extra compared to batch loads.

I will be inserting a million rows each day. I will be using Python, and I plan to use 'patch' or 'update' in the API.

Iñigo
Gagan
  • How many bytes are the million rows, roughly? And can you define what "slow" means in your perception? The streaming API supports 100,000 rows/second, which I think is unique on the market. – Pentium10 Aug 15 '18 at 06:49
  • Oh yes, I take the 'slow' statement back; I removed it from the comment. I want to insert around 3-4 million rows daily, and before inserting into BigQuery I want to do some data processing too. – Gagan Aug 15 '18 at 08:36

2 Answers


You can load data into BigQuery in several ways; what you are looking for is the third option, loading from a readable data source such as a local file. The BigQuery documentation on loading data has examples in different programming languages, and loading data from a local file or from GCS is free. (A minimal Python sketch follows the format list below.)

Your data can be in any of the following formats:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro
  • Parquet
  • ORC
Pentium10
  • No, I am not loading data from local files and not from GCS either; it's from Azure Blob. I have a Python script which extracts data from the blob, processes it, and then pushes it into BigQuery. Hence I want to use the API to push it into a BigQuery table. So I want to know: is it possible to use batch load through Python? – Gagan Aug 15 '18 at 09:19
  • The Python script runs somewhere; create a file on that machine and push that file to BQ, and that's the "Local" option. Otherwise, if you want to do it on the fly, you need to use streaming inserts. – Pentium10 Aug 15 '18 at 09:20
  • So you are saying that if I am using a VM, the script can save the file there and then push it to BQ programmatically? Also, is it not possible to give the Azure Blob path to BQ? Blob does give an HTTP URL to access its container, I guess. – Gagan Aug 15 '18 at 11:22
  • You need to create the file on the VM and issue a load API call, which is free. Other methods are described here: https://stackoverflow.com/questions/44806345/is-there-a-way-to-continuously-pipe-data-from-azure-blob-into-bigquery – Pentium10 Aug 15 '18 at 11:26
  • I saw this post, but since it's been a year I thought there might be a new way to deal with this use case without storing the data locally or in GCS. – Gagan Aug 15 '18 at 12:21
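For contrast, the "on the fly" path mentioned in the comments is the streaming insert API. A rough sketch with the same Python client (table ID and rows are placeholders), keeping in mind that streaming inserts are billed per ingested data while load jobs are free:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder table

rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]

# Streaming insert: rows become queryable almost immediately,
# but this path has a per-GB ingestion cost, unlike batch load jobs.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Row errors:", errors)
```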

I think this is what you are looking for: Batch Queries for Python.

Also, here is the GitHub repository for Python and BigQuery; you can find the snippet that appears in the documentation under snippets.py.

You can find BigQuery pricing and a pricing calculator in the Google Cloud documentation.
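For reference, a batch-priority query with the Python client looks roughly like this; this is only a sketch, and the query against BigQuery's public Shakespeare sample is just an example:

```python
from google.cloud import bigquery

client = bigquery.Client()

# BATCH priority: the query is queued and runs when idle slots are available,
# instead of competing with interactive queries.
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)

query_job = client.query(
    "SELECT COUNT(*) AS row_count "
    "FROM `bigquery-public-data.samples.shakespeare`",
    job_config=job_config,
)

for row in query_job.result():  # blocks until the batch query completes
    print(row.row_count)
```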

Iñigo