
I'm new to protobuf. I generated Python code from my .proto file, and now I want to load protobuf data stored in GCS into BigQuery. I've searched a lot for a way to load protobuf data directly into BigQuery.

I was going through a few GitHub repos, like the one below:

https://github.com/googleapis/java-bigquerystorage

Can anyone explain or point me to a simple example of how to load protobuf data into BigQuery?

chethi

1 Answer


One way is to run code that converts the protobuf data into something BigQuery knows how to read: JSON, Avro, or Parquet.
The simplest is JSON. Export the JSON-formatted data to somewhere on GCS, and then have BigQuery load it. The simplest way to do that is via the bq command-line tool, e.g.:

bq load --ignore_unknown_values --autodetect --source_format=NEWLINE_DELIMITED_JSON datasetName.tableName gs://yourGCSpath
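The conversion step itself might look like the sketch below, assuming your generated code (e.g. a `my_record_pb2` module) gives you message objects; the function name and the placeholder module/file names are mine, not part of any real API:

```python
import json

from google.protobuf.json_format import MessageToDict


def messages_to_ndjson(messages):
    """Serialize protobuf messages as newline-delimited JSON (one object per line)."""
    lines = []
    for msg in messages:
        # preserving_proto_field_name keeps snake_case field names, so they
        # match the column names BigQuery will autodetect from your data.
        d = MessageToDict(msg, preserving_proto_field_name=True)
        lines.append(json.dumps(d))
    return "\n".join(lines)


# Example with your generated class (placeholder names):
#   records = [my_record_pb2.MyRecord(...), ...]
#   with open("records.json", "w") as f:
#       f.write(messages_to_ndjson(records) + "\n")
```

Upload the resulting file to GCS (e.g. with `gsutil cp records.json gs://yourGCSpath`) and it is ready for a JSON load job.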

The --autodetect flag asks BQ to derive the table schema from the JSON data. If you want to load into an existing table with a known schema, you can instead provide a JSON-formatted schema file as an argument.
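For the known-schema case, a sketch of what the schema file and load command could look like; the field names, dataset/table, and GCS path below are placeholders, not taken from the question:

```shell
# Placeholder schema with two example columns; replace with your own fields.
cat > schema.json <<'EOF'
[
  {"name": "name",   "type": "STRING",  "mode": "NULLABLE"},
  {"name": "number", "type": "INTEGER", "mode": "NULLABLE"}
]
EOF

# Without --autodetect, bq load accepts the schema file as a final
# positional argument after the destination table and source URI.
bq load --source_format=NEWLINE_DELIMITED_JSON \
  datasetName.tableName \
  gs://yourGCSpath \
  ./schema.json
```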

Loading data into BQ is free, but there are some limits (such as no more than 1,500 load jobs per table per day, no more than 15 TB total per load job, etc.).

Consult the docs for details.

skibee