
When I use the BigQuery console manually, I can see that the 3 options when exporting a table to GCS are CSV, JSON (Newline delimited), and Avro.

With Airflow, when using the BigQueryToCloudStorageOperator operator, what is the correct value to pass to export_format in order to transfer the data to GCS as JSON (Newline delimited)? Is it simply JSON? All examples I've seen online for BigQueryToCloudStorageOperator use export_format='CSV', but never JSON, so I'm not sure what the correct value is. Our use case needs JSON, since the second task in our DAG (after transferring data to GCS) loads that data from GCS into our MongoDB cluster with mongoimport.

Canovice

2 Answers

I found that the value export_format='NEWLINE_DELIMITED_JSON' is required, after finding the documentation at https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationextract and referring to the accepted values for destinationFormat.
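
For reference, a minimal sketch of how this could look in a DAG (assuming the Airflow 1.10 contrib import path; the project, dataset, table, and bucket names below are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

with DAG(
    dag_id="bq_to_gcs_json",
    start_date=datetime(2020, 11, 1),
    schedule_interval=None,
) as dag:
    # Export the BigQuery table as newline-delimited JSON (one object per line),
    # which is the format mongoimport can consume downstream.
    export_to_gcs = BigQueryToCloudStorageOperator(
        task_id="export_bq_table_to_gcs",
        source_project_dataset_table="my-project.my_dataset.my_table",
        destination_cloud_storage_uris=["gs://my-bucket/exports/my_table-*.json"],
        export_format="NEWLINE_DELIMITED_JSON",
        bigquery_conn_id="google_cloud_default",
    )
```

The wildcard in the destination URI lets BigQuery shard the export into multiple files if the table is larger than the single-file export limit.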

Paul

According to the BigQuery documentation, the three formats to which you can export BigQuery query results are CSV, JSON, and Avro (which is consistent with the UI drop-down menu).

I would try with export_format='JSON' as you already proposed.

UJIN
  • JSON does work. After setting the bigquery_conn_id and using `JSON`, everything seemed to work fine. – Canovice Nov 13 '20 at 20:30