
I'm trying to push data from GCS to a BigQuery table using the Airflow operator GCSToBigQueryOperator. Below is what I have:

parquet_to_bq = GCSToBigQueryOperator(
    bigquery_conn_id="dev",
    task_id="gcs_to_bq_task",
    bucket="bucket_id",
    source_format="PARQUET",
    source_objects=['test/*'],
    destination_project_dataset_table="table_name",
    write_disposition='WRITE_TRUNCATE',
    impersonation_chain=IMPERSONATE_SERVICE_ACCOUNT,
)

The GCS bucket paths have the following format:

bucket_id/test/op_dt=2021-01-01/1.parquet
bucket_id/test/op_dt=2021-01-02/2.parquet

The table in BigQuery is partitioned on op_dt. When I execute the DAG, I get the following error:

google.api_core.exceptions.BadRequest: 400 The field specified for partitioning cannot be found in the schema

I want to load all the partitions from GCS to BigQuery. What modifications do I need to make for this operator to work?
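For reference, a load job submitted with BigQuery's hive-partitioning options would read op_dt from the object paths instead of expecting it in the parquet schema. Below is a rough, unverified sketch using BigQueryInsertJobOperator; the project and dataset names are placeholders, and the sourceUriPrefix is assumed from the bucket layout above.

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

load_hive_partitioned_parquet = BigQueryInsertJobOperator(
    task_id="gcs_to_bq_load_job",
    gcp_conn_id="dev",
    impersonation_chain=IMPERSONATE_SERVICE_ACCOUNT,
    configuration={
        "load": {
            "sourceUris": ["gs://bucket_id/test/*"],
            "sourceFormat": "PARQUET",
            "writeDisposition": "WRITE_TRUNCATE",
            "destinationTable": {
                "projectId": "my-project",   # placeholder
                "datasetId": "my_dataset",   # placeholder
                "tableId": "table_name",
            },
            # op_dt lives in the folder names (op_dt=YYYY-MM-DD), not in the
            # parquet files, so let BigQuery derive it from the paths.
            "hivePartitioningOptions": {
                "mode": "AUTO",
                "sourceUriPrefix": "gs://bucket_id/test/",
            },
            "timePartitioning": {"type": "DAY", "field": "op_dt"},
        }
    },
)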

praneethh
  • Hi, which approach did you use when creating the partitioned table? There are two options: an ingestion-time partitioned table [1] or a sharded table [2]. I ask because of the naming structure: when creating a sharded table you have to use [PREFIX]_YYYYMMDD, and that difference could lead to a conflict. [1]: https://cloud.google.com/bigquery/docs/partitioned-tables#ingestion_time [2]: https://cloud.google.com/bigquery/docs/partitioned-tables#dt_partition_shard – awfullyCold Apr 08 '21 at 10:34
  • Hi, I used a time-partitioned table (partitioned by day) – praneethh Apr 09 '21 at 14:34
  • Does this answer your question? [How to run cloud composer task which loads data into other project BigQuery Table](https://stackoverflow.com/questions/66173011/how-to-run-cloud-composer-task-which-loads-data-into-other-project-bigquery-tabl) – Raul Saucedo Mar 21 '22 at 18:19
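
To make the distinction discussed in the comments concrete, here is a minimal sketch with the BigQuery client library; the project, dataset, and extra column names are placeholders. A column-partitioned table names the partitioning field (which must exist in the schema), while an ingestion-time partitioned table omits it; a sharded table is just a set of plain tables following the [PREFIX]_YYYYMMDD naming convention.

from google.cloud import bigquery

client = bigquery.Client()

# Column-based daily partitioning on op_dt (what the question describes):
# the partitioning field must be present in the table schema.
column_partitioned = bigquery.Table("my-project.my_dataset.table_name")  # placeholder IDs
column_partitioned.schema = [
    bigquery.SchemaField("op_dt", "DATE"),
    bigquery.SchemaField("value", "STRING"),  # placeholder column
]
column_partitioned.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="op_dt",
)
client.create_table(column_partitioned, exists_ok=True)

# Ingestion-time partitioning: same call but without `field`, so rows are
# partitioned by load time rather than by a column value.
ingestion_partitioned = bigquery.Table("my-project.my_dataset.table_name_by_load_time")  # placeholder
ingestion_partitioned.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
)
client.create_table(ingestion_partitioned, exists_ok=True)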

0 Answers