3

I have my Cloud Composer environment created under project-A and I want to load data into a BigQuery table in project-B. I know about the GCSToBigQueryOperator task, but it is failing rather than succeeding, and I want to know how I can achieve this. From project A I want to run a task which loads data into a project B table.

user3065757

2 Answers

2

Based on my experience and my assumptions about your setup, I think you need to make sure that your service account (the one behind bigquery_conn_id and google_cloud_storage_conn_id) has sufficient permissions in both projects.
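
Once the account has the right roles in both projects, the operator mostly needs a fully qualified destination table. A rough sketch only (the bucket, dataset and table names are placeholders, and the import path and connection-id parameters vary between Airflow/provider versions, with newer releases consolidating them into a single gcp_conn_id):

from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

load_into_project_b = GCSToBigQueryOperator(
    task_id="load_into_project_b",
    bucket="bucket_name",  # placeholder: bucket in project A
    source_objects=["file_name.csv"],
    destination_project_dataset_table="project-b.dataset_id.table_id",  # note the explicit project B prefix
    skip_leading_rows=1,
    field_delimiter=",",
    write_disposition="WRITE_TRUNCATE",
    bigquery_conn_id="google_cloud_default",  # both connections must resolve to a service account
    google_cloud_storage_conn_id="google_cloud_default",  # that has permissions in both projects
    dag=dag,
)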

shankshera
0

As shankshera mentioned, first check in GCP IAM that the service account you use in your Cloud Composer environment has access to both projects (and to the datasets in BigQuery).
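
If you would rather grant access at the dataset level in project B than assign project-wide IAM roles, one possible way (a sketch only; the project, dataset and service-account names below are placeholders) is to append the Composer service account to the dataset's access entries with the BigQuery Python client:

from google.cloud import bigquery

# Placeholders: dataset in project B and the Composer service account from project A.
client = bigquery.Client(project="project-b")
dataset = client.get_dataset("project-b.dataset_id")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",
        entity_type="userByEmail",
        entity_id="composer-sa@project-a.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # persist the updated access list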

To be honest, I also couldn't make this operator work properly, so I wrote a custom Python function to do the same thing.

from google.cloud import bigquery


def load_into_table_from_csv(**kwargs):
    """
    Loads data into the specified BQ table from the specified CSV file in GCS.

    Receives the table_path and file_path parameters from the PythonOperator in Airflow.
    Parameters need to be explicitly specified in the op_kwargs variable in the task definition.

    Example of op_kwargs for the PythonOperator:
        {'table_path': 'project_id.dataset_id.table_id',
         'file_path': 'gs://bucket_name/file_name.csv',
         'delimiter': ',',
         'quote_character': '"'}
    """
    bigquery_client = bigquery.Client()
    table_path = kwargs['table_path']  # fully qualified table ID, can point at another project

    # file_path may be a plain URI string or a string containing a Python list of URIs
    try:
        file = eval(kwargs['file_path'])
    except (SyntaxError, NameError):
        file = kwargs['file_path']

    delimiter = kwargs['delimiter']
    quote_character = kwargs['quote_character']

    job_config = bigquery.LoadJobConfig()
    job_config.field_delimiter = delimiter  # delimiter in the source file
    job_config.skip_leading_rows = 1  # how many rows to skip (set to 1 if you have a header row)
    job_config.quote_character = quote_character
    job_config.write_disposition = 'WRITE_TRUNCATE'  # https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.write_disposition

    load_job = bigquery_client.load_table_from_uri(
        file,
        table_path,
        job_config=job_config)

    assert load_job.job_type == 'load'

    load_job.result()  # Waits for the table load to complete.

    assert load_job.state == 'DONE'

Then in the DAG itself you can just use this function and provide parameters for it like so:

t8 = PythonOperator(
    task_id=f"load_{table_name}",
    python_callable=load_into_table_from_csv, #function that's called by the task
    op_kwargs=specs_current_table, #passing arguments into a function
    dag=dag
)
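
For completeness, specs_current_table is just the dictionary of keyword arguments described in the function's docstring. A hypothetical example pointing at a table in the other project could look like this (table_name here only exists to build the task_id above):

table_name = "table_id"  # placeholder
specs_current_table = {
    'table_path': 'project-b.dataset_id.table_id',  # fully qualified table in project B
    'file_path': 'gs://bucket_name/file_name.csv',  # source CSV in GCS
    'delimiter': ',',
    'quote_character': '"',
}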

As a side note, I personally agree with the author of this article https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753 and think we should be careful with using too many custom operators if we can do the same thing with vanilla code.