I have my Cloud Composer environment created under project-A, and I want to load data into a BigQuery table in project-B. I know about the GCSToBigQueryOperator task, but it keeps failing instead of succeeding, and I want to know how I can achieve this. From project A I want to run a task which loads data into a project B table.
2 Answers
Based on my experience and my assumptions about your situation, I think you need to make sure that the service account behind your connections (bigquery_conn_id and google_cloud_storage_conn_id) has sufficient permissions in both projects.
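For example, once those permissions are in place the operator call itself can stay simple. Below is a minimal sketch, assuming a recent apache-airflow-providers-google version where a single gcp_conn_id is used (older versions take bigquery_conn_id and google_cloud_storage_conn_id instead); the bucket, object and table names are placeholders:

from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

load_to_project_b = GCSToBigQueryOperator(
    task_id="load_to_project_b",
    bucket="bucket-in-project-a",                # source bucket (placeholder)
    source_objects=["exports/data.csv"],         # object path(s) inside the bucket (placeholder)
    destination_project_dataset_table="project-b.my_dataset.my_table",  # fully qualified target in project B
    source_format="CSV",
    skip_leading_rows=1,
    write_disposition="WRITE_TRUNCATE",
    autodetect=True,                             # or pass schema_fields explicitly
    gcp_conn_id="google_cloud_default",          # connection whose service account needs access in both projects
    dag=dag,
)

The important part is that destination_project_dataset_table is fully qualified with the project-B project id, while the credentials come from the connection configured in your Composer environment in project A.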

yeah, this is what I did to make it work, thanks – user3065757 Feb 26 '21 at 12:31
As shankshera mentioned, first check in GCP IAM that the service account you use in the Cloud Composer environment has access to both projects (and to the datasets in BigQuery).
To be honest, I also couldn't get this operator to work properly, so I wrote a custom Python function to do the same thing.
from google.cloud import bigquery

def load_into_table_from_csv(**kwargs):
    """
    Loads data into the specified BQ table from the specified CSV file in GCS.

    Receives the table_path and file_path parameters from the PythonOperator in Airflow.
    Parameters need to be explicitly specified in the op_kwargs variable in the task definition.
    Example of op_kwargs for PythonOperator:
        {'table_path': 'project_id.dataset_id.table_id',
         'file_path': 'gs://bucket_name/file_name.csv',
         'delimiter': ',',
         'quote_character': '"'}
    """
    bigquery_client = bigquery.Client()
    dataset_ref = kwargs['table_path']  # fully qualified table id, may point at another project

    # file_path can be a plain URI string or a string containing a Python list of URIs
    try:
        file = eval(kwargs["file_path"])
    except Exception:
        file = kwargs["file_path"]
    finally:
        delimiter = kwargs["delimiter"]
        quote_character = kwargs["quote_character"]

        job_config = bigquery.LoadJobConfig()
        job_config.field_delimiter = delimiter  # delimiter in the source file
        job_config.skip_leading_rows = 1  # how many rows to skip (set to 1 if you have a header row)
        job_config.quote_character = quote_character
        job_config.write_disposition = 'WRITE_TRUNCATE'  # https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.write_disposition

        load_job = bigquery_client.load_table_from_uri(
            file,
            dataset_ref,
            job_config=job_config)

        assert load_job.job_type == 'load'

        load_job.result()  # Waits for the table load to complete.

        assert load_job.state == 'DONE'
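Note that bigquery.Client() typically picks up the Composer environment's own project (project A) as its default, so the load job runs there and only the destination table lives in project B. If you want the job itself to run and be billed in project B, the client can be constructed with an explicit project; a small variation, assuming the service account also has the BigQuery Job User role in project-b:

# Run the load job in project B instead of the Composer project (project name is a placeholder).
bigquery_client = bigquery.Client(project="project-b")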
Then in the DAG itself you can just use this function and provide parameters for it like so:
t8 = PythonOperator(
    task_id=f"load_{table_name}",
    python_callable=load_into_table_from_csv,  # function that's called by the task
    op_kwargs=specs_current_table,  # passing arguments into the function
    dag=dag
)
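Here specs_current_table is just the dictionary of keyword arguments described in the function's docstring; a hypothetical example for a cross-project load (names are placeholders):

specs_current_table = {
    'table_path': 'project-b.my_dataset.my_table',          # destination table in project B
    'file_path': 'gs://bucket-in-project-a/file_name.csv',  # source CSV in project A
    'delimiter': ',',
    'quote_character': '"',
}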
As a side note, I personally agree with the author of this article https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753 that we should be careful with using too many custom operators when we can do the same thing with vanilla code.
