1

I try to use xcomm_pull to insert a data_key_param calculated by the python_operator and pass it to the bigquery_operator. The python operator return the output as string e.g. "2020-05-31".

I got an error when running the BigqueryOperator: "Dependencies Blocking Task From Getting Scheduled" - Could not cast literal "{xcom_pull(task_ids[\'set_date_key_param\'])[0] }"

The sql attribute value returned from the Airflow GUI after task execution:
SELECT DATE_KEY, count(*) as COUNT
FROM my-project.my_datasets.source_table
WHERE DATE_KEY = {{ task_instance.xcom_pull(task_ids='set_date_key_param') }}
GROUP BY DATE_KEY

Code below (I have already treid to use '{{' and '}}' to enclose the task_instance.xcom...):

def set_date_key_param():

    # a business logic here
    return "2020-05-31" # example results

# task 1

set_date_key_param = PythonOperator(
    task_id='set_date_key_param',
    provide_context=True,
    python_callable=set_date_key_param,
    dag=dag
)

# taks 2

load_data_to_bq_table = BigQueryOperator(
    task_id='load_data_to_bq_table',
    sql="""SELECT DATE_KEY, count(*) as COUNT
    FROM `{project}.{dataset}.source_table` 
    WHERE DATE_KEY = {{{{ task_instance.xcom_pull(task_ids='set_date_key_param') }}}}
    GROUP BY DATE_KEY""".format(
        project=PROJECT_ID,
        env=ENV
),
use_legacy_sql=False,
destination_dataset_table="{project}.{dataset}.target_table".format(
    project=PROJECT_ID,
    dataset=BQ_TARGET_DATASET,
),
write_disposition="WRITE_TRUNCATE",
create_disposition="CREATE_NEVER",
trigger_rule='all_success',
dag=dag

)

set_date_key_param >> load_data_to_bq_table

Tomasz Kubat
  • 102
  • 2
  • 11
  • 1
    Could you try `\"{{ task_instance.xcom_pull(task_ids='set_date_key_param') }}\"` and what @SergiyKolesnikov said? – Emma Jun 01 '20 at 16:17
  • \"{{ task_instance.xcom_pull(task_ids='set_date_key_param') }}\" - nope, doesn't work :( – Tomasz Kubat Jun 01 '20 at 16:49
  • hello @TomaszKubat, would you please mind sharing the code. I am also facing similar problem. I am using python operator in which I am calling big query operator with context as TRUE. still it doesn't work. – radhika sharma Nov 28 '21 at 11:19

2 Answers2

3

I think the string formatting and jinja template is conflicting each other.

In your use case where leveraging xcom, I think it makes sense to use jinja template.

load_data_to_bq_table = BigQueryOperator(
    task_id='load_data_to_bq_table',
    sql="""SELECT DATE_KEY, count(*) as COUNT
        FROM `{{ params.project }}.{{ params.dataset }}.source_table` 
        WHERE DATE_KEY = \"{{ task_instance.xcom_pull(task_ids='set_date_key_param') }}\"
        GROUP BY DATE_KEY""",
    params={
        'project': PROJECT_ID,
        'env': ENV   # env or dataset??, match this name to the params key in sql
    }
)
Emma
  • 8,518
  • 1
  • 18
  • 35
  • Should params be passed as a Python dictionary with the ":" separating the key values? Like 'project':PROJECT_ID? I have got a syntax error with the "="? – Tomasz Kubat Jun 01 '20 at 18:11
  • Or do you mean 'query_params' not 'params'? – Tomasz Kubat Jun 01 '20 at 18:22
  • But you are right! When i excluded the string format and leave just the jinja template the parameter substitution works! – Tomasz Kubat Jun 01 '20 at 18:42
  • `params` didn't work? `params` supposed to be the key for jinja templating afaik. – Emma Jun 01 '20 at 21:57
  • It seems that there is a Airflow 1.10.2 known bug with passing the sql_params to the bigquery_operator: https://stackoverflow.com/questions/56287061/how-to-pass-query-parameter-to-sql-file-using-bigquery-operator I have changed the bigquery_operator for the python_operator. – Tomasz Kubat Jun 02 '20 at 06:20
0

You named the Python callable and the variable to hold the first Python operator the same: set_date_key_param. Rename the Python callable (e.g. set_date) and change the parameters for the Python operator accordingly.

SergiyKolesnikov
  • 7,369
  • 2
  • 26
  • 47