
I have a job with 3 tasks: 1) get a token using a POST request, 2) extract the token value and store it in a variable, 3) make a GET request, passing the token from step 2 as a bearer token.

The issue is that step 3 is not working and I am getting an HTTP error. I was able to print the value of the token in step 2 and verified it in the code.

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
token = "mytoken"  # defined with some value which will be updated later

get_token = SimpleHttpOperator(
        task_id='get_token',
        method='POST',
        headers={"Authorization": "Basic xxxxxxxxxxxxxxx=="},
        endpoint='/token?username=user&password=pass&grant_type=password',
        http_conn_id = 'test_http',
        trigger_rule="all_done",
        xcom_push=True,
        dag=dag
    )

def pull_function(**context):
    value = context['task_instance'].xcom_pull(task_ids='get_token')
    print("printing token")
    print value
    wjdata = json.loads(value)
    print(wjdata['access_token'])
    token=wjdata['access_token']
    print token


run_this = PythonOperator(
    task_id='print_the_context',
    provide_context=True,
    python_callable=pull_function,
    dag=dag,
)

get_config = SimpleHttpOperator(
        task_id='get_config',
        method='GET',
        headers={"Authorization": "Bearer " + token},
        endpoint='someendpoint',
        http_conn_id = 'test_conn',
        trigger_rule="all_done",
        xcom_push=True,
        dag=dag
    )

get_token >> run_this >> get_config
Mohan
  • In Airflow 2, this workflow can be handled more simply. Please refer to https://stackoverflow.com/a/68711555/1743724 – smbanaei Aug 09 '21 at 11:49

1 Answer


The way you are storing the token as a "global" variable won't work. The DAG definition file (the script where you define the tasks) does not share a runtime context with the code that executes each task. Every task can run in a separate thread, process, or even on another machine, depending on the executor. You pass data between tasks not through global variables but through XCom, which you already partially do. Try the following:

- remove the global token variable
- in pull_function, instead of print token, do return token; this pushes the value to XCom again, so the next task can access it
- access the token from XCom in your next task
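
The steps above can be sketched as follows. This is a minimal illustration, not tested inside Airflow: `FakeTaskInstance` is a hypothetical stand-in for the real task instance so the callable can be run on its own, and the sample response body is made up. The task id `get_token` and the `access_token` key come from the question.

```python
import json


class FakeTaskInstance(object):
    """Hypothetical stand-in for Airflow's task instance (for local runs only)."""

    def __init__(self, xcoms):
        self._xcoms = xcoms

    def xcom_pull(self, task_ids):
        # Return whatever the named upstream task "pushed".
        return self._xcoms[task_ids]


def pull_function(**context):
    # Read the raw response body that the get_token task pushed to XCom.
    value = context["task_instance"].xcom_pull(task_ids="get_token")
    # Return the token instead of assigning a global; when run by a
    # PythonOperator, the return value is pushed to XCom for downstream tasks.
    return json.loads(value)["access_token"]


# Local check with a made-up response body.
ti = FakeTaskInstance({"get_token": '{"access_token": "abc123"}'})
token = pull_function(task_instance=ti)
print(token)
```

Note that there is no module-level `token` in the DAG file any more; the value only exists at task run time and travels through XCom.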

The last step is a bit tricky since you are using the SimpleHttpOperator, and its only templated fields are endpoint and data, but not headers. For example, if you wanted to pass in some data from the XCom of a previous task, you would do something like this:

get_config = SimpleHttpOperator(
        task_id='get_config',
        endpoint='someendpoint',
        http_conn_id = 'test_conn',
        dag=dag,
        data='{{ task_instance.xcom_pull(task_ids="print_the_context", key="some_key") }}'
    )

But you can't do the same with the headers, unfortunately, so you either have to do it "manually" via a PythonOperator, or subclass SimpleHttpOperator and create your own operator, something like:

class HeaderTemplatedHttpOperator(SimpleHttpOperator):
    template_fields = ('endpoint', 'data', 'headers')  # added 'headers'

Then use that operator, something like:

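get_config = HeaderTemplatedHttpOperator(
        task_id='get_config',
        method='GET',
        endpoint='someendpoint',
        http_conn_id = 'test_conn',
        dag=dag,
        # headers stays a dict; only the token part of the value is templated
        headers={"Authorization": "Bearer {{ task_instance.xcom_pull(task_ids='print_the_context') }}"}
    )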

Keep in mind that I have not tested this; it is only meant to explain the concept. Play around with the approach and you should get there.
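
The "manual" route mentioned above, where a PythonOperator builds the header itself, might look roughly like the sketch below. It is an assumption-laden illustration: plain `urllib.request` stands in for the Airflow HTTP connection, and `build_auth_headers`, `base_url` and `opener` are hypothetical names introduced here (the `opener` parameter only exists so the function can be exercised without a network call). The task id `print_the_context` and the endpoint `someendpoint` come from the question.

```python
import urllib.request


def build_auth_headers(token):
    """Build the bearer-token header dict at task run time."""
    return {"Authorization": "Bearer " + token}


def get_config(base_url="http://localhost:8080",
               opener=urllib.request.urlopen, **context):
    # Pull the token that the upstream task returned (and so pushed to XCom).
    token = context["task_instance"].xcom_pull(task_ids="print_the_context")
    request = urllib.request.Request(
        base_url + "/someendpoint",
        headers=build_auth_headers(token),
        method="GET",
    )
    # The response body is returned, so it is pushed to XCom in turn.
    with opener(request) as response:
        return response.read().decode("utf-8")
```

In the DAG this callable would be wired up like the existing `run_this` task, i.e. a PythonOperator with `provide_context=True` and `python_callable=get_config`, placed after the task that returns the token.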

bosnjak
  • def pull_function(**context): value = context['task_instance'].xcom_pull(task_ids='task1') wjdata = json.loads(value) token=wjdata['access_token'] context['task_instance'].xcom_push(key="token", value=token) return token task2 = PythonOperator( task_id='task2', provide_context=True, python_callable=pull_function, dag=dag, ) task3 = SimpleHttpOperator( task_id='task3', method='GET', headers={"Authorization": "Bearer " + {{ task_instance.xcom_pull(task_ids='task2')}} }, – Mohan May 12 '19 at 04:09
  • Thanks @bosnjak; Basically you are saying in the below headers={"Authorization": "Bearer " + token} , token will be replaced by '{{ task_instance.xcom_pull(task_ids="print_the_context") }}'. I am getting error saying task_instance is not recognized – Mohan May 12 '19 at 04:20
  • Yes, something like that. I can't really read unformatted code in the comments, but the idea is to replace the value with a template using the macros you can find in the official documentation: https://airflow.apache.org/macros.html – bosnjak May 13 '19 at 11:34
  • def pull_function(**context): value = context['task_instance'].xcom_pull(task_ids='task1') return token task2 = PythonOperator( task_id='task2', provide_context=True, python_callable=pull_function, dag=dag, ) task3 = SimpleHttpOperator( task_id='task3', method='GET', headers={"Authorization": "Bearer " + {{ task_instance.xcom_pull(task_ids='task2')}} }, endpoint='someendpoint, http_conn_id = 'test_reltio', trigger_rule="all_done", dag=dag ) – Mohan May 20 '19 at 02:05
  • Python code in the comments section is not working out, because whitespace is important. Please paste the code in the original question and format properly. Also, what is the actual problem, rather than just wanting a code review? Have you tried it and it's not working, or? – bosnjak May 20 '19 at 07:53
  • Thanks for your solution it worked. Just one problem i am facing is getting u before Bearer header '{'Authorization': u'Bearer 6d08d41c-c14e-49d7-b8b6-62a803bbf014'}'. The token is return by "{{ task_instance.xcom_pull(task_ids='task2') }}". Any thoughts how can i rid of u – Mohan May 21 '19 at 21:57
  • The `u` is not a part of the string, it's in front of it. It means that it's actually unicode. Should work for your purpose without changing anything. But if you really need utf-8, you can do `.encode('utf8')`. Are you using Python2? – bosnjak May 21 '19 at 22:24
  • Yes i am using python2. If i hard code the token the rest header looks below and it works '{'Authorization': 'Bearer c596b795-5612-4d60-b58e-079c296362b4'}'. But when it has u before Bearer it is not working. – Mohan May 22 '19 at 01:22
  • As I said, the `u` is an indication that it's a unicode, not bytes. Make sure you encode it using `.encode('utf8')`. For example, before returning token from `pull_function` you can do `return token.encode('utf8')`. – bosnjak May 22 '19 at 07:37
  • Thanks, you are right u was not the issue. Thanks again. – Mohan May 24 '19 at 02:16
  • If the answer resolved the issue, you can mark it as accepted by clicking on the check mark on the left, next to the title. Cheers! – bosnjak May 24 '19 at 07:15