
I need to pass a job_id parameter to my DatabricksRunNowOperator() object. The job_id is the result of executing the databricks jobs create --json '{myjson}' command.

$ databricks jobs create --json '{myjson}'

{"job_id": 12}

import os
import subprocess    
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.contrib.operators.databricks_operator import DatabricksRunNowOperator

def pull_function():
    returned_output = subprocess.check_output("echo ti.xcom_pull(key='jobid_CreateCreateRobot')")
    return returned_output


dag_CreateRobot = DAG(dag_id='CreateRobot', 
                      default_args={'owner': 'eric', 
                                  'email': [],
                                  'depends_on_past': False, 
                                  'start_date':'2019-09-16 16:48:28.803023', 
                                  'provide_context': True}, 
                      schedule_interval='@once')

CreateRobot = BashOperator(dag=dag_CreateRobot, 
                                 task_id='CreateRobot', 
                                 bash_command="databricks jobs create --json '{myjson}')")\

RunRobot = DatabricksRunNowOperator(dag=dag_CreateRobot, 
                                    task_id=ti.xcom_pull('RunCreateRobot'), 
                                    job_id=pull_function(), 
                                    databricks_conn_id='myconn', 
                                    json={'token': 'mytoken' })

RunRobot.set_upstream(CreateRobot)

I wrote this code to explain my goal, but it does not work. How can I use the result of a BashOperator task in another task that depends on it?

  • Hi, you need to use XCom (cross-communication) to transfer data between tasks. Check this link: https://airflow.apache.org/concepts.html#xcoms – A.Villegas Sep 16 '19 at 16:02
  • @A.Villegas yes, I tried to use it but I failed. Can you adapt my code into a working solution? – Eric Bellet Sep 16 '19 at 16:03

1 Answer


The bash command in the BashOperator needs to be databricks jobs create --json '{myjson}' (without the stray closing parenthesis inside the quoted command),

i.e.

CreateRobot = BashOperator(dag=dag_CreateRobot,
                           task_id='CreateRobot',
                           bash_command="databricks jobs create --json '{myjson}'",
                           xcom_push=True)  # specify this on older Airflow versions

When executed, the operator above pushes the last line of its stdout to XCom (see https://airflow.apache.org/_modules/airflow/operators/bash_operator.html).
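
Since only the last line of stdout is pushed, it helps to make that last line exactly the job_id. A minimal sketch (assuming the CLI prints JSON such as {"job_id": 12} and that python3 is available on the worker):

CreateRobot = BashOperator(
    dag=dag_CreateRobot,
    task_id='CreateRobot',
    # Pipe the CLI's JSON output through a one-liner so that the last
    # line of stdout is just the job_id, which is what gets pushed to XCom.
    bash_command=(
        "databricks jobs create --json '{myjson}' "
        "| python3 -c \"import sys, json; print(json.load(sys.stdin)['job_id'])\""
    ),
    xcom_push=True,  # needed on older Airflow versions
)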

The XCom value can then be accessed using:

ti.xcom_pull(task_ids='CreateRobot')
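
Putting it together for the DAG in the question, one option (a sketch, not tested against your setup) is to pass the XCom value through Jinja templating; DatabricksRunNowOperator templates its json field, into which job_id is merged, so the expression is rendered when the task runs:

RunRobot = DatabricksRunNowOperator(
    dag=dag_CreateRobot,
    task_id='RunCreateRobot',  # task_id must be a literal string
    # Pulls the value pushed by the CreateRobot task; rendered by Jinja
    # at run time, not when the DAG file is parsed.
    job_id="{{ ti.xcom_pull(task_ids='CreateRobot') }}",
    databricks_conn_id='myconn',
    json={'token': 'mytoken'},
)

RunRobot.set_upstream(CreateRobot)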
  • I obtained this error: "airflow name 'ti' is not defined" – Eric Bellet Sep 17 '19 at 06:52
  • The pull function needs to be written this way: `def pull_function(**kwargs): ti = kwargs['ti'] ret = ti.xcom_pull(task_ids='CreateRobot') print(f'ret:{ret}')` – jay.cs Sep 17 '19 at 19:57
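
For readability, here is jay.cs's pull_function from the comment above, laid out as a complete task (a sketch; the PullJobId task id is illustrative, not from the original thread):

from airflow.operators.python_operator import PythonOperator

def pull_function(**kwargs):
    # provide_context=True makes Airflow pass the TaskInstance as kwargs['ti']
    ti = kwargs['ti']
    ret = ti.xcom_pull(task_ids='CreateRobot')
    print(f'ret: {ret}')
    return ret

PullJobId = PythonOperator(
    dag=dag_CreateRobot,
    task_id='PullJobId',  # illustrative task id
    python_callable=pull_function,
    provide_context=True,  # required on Airflow 1.x for **kwargs to include ti
)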