I am using BashOperator to execute ETL scripts on GCP Compute Engine instances, and some files can take more than 10 hours to complete.
Since the actual work runs on the Compute Engine instance, how can I mark the BashOperator task (and the DAG run) as success so that I can start a new DAG that executes another ETL script on a different instance?
(I can run multiple DAGs in parallel, but I will have 20-30 ETL scripts running on different Compute Engine instances, so I need to mark the BashOperator task as success as soon as the execution on the instance has started.)
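For context, the behaviour I am after is "launch and return immediately". A minimal local simulation of that pattern with `subprocess` (using `sleep 30` as a stand-in for a long-running ETL script; this is just an illustration, not my DAG code):

```python
import subprocess
import time

start = time.time()
# Popen launches the child process without waiting for it to finish,
# so control returns to the caller immediately while `sleep 30`
# (standing in for the ETL script) keeps running in the background.
proc = subprocess.Popen(["sleep", "30"], stdout=subprocess.DEVNULL)
elapsed = time.time() - start
```

Here `elapsed` is a fraction of a second even though the child runs for 30 seconds, which is the behaviour I want from the BashOperator task.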
t1 = PythonOperator(
    task_id='check_running_file',
    python_callable=check_running_file,
    dag=dag,
)
t2 = BashOperator(
    task_id='start_vm',
    bash_command="gcloud compute instances start vm-1 --zone=zone",
    dag=dag,
)
bash_task = bash_operator.BashOperator(
    task_id='script_execution',
    bash_command='gcloud compute ssh --project ' + PROJECT_ID +
                 ' --zone ' + ZONE + ' ' + GCE_INSTANCE +
                 ' --command ' + command,
    dag=dag,
)
from datetime import datetime, timezone

from airflow.models import TaskInstance
from airflow.utils.state import State

def set_task_status(**context):
    utc_now = datetime.utcnow().replace(tzinfo=timezone.utc)
    task_instance = TaskInstance(task_id='bash_task', dag_id='process_folders',
                                 execution_date=utc_now)
    task_instance.set_state(state=State.SUCCESS)
set_task_instance = PythonOperator(
    task_id='set_status',
    python_callable=set_task_status,
    provide_context=True,
    dag=dag,
)
set_task_instance.pre_execute = lambda **x: time.sleep(300)
t1 >> t2 >> [bash_task, set_task_instance]
How do I set bash_task to success after 5 minutes? I tried set_state, but set_task_instance throws an error: the TaskInstance constructor appears to expect a DAG/task object rather than a task_id.