
I'm running Airflow on a 4-CPU machine with the LocalExecutor.
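For reference, the knobs that cap how many task instances the LocalExecutor runs at once live in the [core] section of airflow.cfg. A minimal sketch for printing the effective values, assuming an Airflow 1.x-style install where airflow.configuration.conf exposes the parsed config (dag_concurrency was renamed in later releases):

# Sketch: print the config values that cap LocalExecutor concurrency.
# Assumes Airflow 1.x-style key names; newer versions use
# max_active_tasks_per_dag instead of dag_concurrency.
from airflow.configuration import conf

print('executor:        %s' % conf.get('core', 'executor'))
print('parallelism:     %d' % conf.getint('core', 'parallelism'))      # global cap on running tasks
print('dag_concurrency: %d' % conf.getint('core', 'dag_concurrency'))  # per-DAG cap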

I've defined a downstream task with trigger_rule 'one_success', so it should fire as soon as any one of its upstream tasks succeeds:

create_spark_cluster_task = BashOperator(
    task_id='create_spark_cluster',
    trigger_rule='one_success',  # fire as soon as any upstream task succeeds
    bash_command=...,
    dag=dag)

...

download_bag_data_task >> create_spark_cluster_task
download_google_places_data_task >> create_spark_cluster_task
download_facebook_places_details_data_task >> create_spark_cluster_task
download_facebook_places_details_data_task_2 >> create_spark_cluster_task
download_facebook_places_details_data_task_3 >> create_spark_cluster_task
download_factual_data_task >> create_spark_cluster_task
download_dataoutlet_data_task >> create_spark_cluster_task
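For anyone who wants to try this locally, here is a self-contained sketch of the same wiring; the dag_id, task names, sleep durations, and bash commands are placeholders, not the real pipeline:

# Minimal reproduction sketch (placeholder commands, hypothetical dag_id).
# Uses the Airflow 1.x import path; in 2.x it is airflow.operators.bash.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='one_success_repro',
    start_date=datetime(2018, 1, 1),
    schedule_interval=None)

download_tasks = [
    BashOperator(
        task_id='download_%d' % i,
        bash_command='sleep %d' % (10 * i),  # finish at staggered times
        dag=dag)
    for i in range(1, 8)]

create_spark_cluster_task = BashOperator(
    task_id='create_spark_cluster',
    trigger_rule='one_success',  # should fire on the first upstream success
    bash_command='echo "creating cluster"',
    dag=dag)

for t in download_tasks:
    t >> create_spark_cluster_task

With the staggered sleeps, create_spark_cluster should be scheduled as soon as download_1 succeeds, long before the slower downloads finish.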

But even though some of the upstream tasks are clearly marked as success, the task does not trigger.

The download tasks do run in parallel, so parallelism itself cannot be the issue.

[screenshot of the DAG graph view]

Inspecting the tasks shows:

Dependency: Unknown

Reason: All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:

- The scheduler is down or under heavy load
- This task instance already ran and had its state changed manually (e.g. cleared in the UI)

I've looked at the load and it's indeed pretty high:

load average: 2.45, 3.55, 3.71; CPU is at 50-60%

But other tasks have already finished, so there should be resources free to start another task, right?
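One way to sanity-check that is to count how many task instances the metadata database thinks are running and compare it against the parallelism cap. A rough sketch, assuming 1.x-era internals (airflow.settings.Session and the TaskInstance model) and that it runs on the same machine and AIRFLOW_HOME as the scheduler:

# Sketch: compare currently-running task instances against the parallelism cap.
# Assumes Airflow 1.x-era internals; run with the scheduler's AIRFLOW_HOME.
from airflow.configuration import conf
from airflow.models import TaskInstance
from airflow.settings import Session
from airflow.utils.state import State

session = Session()
running = (session.query(TaskInstance)
           .filter(TaskInstance.state == State.RUNNING)
           .count())
queued = (session.query(TaskInstance)
          .filter(TaskInstance.state == State.QUEUED)
          .count())
session.close()

print('running: %d, queued: %d, parallelism: %d'
      % (running, queued, conf.getint('core', 'parallelism')))

If running is already at parallelism, the executor has no free slots regardless of how many tasks have finished; if it is well below, the scheduler itself is the bottleneck.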
