
I have scheduled my Airflow DAGs to run, and each DAG contains a single task. When the DAGs run, the tasks inside them don't get executed.

Here's my code for the same (I am trying to SSH into an EC2 server and run a bash command):

from datetime import timedelta, datetime
from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['removed@example.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'start_date': datetime.now() - timedelta(days=1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(dag_id='back_fill_reactivated_photo_dimension',
          default_args=default_args,
          schedule_interval='55 * * * *',
          dagrun_timeout=timedelta(seconds=120))

t1_bash = """
/usr/local/bin/dp/database_jobs/run_py.sh "backfill_photo_dim_reactivated.py"
"""

t1 = SSHOperator(
    ssh_conn_id='ssh_aws_ec2',
    task_id='backfill_photo_dim',
    command=t1_bash,
    dag=dag)

The Airflow UI shows the DAG in the running state, but the task inside the DAG never runs. Am I missing something in my code?

Also, is there a way to force-run a DAG regardless of its cron schedule?

Jason C
hky404

3 Answers


Most likely you do not have the scheduler running.

Run `airflow scheduler -D` to start it in the background as a daemon. That should resolve the issue.
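
To confirm whether a scheduler is actually running before starting another one, the process list can be inspected. Below is a minimal sketch in Python, assuming a POSIX system where `ps -ef` is available and assuming the scheduler's command line contains the literal text `airflow scheduler`. The function names are hypothetical helpers, not part of Airflow.

```python
import subprocess


def scheduler_processes(ps_output):
    """Return the lines of `ps` output that look like an Airflow scheduler.

    Assumption: the scheduler's command line contains "airflow scheduler".
    Lines belonging to a `grep` invocation are skipped.
    """
    return [
        line
        for line in ps_output.splitlines()
        if "airflow scheduler" in line and "grep" not in line
    ]


def running_schedulers():
    """List scheduler processes on this machine using `ps -ef` (POSIX only)."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return scheduler_processes(result.stdout)
```

Calling `running_schedulers()` on the Airflow host returns one line per matching process. An empty list suggests no scheduler is running, while several unrelated entries may point to stale schedulers left over from earlier manual starts.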

kotartemiy

A task stuck in "scheduled" generally means you have no pool or no queue available. Are you using the local executor? If so, is the scheduler running?

You can force run (or test) a task using the command line.
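
As a rough illustration of the command-line route, the sketch below builds the relevant CLI invocations as argument lists. It assumes the Airflow 1.10.x command names (`airflow trigger_dag`, `airflow test`), which match the era of this question, with the Airflow 2.x equivalents (`airflow dags trigger`, `airflow tasks test`) as an alternative. The helper functions and the execution date are hypothetical and only for illustration.

```python
def trigger_dag_command(dag_id, legacy_cli=True):
    """Build the CLI call that starts a DAG run outside its cron schedule."""
    if legacy_cli:
        # Airflow 1.10.x syntax
        return ["airflow", "trigger_dag", dag_id]
    # Airflow 2.x syntax
    return ["airflow", "dags", "trigger", dag_id]


def task_test_command(dag_id, task_id, execution_date, legacy_cli=True):
    """Build the CLI call that runs a single task for a given execution date."""
    if legacy_cli:
        # Airflow 1.10.x syntax
        return ["airflow", "test", dag_id, task_id, execution_date]
    # Airflow 2.x syntax
    return ["airflow", "tasks", "test", dag_id, task_id, execution_date]


dag_id = "back_fill_reactivated_photo_dimension"
task_id = "backfill_photo_dim"

# The execution date below is an arbitrary example value.
print(" ".join(trigger_dag_command(dag_id)))
print(" ".join(task_test_command(dag_id, task_id, "2019-07-05")))
```

The printed commands can be pasted into a shell on the Airflow host, or the lists can be passed to `subprocess.run`. Triggering a DAG creates a regular DAG run that still needs a running scheduler to execute, whereas the test command runs the one task directly in the foreground.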

Breathe

There is nothing wrong with your DAG; check your configuration. Can you share your cfg file?

  • here you go: https://codeshare.io/anMVBY just an FYI: I am using my own AWS RDS database to store all the airflow metadata. – hky404 Jul 05 '19 at 16:14
  • What message broker are you using (e.g. RabbitMQ)? – Ravi Ranjan Jul 05 '19 at 16:34
  • Is this issue specific to this DAG, or does it affect all DAGs? – Ravi Ranjan Jul 05 '19 at 16:36
  • Can you change your executor to the local executor in the cfg file and try again? If it works, then there is a configuration issue for sure. – Ravi Ranjan Jul 05 '19 at 16:37
  • will do that, and will get back to you. – hky404 Jul 05 '19 at 17:04
  • It worked when I switched the executor to `Sequential`. Now the problem is that the DAG only runs once, whenever I trigger `airflow scheduler` from the command line. Why is that? Why is it not following the cron schedule I specified in the code? – hky404 Jul 08 '19 at 14:27
  • here you go: https://codeshare.io/2pERYD (this is the output after I manually triggered `airflow scheduler`). So I have to manually trigger Airflow every time I want to run a DAG; it's not picking up the cron schedule. But it does pick it up when I run `airflow scheduler`. Do I have to use something like Supervisor so that the Airflow scheduler runs forever? Appreciate your help, Ravi. – hky404 Jul 08 '19 at 17:25
  • and this is the output of my `airflow scheduler` which seems to be working fine: https://codeshare.io/5QL16m – hky404 Jul 08 '19 at 17:29
  • do you think it might be related to this issue? I think I will have to add a fixed date to the `start_date` field instead of `datetime.now() - timedelta(days=1)` - https://stackoverflow.com/questions/40714087/apache-airflow-scheduler-does-not-trigger-dag-at-schedule-time?noredirect=1&lq=1 – hky404 Jul 08 '19 at 17:34
  • Just try giving it a date in the past and check whether it runs or not. – Ravi Ranjan Jul 08 '19 at 17:54
  • I just did, and I also followed this for running the Airflow webserver and scheduler as daemons (background processes): `airflow scheduler -D` and `airflow webserver -D`. Source: https://stackoverflow.com/questions/46476246/issues-running-airflow-scheduler-as-a-daemon-process Maybe it will help. – hky404 Jul 08 '19 at 17:56
  • Can you run `ps -ef | grep airflow` in your terminal? I want to check whether the scheduler is running or not. – Ravi Ranjan Jul 08 '19 at 17:56
  • There are so many schedulers running; that's not right. Kill them all and start a fresh scheduler. – Ravi Ranjan Jul 08 '19 at 18:00
  • Is the above issue resolved? Let me know if you need any help. – Ravi Ranjan Jul 11 '19 at 10:44
  • Really appreciate you checking up on this, Ravi. Everything is running great now. What I did was run the Airflow scheduler and Airflow webserver as daemons, using `airflow scheduler -D` and `airflow webserver -D`, so that Airflow runs in the background forever, and I am using `LocalExecutor` for running my DAGs. – hky404 Jul 11 '19 at 17:49
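
The comments above also raise the idea of replacing `datetime.now() - timedelta(days=1)` with a fixed `start_date`. The sketch below uses only the standard library, not Airflow itself, to show why the dynamic form is unstable: its value shifts every time the DAG file is parsed, while a fixed date stays put. The parse times and the fixed date are hypothetical example values, and the thread does not confirm that this was the cause here, since the reported fix was running the scheduler as a daemon.

```python
from datetime import datetime, timedelta


def dynamic_start_date(now):
    """Mimic `datetime.now() - timedelta(days=1)` evaluated at time `now`."""
    return now - timedelta(days=1)


# Hypothetical fixed value; any date safely in the past would serve.
FIXED_START_DATE = datetime(2019, 7, 1)

# Two hypothetical moments at which the DAG file gets re-parsed.
first_parse = datetime(2019, 7, 8, 14, 0)
second_parse = datetime(2019, 7, 8, 14, 5)

# The dynamic start date differs between parses; the fixed one does not.
print("dynamic, first parse: ", dynamic_start_date(first_parse))
print("dynamic, second parse:", dynamic_start_date(second_parse))
print("fixed, every parse:   ", FIXED_START_DATE)
```

In a DAG definition, the fixed form would correspond to something like `'start_date': datetime(2019, 7, 1)` in `default_args`, which gives the scheduler a stable reference point for computing run intervals.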