
I'm new to Airflow; I'd really appreciate any help with the following problem. I'm trying to run the Airflow webserver on my laptop.

Theoretically, since I set start_date=datetime.now(), the DAG should run successfully when I trigger it manually from the webserver. But the behavior changed over time: runs ended up either queued or "successful". Sometimes a run was marked successful (but its runtime was 00:00:00, so my DAG obviously had not actually run), and sometimes it just stayed queued.

Here's the code in my DAG:

from datetime import datetime
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

def get_var():
    # a = Variable.get('abc')
    print('abd')

with DAG(dag_id='test_var', start_date=datetime.now()) as dag:
    task1 = PythonOperator(task_id='var', python_callable=get_var)

However, every time I check the Graph view in the Airflow web UI, it shows up as in the picture below:

[Screenshot: "Dag Has Yet To Run"]

I'm not sure whether this is related to the way I initialized Airflow; I followed these steps:

  1. airflow webserver -p 8080

  2. airflow db init

  3. airflow scheduler

The first two steps worked, but the third one produced the following output:

[2022-10-31 09:46:45,562] {scheduler_job.py:701} INFO - Starting the scheduler
[2022-10-31 09:46:45,562] {scheduler_job.py:706} INFO - Processing each file at most -1 times
[2022-10-31 09:46:45,565] {executor_loader.py:107} INFO - Loaded executor: SequentialExecutor
[2022-10-31 09:46:45,569] {manager.py:163} INFO - Launched DagFileProcessorManager with pid: 13315
[2022-10-31 09:46:45,570] {scheduler_job.py:1381} INFO - Resetting orphaned tasks for active dag runs
[2022-10-31 09:46:46,169] {settings.py:58} INFO - Configured default timezone Timezone('UTC')
[2022-10-31T09:46:46.172+0800] {manager.py:409} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2) when using sqlite. So we set parallelism to 1.
[2022-10-31 09:46:46 +0800] [13314] [INFO] Starting gunicorn 20.1.0
[2022-10-31 09:46:46 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:46 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:47 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:47 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:48 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:48 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:49 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:49 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:50 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:50 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:51 +0800] [13314] [ERROR] Can't connect to ('::', 8793)

Does this have anything to do with why my DAG won't run from the web UI? Thanks for your time and help!

I searched another Stack Overflow post about `[ERROR] Can't connect to ('::', 8793)`, but it only discussed the webserver side of things, and I'm not sure whether the reason my DAG doesn't work is the Airflow scheduler.

lynn kuo
  • Change the start_date to something that is not dynamic like datetime.now(); for example, datetime(2022, 1, 1). – ozs Oct 31 '22 at 12:30

2 Answers


The Airflow scheduler service needs to be running for your DAGs to execute. In your case, it seems another process is already using port 8793 (the default port for the log-serving process the scheduler starts alongside itself), which is probably why the scheduler is unable to start cleanly.
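To confirm the conflict, here is a minimal sketch using Python's standard socket module that tries to bind the same IPv6 wildcard address shown in your log (on macOS/Linux, lsof -i :8793 gives the same answer and also names the offending process):

import socket

# Try to bind the exact address/port from the error message.
# If bind() raises OSError, something else (often a leftover
# scheduler or webserver process) is already listening on 8793.
sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
try:
    sock.bind(("::", 8793))
    print("Port 8793 is free")
except OSError as exc:
    print(f"Port 8793 is in use: {exc}")
finally:
    sock.close()

If the port is taken, stop the stale process (or change worker_log_server_port in airflow.cfg) before restarting the scheduler.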

Consider the following:

  1. Use executor = LocalExecutor in airflow.cfg (note that LocalExecutor requires a database backend other than SQLite, such as Postgres or MySQL; your log shows you are on SQLite, which only supports SequentialExecutor).
  2. Start the Airflow scheduler in the background first, with the command: airflow scheduler &
  3. Use something like this for defining your start_date (see the runnable sketch after this list):
    start_date = datetime.now(local_tz) - timedelta(1)
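
Note that local_tz is not defined in that snippet; a minimal runnable version, assuming pendulum supplies the timezone (Asia/Shanghai is only a guess based on the +0800 timestamps in your log; substitute your own):

from datetime import datetime, timedelta
import pendulum

# Assumption: local_tz was meant to be a pendulum timezone object.
local_tz = pendulum.timezone("Asia/Shanghai")

# One day in the past, so the DAG is immediately eligible to run.
start_date = datetime.now(local_tz) - timedelta(days=1)

That said, the comment on the question recommends a static start_date, and Airflow's best-practices documentation likewise discourages dynamic values like datetime.now().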

I would also recommend using Docker & Docker Compose to isolate your environment and avoid the Python dependency issues that come with running Airflow directly on your laptop.

I found this article helpful. You can also refer to Airflow's official documentation on this.

Vinay Kulkarni

Based on @Vinay's answer, I found that if the DAG's start_date is in the future, then even if the scheduler is running and you trigger the DAG manually, it will not execute and you will not see any logs; you just get the "Dag has yet to run" message.

Changing the start_date to a date in the past and re-running gets the DAG to actually execute (and produce logs).
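
Putting this together with the comment on the question, here is a sketch of the original DAG with a static start_date in the past (schedule_interval=None and catchup=False are my additions, so the DAG only runs when triggered manually and does not backfill):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def get_var():
    print('abd')

# A static start_date in the past lets the scheduler create runs,
# so a manual trigger from the web UI actually executes the task.
with DAG(
    dag_id='test_var',
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # run only on manual trigger
    catchup=False,           # do not backfill past runs
) as dag:
    task1 = PythonOperator(task_id='var', python_callable=get_var)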

Sharmila