5

I'm running a DAG in Google Cloud Composer (hosted Airflow) that runs fine in a local Airflow installation. All it does is print "Hello World". However, when I run it through Cloud Composer I receive the error:

*** Log file does not exist: /home/airflow/gcs/logs/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Fetching from: http://airflow-worker-d775d7cdd-tmzj9:8793/log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-d775d7cdd-tmzj9', port=8793): Max retries exceeded with url: /log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8825920160>: Failed to establish a new connection: [Errno -2] Name or service not known',))

I've also tried making the DAG insert data into a database, and that actually succeeds about 50% of the time. However, it always returns this error message (and no other print statements or logs). Any help on why this might be happening would be much appreciated.
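For reference, the DAG is essentially the following (a minimal sketch with Airflow 1.10-style imports; the DAG id and task id match the log path above, while the start date and schedule are illustrative assumptions):

## hello_dag.py
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def print_hello():
    # This print is the only output; it should land in the task log above.
    print("Hello World")


with DAG(
    dag_id="matts_custom_dag",        # from the log path above
    start_date=datetime(2020, 4, 1),  # illustrative
    schedule_interval=None,           # triggered manually
) as dag:
    PythonOperator(
        task_id="main_test",          # from the log path above
        python_callable=print_hello,
    )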

Matt • 1,368 • 1 • 26 • 54
  • Hi! I would like to ask you for more information. Are you using a [self-managed](https://cloud.google.com/composer/docs/how-to/managing/deploy-webserver) Airflow web server? What versions of Composer and Airflow are you on? It can happen that logs take around 10 minutes to appear while the speed at which tasks run stays normal. I recommend looking at the bucket for this environment and possibly deleting some old logs and unused files. Moreover, you can always check your logs in Stackdriver Logging. Let me know about the results. – aga Apr 21 '20 at 07:53

5 Answers

4

We also faced the same issue, then raised a support ticket with GCP and got the following reply:

  1. The message is related to the latency of syncing logs from Airflow workers to the web server; it takes at least a few minutes, depending on the number of objects and their size. The total log size does not seem large, but it is enough to noticeably slow down synchronization, hence we recommend cleaning up/archiving the logs.

  2. Basically, we recommend relying on Stackdriver logs instead, because of the latency inherent in the design of this sync.

I hope this will help you solve the problem.
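If you want to automate the cleanup recommended in point 1, the sketch below is one way to do it. It is only a sketch: it assumes the google-cloud-storage client library, and the bucket name and 30-day retention window are placeholders to adjust for your environment.

## cleanup_old_logs.py
from datetime import datetime, timedelta, timezone

from google.cloud import storage

BUCKET_NAME = "your-composer-env-bucket"  # placeholder: your environment's bucket
MAX_AGE = timedelta(days=30)              # placeholder retention window

client = storage.Client()
cutoff = datetime.now(timezone.utc) - MAX_AGE

# Walk the task-log objects and delete anything older than the cutoff.
for blob in client.list_blobs(BUCKET_NAME, prefix="logs/"):
    if blob.time_created < cutoff:
        blob.delete()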

SANN3 • 9,459 • 6 • 61 • 97
  • Amazing, couldn't find any info on the issue so I'm glad I'm not the only one having it. So, is the best move here to simply clean up / archive the logs regularly since GCP doesn't allow you to override the logging location? – Matt Apr 22 '20 at 13:15
  • I just deleted all the log files and still receive the same error message. Does this make sense to you? – Matt Apr 22 '20 at 13:53
  • We did not face this issue in older versions of Composer, only in the latest version; the only option is using Stackdriver. – SANN3 Apr 23 '20 at 15:51
  • @SANN3 I am facing this issue with Composer version 1.17.2 and Airflow version 2.1.2. It does not show up every time, only in a few instances. Did setting the task retries to greater than 1 help? – codninja0908 Jan 11 '22 at 10:08
4

I had the same problem after upgrading Google Composer from 1.10.3 to 1.10.6. I can see in my logs that Airflow is trying to fetch the logs from a bucket whose name ends with -tenant, while the bucket in my account ends with -bucket.

In the configuration, I can see something weird too:

## airflow.cfg
[core]
remote_base_log_folder = gs://us-east1-dada-airflow-xxxxx-bucket/logs

## but the running configuration says
core    remote_base_log_folder  gs://us-east1-dada-airflow-xxxxx-tenant/logs   env var
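To confirm which value the workers actually use, a quick check is sketched below; it relies on Airflow 1.10's configuration API and on the standard convention that the AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER environment variable, when set, overrides airflow.cfg:

## check_log_folder.py
import os

from airflow.configuration import conf

# The env var, if set, takes precedence over the value in airflow.cfg.
print(os.environ.get("AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER"))
print(conf.get("core", "remote_base_log_folder"))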

I wrote to google support and they said the team is working on a fix.

EDIT: I've been accessing my logs with gsutil, replacing the bucket name suffix with -bucket:

gsutil cat gs://us-east1-dada-airflow-xxxxx-bucket/logs/...../5.logs
blackjid • 1,571 • 16 • 23
1

I faced the same situation on multiple occasions. As soon as a job finished, looking at the log in the Airflow web UI would give me the same error. But when I checked the same logs in the UI after a minute or two, I could see them properly. As per the answers above, it's a sync issue between the web server and the worker node.

Nandakishore • 981 • 1 • 9 • 22
0

In general, the issue described here should be more of a sporadic issue.

In certain situations, what could help is setting default-task-retries to a value that allows for retrying a task at least once.
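At the DAG level, the equivalent knob is the retries key in default_args. A minimal sketch follows (Airflow 1.10-style imports); the DAG id, retry count, and retry delay are illustrative assumptions:

## retry_example.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

default_args = {
    "retries": 2,                         # retry each failed task up to twice
    "retry_delay": timedelta(minutes=1),  # wait a minute between attempts
}

with DAG(
    dag_id="retry_example",
    start_date=datetime(2020, 4, 1),
    schedule_interval=None,
    default_args=default_args,
) as dag:
    PythonOperator(
        task_id="hello",
        python_callable=lambda: print("Hello World"),
    )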

-1

This issue is resolved at least since Airflow version 1.10.10+composer.

benjo53 • 9 • 2