
I successfully deployed Airflow 2.2.4 to GCP Kubernetes Engine using the official Helm chart. However, I have an issue with logging. I defined a connection ID through the Airflow UI, and the connection works: tasks execute successfully on each run. However, the logs are not written to GCS. I get the following message:

*** Unable to read remote log from gs://airflow_logs_marin/airflow/logs/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log
*** 403 GET https://storage.googleapis.com/download/storage/v1/b/airflow_logs_marin/o/airflow%2Flogs%2Fgcp_movie_ranking%2Fimport_in_bigquery%2F2019-03-01T00%3A00%3A00%2B00%3A00%2F14.log?alt=media: Caller does not have storage.objects.get access to the Google Cloud Storage object.: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)

*** Log file does not exist: /opt/airflow/logs/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log
*** Fetching from: http://airflow-worker-6974d49bb7-r99kc:8793/log/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log
*** Failed to fetch log file from worker. [Errno -2] Name or service not known

I am using the Celery executor. How can I solve this? I keep banging my head against it but get the same error message every time. I can confirm via the UI that remote logging is enabled and that the remote base log folder is set:

[logging]
colored_console_log = False
remote_base_log_folder = gs://airflow_logs_marin/airflow/logs/
remote_log_conn_id = google_cloud
remote_logging = True
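
For reference, the google_cloud connection referenced above was created through the Airflow UI; an equivalent CLI definition would look roughly like the following (the key file path is just a placeholder for wherever the service account key is mounted):

# key_path below is illustrative; point it at your mounted service account key
airflow connections add google_cloud \
    --conn-type google_cloud_platform \
    --conn-extra '{"extra__google_cloud_platform__key_path": "/opt/airflow/secrets/key.json"}'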

The Helm chart values are overridden with the following:

config:
  logging:
    remote_logging: 'True'
    remote_base_log_folder: 'gs://airflow_logs_marin/airflow/logs/'
    remote_log_conn_id: 'google_cloud'
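
As a sanity check, the effective settings can also be read back from inside a running pod (<worker-pod> is a placeholder; substitute a name from kubectl get pods):

# Read the rendered logging config from inside a worker pod
kubectl exec -it <worker-pod> -- airflow config get-value logging remote_logging
kubectl exec -it <worker-pod> -- airflow config get-value logging remote_base_log_folder
kubectl exec -it <worker-pod> -- airflow config get-value logging remote_log_conn_id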

Any suggestions would be much appreciated.

Marin
  • What [version](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions) is your Cloud Composer/Airflow image? Or are you using a custom image to deploy it in Kubernetes? Did you try to follow any specific guides for this deployment? – Andrés Apr 06 '22 at 20:22

1 Answer


I suggest using a Cloud Composer 2 environment instead of Helm.

When attempting to access Google Cloud Storage buckets from Cloud Composer 1 GKE clusters, you may see an error such as:

AccessDeniedException: 403 Caller does not have storage.objects.list access to the Google Cloud Storage bucket

These 403 errors mean that the caller lacks the required storage.objects.* permissions on the bucket. They occur in Cloud Composer 1 environments, and likewise in self-managed deployments (a custom image or Helm chart) where the permission-granting step was missed.

NOTE: Cloud Composer 2 environments are not affected by this issue.

If you stay on the Helm deployment instead of upgrading, the missing step is typically an IAM grant on the log bucket.
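
A minimal sketch, assuming the workers authenticate as a service account named airflow-worker@my-project.iam.gserviceaccount.com (substitute the account your worker pods or the google_cloud connection actually use):

# Hypothetical service account; roles/storage.objectAdmin covers both
# storage.objects.get and storage.objects.list on the bucket.
gsutil iam ch \
  serviceAccount:airflow-worker@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://airflow_logs_marin

After granting the role, restart the workers and re-open the task log in the UI.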

Andrés