I have successfully deployed Airflow 2.2.4 to GCP Kubernetes Engine using the official Helm chart, but I have an issue with logging. I defined a connection ID through the Airflow UI, and the connection works for tasks: they execute successfully on each run. However, the logs are not written to GCS, and I get the following message:
*** Unable to read remote log from gs://airflow_logs_marin/airflow/logs/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log
*** 403 GET https://storage.googleapis.com/download/storage/v1/b/airflow_logs_marin/o/airflow%2Flogs%2Fgcp_movie_ranking%2Fimport_in_bigquery%2F2019-03-01T00%3A00%3A00%2B00%3A00%2F14.log?alt=media: Caller does not have storage.objects.get access to the Google Cloud Storage object.: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
*** Log file does not exist: /opt/airflow/logs/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log
*** Fetching from: http://airflow-worker-6974d49bb7-r99kc:8793/log/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log
*** Failed to fetch log file from worker. [Errno -2] Name or service not known
I am using the Celery executor. How can I solve this issue? I have been banging my head against this for a while but keep getting the same error message. I can confirm from the UI that remote logging is enabled and that the remote base log folder is set:
[logging]
colored_console_log = False
remote_base_log_folder = gs://airflow_logs_marin/airflow/logs/
remote_log_conn_id = google_cloud
remote_logging = True
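As a sanity check, would something like the snippet below be a reasonable way to verify that the google_cloud connection itself can read the bucket? This is just a sketch I would run from inside one of the Airflow pods; the object name is simply copied from the failing log path above.

# Sketch: check whether the credentials behind the "google_cloud" connection
# can actually read the remote log object (i.e. have storage.objects.get).
from airflow.providers.google.cloud.hooks.gcs import GCSHook

hook = GCSHook(gcp_conn_id="google_cloud")

visible = hook.exists(
    bucket_name="airflow_logs_marin",
    object_name="airflow/logs/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log",
)
print("object visible via connection:", visible)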
The Helm chart values are overridden with the following:
config:
  logging:
    remote_logging: 'True'
    remote_base_log_folder: 'gs://airflow_logs_marin/airflow/logs/'
    remote_log_conn_id: 'google_cloud'
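Since the 403 appears when the UI tries to fetch the remote log (so, as far as I understand, from the webserver), I assume the identity that pod falls back to also needs storage.objects.get on the bucket. Would a check like this, run inside the webserver pod with whatever default credentials it picks up, confirm that? The bucket and object names below are taken from the error above.

# Sketch: check whether the application default credentials in the webserver pod
# can read the log bucket at all.
from google.cloud import storage

client = storage.Client()  # uses application default credentials in the pod
bucket = client.bucket("airflow_logs_marin")
blob = bucket.blob(
    "airflow/logs/gcp_movie_ranking/import_in_bigquery/2019-03-01T00:00:00+00:00/14.log"
)
print("readable with default credentials:", blob.exists())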
Any suggestions would be much appreciated.