
I am trying to invoke Airflow 2.0's Stable REST API from Cloud Composer Version 1 via a Python script, and encountered an HTTP 401 error while following the Triggering DAGs with Cloud Functions and Access the Airflow REST API guides.

The service account has the following list of permissions:

  • roles/iam.serviceAccountUser (Service Account User)
  • roles/composer.user (Composer User)
  • roles/iap.httpsResourceAccessor (IAP-Secured Web App User, added when the application returned a 403, which was unusual as the guides did not specify the need for such a permission)

I am not sure what is wrong with my configuration; I have tried giving the service account the Editor role, roles/iap.tunnelResourceAccessor (IAP-Secured Tunnel User), and roles/composer.admin (Composer Administrator), but to no avail.

EDIT: I found the source of my problems: the Airflow database did not have the credentials of the service account in its users table. However, this is unusual, as I currently have a service account (the first I created) whose details were added to the table automatically. Subsequent service accounts were not added to the users table when they first tried to access the REST API, hence the 401. I am not sure of a way to create users without passwords, since the Airflow web server is protected by IAP.
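
For reference, this is roughly the call pattern I am using (a minimal sketch based on the guides above; the client ID, web server URL, and DAG id are placeholders):

import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

# Placeholders: the IAP OAuth client ID of the Airflow web server and its URL.
client_id = '000000000000-xxxx.apps.googleusercontent.com'
webserver_url = 'https://x0x0x0x0x0x0x0x0-tp.appspot.com'

# Fetch an OIDC token for the service account and call the stable REST API.
token = id_token.fetch_id_token(Request(), client_id)
resp = requests.post(
    webserver_url + '/api/v1/dags/my_dag/dagRuns',
    headers={'Authorization': 'Bearer {}'.format(token)},
    json={'conf': {}})
print(resp.status_code)  # currently 401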

Seng Cheong
  • Where is this Python script? Is it on Cloud Functions or on your local machine? – Prany Oct 26 '21 at 12:25
  • This script is located on a VM. After some additional testing, I have managed to get a Service Account to access the REST API from the VM. But additional service accounts that I have freshly created, after the first service account, are unable to access the REST API. – Seng Cheong Oct 26 '21 at 12:31
  • Do those new service accounts have the required permissions? And is this an on-premise VM or GCE? – Prany Oct 26 '21 at 12:34
  • This is a GCE VM, and the new accounts have the required permissions (Composer User, IAP-Secured Web App User). – Seng Cheong Oct 26 '21 at 12:36
  • I was very surprised when I ran `gcloud composer environments run --location= users -- list` to list the users in Airflow: the new service accounts are not present in the users table, only the very first service account. – Seng Cheong Oct 26 '21 at 12:39
  • When creating the Composer environment (where you choose the service account), it says that the service account is used to run the pods of the application and cannot be changed, so I believe you have to make your calls with that same service account. – ewertonvsilva Oct 26 '21 at 15:12
  • Hi @ewertonvsilva, I believe you are referring to the service account of the Composer environment. However, I am trying to access the REST API of the underlying Airflow application (which is different altogether), and for some reason only the initial service account that interacted with the REST API works and has its credentials stored in the Airflow DB. The credentials for the subsequent service accounts were not stored in the Airflow DB; for some reason only the first one I created was stored. – Seng Cheong Oct 26 '21 at 16:22
  • In addition, the first service account in the Airflow DB (the one successfully accessing the Airflow API) is not the same as the service account assigned to the Composer environment, so I believe that using different service accounts for the environment and for the client accessing the REST API is not an issue. – Seng Cheong Oct 26 '21 at 16:24
  • I've tested it and noticed some things. Only the Owner account is created in the Airflow DB, and only after the first login. To have the service account in the Airflow DB, I had to add it manually. You could try creating it manually and testing the API with those accounts: `gcloud composer environments run --location= users -- create --use-random-password --username service-account-user --role Op --email @<...>.iam.gserviceaccount.com -f Service -l Account` and then `gcloud composer environments run --location= users -- list` – ewertonvsilva Oct 28 '21 at 14:06
  • Hi @ewertonvsilva, thanks for the update. I will test the manual creation of account credentials in Airflow DB and get back to you in a couple of days. Still, I was able to get other members of my team to authenticate automatically, but service accounts are still proving to be a problem. – Seng Cheong Oct 29 '21 at 03:22

5 Answers


Thanks to the answers posted by @Adrien Bennadji and @ewertonvsilva, I was able to diagnose the HTTP 401 issue.

The email field in some of Airflow's user-related database tables has a limit of 64 characters (type: character varying(64)), as noted in Understanding the Airflow Metadata Database.

Coincidentally, my first service account had an email whose character length was just under 64 characters.

When I tried running the command suggested by @ewertonvsilva to add my other service accounts: gcloud composer environments run <instance-name> --location=<location> users -- create --use-random-password --username "accounts.google.com:<service_accounts_uid>" --role Op --email <service-account-username>@<...>.iam.gserviceaccount.com -f Service -l Account, they failed with the following error: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(64).

As a result, I created new service accounts with shorter emails, and these were authenticated automatically. I was also able to add these new service accounts with shorter emails to Airflow manually via the gcloud command and authenticate them. I also discovered that the failure to add a user upon first access to the REST API was actually logged in Cloud Logging; however, at the time I was not aware of how Cloud Composer handled new users accessing the REST API, and the HTTP 401 error was a red herring.

Thus, the solution is to ensure that the total length of your service account's email is less than 64 characters.
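
For anyone scripting this, a quick guard might look like the following (a sketch; the helper name is mine, not part of any API):

def validate_sa_email(account_name, project_id):
    """Fail fast if a service account email would exceed Airflow's varchar(64) email column."""
    email = '{}@{}.iam.gserviceaccount.com'.format(account_name, project_id)
    if len(email) >= 64:
        raise ValueError('{} is {} characters; keep it under 64'.format(email, len(email)))
    return email

# Example: a long account name on a long project id trips the check.
validate_sa_email('composer-rest-api-client', 'my-organization-data-platform')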

Seng Cheong
    Hey Seng! It seems Google added a snippet on their docs regarding the 64 char limit and suggesting a work-around: https://cloud.google.com/composer/docs/access-airflow-api#access_airflow_rest_api_using_a_service_account This should DEFINITELY be included in the Trigger DAGs with Cloud Functions tutorial: https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf#python – Diana Vazquez Romo Oct 31 '22 at 17:24
  • Hi Diana, noted with thanks. For all other users, please see @DianaVazquezRomo's comment for workarounds. – Seng Cheong May 11 '23 at 15:28

ewertonvsilva's solution worked for me (manually adding the service account to Airflow using gcloud composer environments run <instance-name> --location=<location> users -- create ...).

At first it didn't work, but changing the username to accounts.google.com:<service_accounts_uid> made it work.

Sorry for not commenting, not enough reputation.


Based on @Adrien Bennadji's feedback, I'm posting the final answer.

  • Create the service accounts with the proper permissions for Cloud Composer;

  • Via the gcloud CLI, add the users to the Airflow database manually: gcloud composer environments run <instance-name> --location=<location> users -- create --use-random-password --username "accounts.google.com:<service_accounts_uid>" --role Op --email <service-account-username>@<...>.iam.gserviceaccount.com -f Service -l Account

  • Then list the users with: gcloud composer environments run <env_name> --location=<env_loc> users -- list

Use accounts.google.com:<service_accounts_uid> as the username (one way to look up the UID programmatically is sketched below).
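
If you need the service account's numeric UID for the accounts.google.com:<uid> username, here is a sketch using the IAM API (assumes google-api-python-client and application-default credentials; the email is a placeholder):

from googleapiclient import discovery

def service_account_uid(email):
    # serviceAccounts.get returns the account's numeric uniqueId.
    iam = discovery.build('iam', 'v1')
    account = iam.projects().serviceAccounts().get(
        name='projects/-/serviceAccounts/{}'.format(email)).execute()
    return account['uniqueId']

username = 'accounts.google.com:' + service_account_uid(
    'my-sa@my-project.iam.gserviceaccount.com')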

ewertonvsilva

Copying my answer from https://stackoverflow.com/a/70217282/9583820

It looks like, instead of creating Airflow accounts with gcloud composer environments run, you can just use GCP service accounts with an email length under 64 symbols. It will work automatically under the following conditions:

TL;DR version:

To make the Airflow Stable API work on GCP Composer:

  1. Set "api-auth_backend" to "airflow.composer.api.backend.composer_auth"
  2. Make sure your service account email length is <64 symbols
  3. Make sure your service account has required permissions (Composer User role should be sufficient)
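
For instance, one way to flip that setting from a script (a sketch; assumes the gcloud CLI is installed and authenticated, and the environment name and location are placeholders):

import subprocess

# Update the Airflow config override on the Composer environment.
subprocess.run([
    'gcloud', 'composer', 'environments', 'update', 'my-environment',
    '--location', 'us-central1',
    '--update-airflow-configs',
    'api-auth_backend=airflow.composer.api.backend.composer_auth',
], check=True)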

Long version:

We have been using Airflow for a while now, starting with version 1.x.x and the "experimental" (now deprecated) APIs.

To authorize, we use a "Bearer" token obtained with a service account:

import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

# client_id is the IAP OAuth client ID of the Airflow web server;
# method, url, and kwargs are supplied by the caller.
# Obtain an OpenID Connect (OIDC) token from the metadata server or using a service account.
google_open_id_connect_token = id_token.fetch_id_token(Request(), client_id)

# Fetch the Identity-Aware Proxy-protected URL, including an
# Authorization header containing "Bearer " followed by a
# Google-issued OpenID Connect token for the service account.
resp = requests.request(
    method, url,
    headers={'Authorization': 'Bearer {}'.format(
        google_open_id_connect_token)}, **kwargs)

Now we are migrating to Airflow 2.x.x and faced the exact same issue: 403 FORBIDDEN.

Our environment details are:

composer-1.17.3-airflow-2.1.2 (Google Cloud Platform)

"api-auth_backend" is set to "airflow.api.auth.backend.default".

Documentation claims that:

After you set the api-auth_backend configuration option to airflow.api.auth.backend.default, the Airflow web server accepts all API requests without authentication.

However, this does not seem to be true.

Experimentally, we found that if "api-auth_backend" is set to "airflow.composer.api.backend.composer_auth", the Stable REST API (Airflow 2.x.x) starts to work.

But there is another caveat: for us, some of our service accounts worked and some did not. The ones that did not work were throwing a "401 Unauthorized" error. We figured out that accounts with an email length over 64 symbols were throwing the error. The same was observed in this answer.

So after setting "api-auth_backend" to "airflow.composer.api.backend.composer_auth" and making sure that our service account email length was under 64 symbols, our old Airflow 1.x.x code started to work for authentication. We then made the remaining changes (API URLs and response handling), and the stable Airflow (2.x.x) API started to work for us the same way it had for Airflow 1.x.x.
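
For illustration, the migrated call might look like this (a sketch, not our exact code; the stable API endpoint is POST /api/v1/dags/{dag_id}/dagRuns, and all values are placeholders):

import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

def trigger_dag(webserver_url, client_id, dag_id, conf):
    # Same OIDC bearer-token flow as before, pointed at the stable API path.
    token = id_token.fetch_id_token(Request(), client_id)
    resp = requests.post(
        '{}/api/v1/dags/{}/dagRuns'.format(webserver_url, dag_id),
        headers={'Authorization': 'Bearer {}'.format(token)},
        json={'conf': conf})
    resp.raise_for_status()
    return resp.json()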

UPD: this is a defect in Airflow and will be fixed here: https://github.com/apache/airflow/pull/19932

Anton Kumpan

I was trying to invoke Airflow 2.0's Stable REST API from Cloud Composer Version 2 via a Python script and encountered an HTTP 401 error while following Triggering DAGs with Cloud Functions and accessing the Airflow REST API.

I used this image version: composer-2.1.2-airflow-2.3.4

I also followed the two guides mentioned above, but I was always stuck with Error 401 when I tried to run the DAG via the Cloud Function. However, when the DAG was executed from the Airflow UI, it was successful (Trigger DAG in the Airflow UI).

For me, the following solution worked:

In airflow.cfg, set the following options:

  • [api] auth_backends = airflow.composer.api.backend.composer_auth,airflow.api.auth.backend.session

  • [api] composer_auth_user_registration_role = Op (default)

  • [api] enable_experimental_api = False (default)

  • [webserver] rbac_user_registration_role = Op (default)


Service Account:

  • The service account email's total length is under 64 symbols.

  • The account has these roles:

    • Cloud Composer v2 API Service Agent Extension
    • Composer User

Airflow UI

  • Add the service account to the Airflow users via the Airflow UI (Security -> List Users), with username = accounts.google.com:<service account uid>, and assign it the Op role.

  • You can get the UID via the gcloud command (see above), or just navigate to the IAM & Admin page on Google Cloud -> Service Accounts -> click on the service account and read the Unique ID from the Details page.

  • And now, IMPORTANT: SET THE ACCOUNT ACTIVE! (In the Airflow UI, check the "is Active?" box.)

This last step of setting the account active was not described anywhere, and for a long time I just assumed it would be set active when there was an open session (when it makes the calls), but that is not the case. The account has to be set active manually. After that, everything worked fine :)

Other remarks: as I joined a new company, I also had to check some other things (maybe this is not related to your problem, but it's good to know anyway; maybe others can use this). I use Cloud Build to deploy the Cloud Functions and the DAGs in Airflow, so I also had to check the following:

  • The Cloud Source Repository (https://source.cloud.google.com/) is in sync with the GitHub repository. If not: disconnect the repository and reconnect it again.
  • The GCS bucket created when the Composer 2 environment is set up for the very first time has a subfolder "/dags/". I had to manually add the subfolder "/dags/dataflow/" so that the deployed Dataflow pipeline code could be uploaded to "/dags/dataflow/".