
I'm trying to set up Cloud Trace on a GKE cluster with workload identity enabled. My pod uses a service account, which has the Cloud Trace Agent role. (I also tried giving it the Owner role, to rule out permission issues, but that didn't change the error.)

I followed the Node.js quickstart, which says to add the following snippet to my code:

require('@google-cloud/trace-agent').start();

When I try to add a trace, I get the following error:

@google-cloud/trace-agent DEBUG TraceWriter#publish: Received error while publishing traces to cloudtrace.googleapis.com: Error: Could not refresh access token: A Forbidden error was returned while attempting to retrieve an access token for the Compute Engine built-in service account. This may be because the Compute Engine instance does not have the correct permission scopes specified: Could not refresh access token: Unsuccessful response status code. Request failed with status code 403

(How) can I configure the library to work in this scenario?

Vincent van der Weele
  • Are you having the same behavior when running your app on a GKE cluster with Workload Identity disabled? Also, Are you running your app on within the same kubernetes namespace and with the kubernetes account you used to configure Workload Identity? – Ariel Palacios Mar 03 '21 at 23:33
  • @ArielPalacios I created a dummy cluster without workload identity and there tracing works fine. My app is running in a separate namespace but - correct me if I'm wrong - workload identity is a cluster feature, not connected to a namespace? – Vincent van der Weele Mar 04 '21 at 08:08

2 Answers


To answer the question from your comment above ("correct me if I'm wrong - workload identity is a cluster feature, not connected to a namespace?"), and seeing that you have already fixed your problem by configuring the binding between the KSA/K8s namespace and the GCP SA, I will add an answer with more context that I believe helps clarify this.

Yes, you are right: Workload Identity is a GKE cluster feature that lets you bind a Kubernetes identity (a Kubernetes Service Account, KSA) to a GCP identity (a Google Service Account, GSA), so that your workloads are authenticated as a specific GCP identity with enough permissions to reach certain APIs (depending on the permissions your GCP service account has). Kubernetes namespaces and KSAs play a critical role here, because KSAs are namespaced resources.

Therefore, to authenticate your workloads (containers) correctly with a GCP service account, you need to create them in the configured Kubernetes namespace and with the configured KSA, as mentioned in this doc.
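For reference, the two-way mapping can be sketched like this. All names below (`my-project`, `my-namespace`, `my-ksa`, `my-gsa`) are placeholders, not values from the question:

```shell
#!/usr/bin/env bash
# Sketch of the KSA <-> GSA mapping that Workload Identity requires.
# Placeholder names; substitute your own project/namespace/accounts.
PROJECT="my-project"
K8S_NAMESPACE="my-namespace"
KSA_NAME="my-ksa"
GSA_EMAIL="my-gsa@${PROJECT}.iam.gserviceaccount.com"

# The IAM member string that represents the KSA inside the
# cluster's workload identity pool (PROJECT.svc.id.goog):
MEMBER="serviceAccount:${PROJECT}.svc.id.goog[${K8S_NAMESPACE}/${KSA_NAME}]"
echo "$MEMBER"

if command -v gcloud >/dev/null 2>&1; then
  # 1) Allow the KSA to impersonate the GSA:
  gcloud iam service-accounts add-iam-policy-binding "$GSA_EMAIL" \
    --role roles/iam.workloadIdentityUser \
    --member "$MEMBER"
fi

if command -v kubectl >/dev/null 2>&1; then
  # 2) Annotate the KSA so GKE knows which GSA it maps to:
  kubectl annotate serviceaccount "$KSA_NAME" \
    --namespace "$K8S_NAMESPACE" \
    iam.gke.io/gcp-service-account="$GSA_EMAIL"
fi
```

Both halves are needed: the IAM binding alone (without the KSA annotation) or the annotation alone (without the binding, as happened in the question) leaves the workload without a usable identity.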

If you create your workloads in a different Kubernetes namespace (and therefore with a different KSA), they will not get an authenticated identity. Instead, your workloads will be authenticated as the Workload Identity Pool/Workload Identity Namespace, which is PROJECT_ID.svc.id.goog. That means that if you create a container with the GCP SDK installed and run gcloud auth list, you will see PROJECT_ID.svc.id.goog as the authenticated identity, which is an IAM object but not an identity with permissions in IAM. So your workloads will lack permissions.

So you need to create your containers in the configured namespace and with the configured service account in order for them to have a correct identity with IAM permissions.

I'm assuming that the above (authentication without an actual IAM identity, and thus without permissions) is what happened here: as you mentioned in your answer, you just added the needed binding between the GSA and the KSA, meaning that your container had been lacking an identity with actual IAM permissions.

Just to be clear on this: Workload Identity allows you to authenticate your workloads with a service account different from the one on your GKE nodes. "If your application runs inside a Google Cloud environment that has a default service account, your application can retrieve the service account credentials to call Google Cloud APIs. Such environments include Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, and Cloud Functions", here.

What I mean by the above is that even if you do not use Workload Identity, your containers are still authenticated, because they run on GKE nodes, which by default use a service account, and that service account is inherited from the nodes by your containers. The default service account (the Compute Engine service account) and its scopes are enough to write to Cloud Trace from containers. That is why you were able to see traces on a GKE cluster with Workload Identity disabled: the default service account was used by both your nodes and your containers.
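If you want to check which service account and scopes your nodes actually run with, something like the following should work (a sketch; the cluster and zone names are placeholders):

```shell
#!/usr/bin/env bash
# Sketch: inspect the service account and OAuth scopes configured on
# the GKE nodes. Without Workload Identity, pods inherit these.
# Placeholder names; substitute your own cluster and zone.
CLUSTER="my-cluster"
ZONE="europe-west4-a"

if command -v gcloud >/dev/null 2>&1; then
  gcloud container clusters describe "$CLUSTER" --zone "$ZONE" \
    --format="value(nodeConfig.serviceAccount, nodeConfig.oauthScopes)"
fi
```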

If you test this on both environments:

- GKE cluster with Workload Identity enabled: with the correct configuration, you will see a service account different from the default one authenticating your workloads/containers.
- GKE cluster with Workload Identity disabled: you will see on your containers the same service account used by your nodes (by default the Compute Engine service account, with the Editor role and the scopes applied to your nodes).

These tests can be performed by spinning up the same container you used in your answer:

kubectl run -it \
  --image google/cloud-sdk:slim \
  --serviceaccount KSA_NAME \
  --namespace K8S_NAMESPACE \
  workload-identity-test

(pass the --serviceaccount and --namespace flags only if needed)

And then running `gcloud auth list` to see the identity your containers are authenticated with.

Hope this can help somehow!

Ariel Palacios
  • Thanks for the additional context! What I _think_ went wrong is that I made a typo in config connector `IAMPolicy` resource. I created a new namespace for my tracing test and created the k8s serviceaccount and IAM policy binding using config connector (steps 5 and 6 [here](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to)). I did verify that the k8s serviceaccount was correctly created and annotated but I never checked if the IAM serviceaccount had the correct binding. Now I know one should always check :) – Vincent van der Weele Mar 05 '21 at 06:58
  • Correction: I did not make a typo in my `IAMPolicy`, instead I hadn't annotated the new k8s namespace with the Google Cloud project where to create the resources. That's why the policy binding couldn't get created. – Vincent van der Weele Mar 05 '21 at 08:28
  • Good catch and make sense with the issue you have had! – Ariel Palacios Mar 08 '21 at 14:28

It turned out I had misconfigured the IAM service account.

I managed to get a more meaningful error message by running a new pod in my namespace with the gcloud cli installed:

kubectl run -it \
  --image gcr.io/google.com/cloudsdktool/cloud-sdk \
  --serviceaccount $GKE_SERVICE_ACCOUNT test \
  -- bash

after that, just running any gcloud command gave an error message containing (emphasis mine):

Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission. _This error could be caused by a missing IAM policy binding on the target IAM service account._

Running

gcloud iam service-accounts get-iam-policy $SERVICE_ACCOUNT

indeed showed that the binding to the Kubernetes service account was missing.
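A quick scripted version of that check might look like this (a sketch; the service account email is a placeholder standing in for `$SERVICE_ACCOUNT`):

```shell
#!/usr/bin/env bash
# Sketch: check whether the GSA's IAM policy contains the
# workloadIdentityUser binding. Placeholder account email.
SERVICE_ACCOUNT="my-gsa@my-project.iam.gserviceaccount.com"

if command -v gcloud >/dev/null 2>&1; then
  # A healthy policy lists roles/iam.workloadIdentityUser with a member
  # of the form serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA_NAME].
  if gcloud iam service-accounts get-iam-policy "$SERVICE_ACCOUNT" \
      --format=json | grep -q "iam.workloadIdentityUser"; then
    echo "binding present"
  else
    echo "binding missing"
  fi
fi
```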

Adding it manually fixed the issue:

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:$PROJECT.svc.id.goog[$NAMESPACE/$GKE_SERVICE_ACCOUNT]" \
  $SERVICE_ACCOUNT

After more research, the underlying problem was that I created my service accounts using Config Connector but hadn't properly annotated the Kubernetes namespace with the Google Cloud project to deploy the resources in:

kubectl annotate namespace "$NAMESPACE" cnrm.cloud.google.com/project-id="$PROJECT"

Therefore, Config Connector could not add the IAM policy binding.
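To verify that a namespace actually carries the Config Connector project annotation, something like this should work (a sketch; the namespace name is a placeholder):

```shell
#!/usr/bin/env bash
# Sketch: print the project ID the namespace is annotated with for
# Config Connector; prints nothing if the annotation is missing.
# Placeholder namespace name.
NAMESPACE="my-namespace"
ANNOTATION="cnrm.cloud.google.com/project-id"

if command -v kubectl >/dev/null 2>&1; then
  # Dots inside the annotation key must be escaped in jsonpath.
  kubectl get namespace "$NAMESPACE" \
    -o jsonpath="{.metadata.annotations.cnrm\.cloud\.google\.com/project-id}"
fi
```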

Vincent van der Weele