
I've created an IRSA role in Terraform so that the associated service account can be used by a K8s job to access an S3 bucket, but I keep getting an AccessDenied error within the job.

I first enabled IRSA in our EKS cluster with enable_irsa = true in our eks module.

I then created a simple aws_iam_policy as:

resource "aws_iam_policy" "eks_s3_access_policy" {
  name = "eks_s3_access_policy"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "s3:*",
        ]
        Effect   = "Allow"
        Resource = "arn:aws:s3:::*"
      },
    ]
  })
}
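As written, this policy grants every S3 action on every bucket, which is fine for debugging but broader than most workloads need. A more tightly scoped variant (the bucket name here is hypothetical, not from the original setup) could look like:

```hcl
# Hypothetical tighter policy: access to a single bucket only.
resource "aws_iam_policy" "eks_s3_access_policy_scoped" {
  name = "eks_s3_access_policy_scoped"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:ListBucket",
        ]
        Effect = "Allow"
        Resource = [
          "arn:aws:s3:::my-bucket",   # hypothetical bucket name
          "arn:aws:s3:::my-bucket/*",
        ]
      },
    ]
  })
}
```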

and an iam-assumable-role-with-oidc role:

module "iam_assumable_role_with_oidc_for_s3_access" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "~> 3.0"

  create_role = true
  role_name = "eks-s3-access"
  role_description = "Role to access s3 bucket"
  tags = { Role = "eks_s3_access_policy" }
  provider_url = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns = [aws_iam_policy.eks_s3_access_policy.arn]
  number_of_role_policy_arns = 1
  oidc_fully_qualified_subjects = ["system:serviceaccount:default:my-user"]
}
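If the role looks right in Terraform, it is worth confirming what was actually created in AWS. A quick way to inspect the trust policy the module generated (role name taken from the configuration above) is:

```shell
# Show the assume-role (trust) policy document for the IRSA role.
aws iam get-role \
  --role-name eks-s3-access \
  --query 'Role.AssumeRolePolicyDocument' \
  --output json
```

The output should contain a StringEquals condition on `<oidc-provider>:sub` whose value is `system:serviceaccount:default:my-user`; if the namespace or service account name in that condition is wrong, AssumeRoleWithWebIdentity will be denied.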

I created a K8s service account using Helm like:

Name:                my-user
Namespace:           default
Labels:              app.kubernetes.io/managed-by=Helm
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::111111:role/eks-s3-access
                     meta.helm.sh/release-name: XXXX
                     meta.helm.sh/release-namespace: default
Image pull secrets:  <none>
Mountable secrets:   my-user-token-kwwpq
Tokens:              my-user-token-kwwpq
Events:              <none>
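For reference, a minimal Helm template that would render a service account like the one described above (a sketch, not the author's actual chart; the account ID matches the redacted value shown) might be:

```yaml
# templates/serviceaccount.yaml -- hypothetical sketch
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-user
  namespace: default
  annotations:
    # The IRSA annotation the EKS pod identity webhook keys off of.
    eks.amazonaws.com/role-arn: arn:aws:iam::111111:role/eks-s3-access
```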

Finally, jobs are created using the K8s API from a job template:

apiVersion: batch/v1
kind: Job
metadata:
  name: job
  namespace: default
spec:
  template:
    spec:
      serviceAccountName: my-user
      containers:
      - name: {{ .Chart.Name }}
        env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::746181457053:role/eks-s3-access
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
        volumeMounts:
        - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          name: aws-iam-token
          readOnly: true
      volumes:
      - name: aws-iam-token
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token

When the job attempts to fetch credentials, however, the token file is not there:

2021-08-03 18:02:41  Refreshing temporary credentials failed during mandatory refresh period.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/credentials.py", line 291, in _protected_refresh
    metadata = await self._refresh_using()
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/credentials.py", line 345, in fetch_credentials
    return await self._get_cached_credentials()
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/credentials.py", line 355, in _get_cached_credentials
    response = await self._get_credentials()
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/credentials.py", line 410, in _get_credentials
    kwargs = self._assume_role_kwargs()
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/credentials.py", line 420, in _assume_role_kwargs
    identity_token = self._web_identity_token_loader()
  File "/usr/local/lib/python3.7/site-packages/botocore/utils.py", line 2365, in __call__
    with self._open(self._web_identity_token_path) as token_file:
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/secrets/eks.amazonaws.com/serviceaccount/token'

From what is described in https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/, a webhook normally injects these credentials when the pod is created. However, since we're creating new K8s jobs on demand from within the cluster, I suspect that the webhook is not injecting them.

How can I get the correct credentials created from within a K8s cluster? Is there a way to trigger the webhook from within the cluster?

1 Answer


There are a couple of things that could cause this to fail.

  • Check all settings for the IRSA role. In the trust relationship, verify that the namespace and the service account name are correct; the role can only be assumed if these match.
  • While the pod is running, open a shell in it. Check the contents of the "AWS_*" environment variables: AWS_ROLE_ARN should point to the correct role, and the file AWS_WEB_IDENTITY_TOKEN_FILE points to should be in place and readable. A simple cat on the file will tell you whether it is readable.
  • If you are running your pod as non-root (which is recommended for security reasons), make sure that user has access to the file. If not, adjust the securityContext for the pod; the fsGroup setting may help here. https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context
  • Make sure the SDK your pod is using supports IRSA. Older SDKs may not support it; see the IRSA documentation for minimum supported SDK versions. https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html
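The checks above can be run in one pass from a shell inside the pod (a sketch; `<pod-name>` is a placeholder for whatever `kubectl get pods` reports for your job):

```shell
# Find the job's pod and open a shell in it.
kubectl get pods -n default
kubectl exec -it <pod-name> -n default -- /bin/sh

# Inside the pod: inspect the IRSA-related environment variables...
env | grep '^AWS_'

# ...and confirm the token file exists and is readable.
ls -l /var/run/secrets/eks.amazonaws.com/serviceaccount/
cat "$AWS_WEB_IDENTITY_TOKEN_FILE"
```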
Guido Müller
  • Thanks for all the recommendations: I checked the trust relationship and verified that the namespace and name of service account match the values in role. I logged into the pod with shell and verified that AWS_ROLE_ARN is the arn of the above IRSA role. However, the file pointed to by `AWS_WEB_IDENTITY_TOKEN_FILE` is missing -- this seems to be the root cause. I set up the volume & volume mount for this token manually (above) but it seems like the projected source is never created... I'm not sure how to get AWS to create this token. That seems to be the missing step. – Blaine Nelson Aug 04 '21 at 19:13
  • Interestingly, when I create the job manually through a Helm update, the pod is properly annotated and contains the correct token file that allows the job to work correctly. HOWEVER, when I spawn the job from an in-cluster server (that is used to spawn new worker jobs), the annotations are there but the token file is NOT present. This makes me think the issue has to do with a permission issue for the user who spawns the job, perhaps? – Blaine Nelson Aug 05 '21 at 00:20
  • I found the problem: Adding the volume `aws-iam-token` into the job manually before the web-hook ran caused the volume not to be created by the web-hook. Hence there was no token file in the mounted drive. – Blaine Nelson Aug 05 '21 at 04:53
  • When I used IRSA I have never mounted a volume for this. The file was automatically in place. – Guido Müller Aug 05 '21 at 17:35