
I am trying to follow the GitLab example code for using kaniko as outlined here. The only thing I have changed is that I am using the `v1.7.0-debug` tag instead of simply `debug`.

```yaml
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.7.0-debug
    entrypoint: [""]
  script:
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"${CI_REGISTRY}\":{\"auth\":\"$(printf "%s:%s" "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}" | base64 | tr -d '\n')\"}}}" > /kaniko/.docker/config.json
    - >-
      /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
```
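For reference, the `echo` line in the script writes a standard Docker registry credentials file. The sketch below reproduces what it generates, using a dummy registry hostname and dummy credentials in place of the CI variables:

```shell
# Sketch of what the job's echo line produces (registry and
# credentials here are illustrative, not real CI values).
auth=$(printf "%s:%s" "gitlab-ci-token" "example-password" | base64 | tr -d '\n')
cat > config.json <<EOF
{"auths":{"registry.example.com":{"auth":"${auth}"}}}
EOF
cat config.json
```

kaniko reads this file from `/kaniko/.docker/config.json` to authenticate the `--destination` push.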

My build job is stalling at the following point in the log:

```
Running with gitlab-runner 14.4.0 (4b9e985a)
  on gitlab-runner-gitlab-runner-84d476ff5c-mkt4s HMty8QBu
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab-runner
Using Kubernetes executor with image gcr.io/kaniko-project/executor:v1.7.0-debug ...
Using attach strategy to execute scripts...
Preparing environment
00:03
Waiting for pod gitlab-runner/runner-hmty8qbu-project-31186441-concurrent-0bbt8x to be running, status is Pending
Running on runner-hmty8qbu-project-31186441-concurrent-0bbt8x via gitlab-runner-gitlab-runner-84d476ff5c-mkt4s...
Getting source from Git repository
00:01
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/...
Created fresh repository.
Checking out 4d05d22b as ci...
Skipping Git submodules setup
Executing "step_script" stage of the job script
```

It just stops at `Executing "step_script"` and never moves on. I've researched extensively and read through as much documentation as I can find, but I am unable to troubleshoot this issue.

Setup

  • Amazon EKS version 1.21
  • GitLab Runner Helm Chart version 0.34.0
  • kaniko executor image v1.7.0-debug
  • can you give us the .gitlab-ci.yml file you are using? – Phillip -Zyan K Lee- Stockmann Nov 11 '21 at 23:35
  • what does it show if you append `--verbosity=debug` to the executor command? – Phillip -Zyan K Lee- Stockmann Nov 11 '21 at 23:37
  • @Phillip-ZyanKLee-Stockmann I edited my question with the CI yaml. When I enabled debug logging in the Helm chart I did not see any new log details. However I then tried enabling the CI_DEBUG_TRACE environment variable and while that gave more information it still stalled out at the same spot. Most of that trace logging was environment variable values though and did not seem relevant. Was there something specific you are looking for? – rpf3 Nov 12 '21 at 15:13
  • I'm mostly wondering if kaniko is actually doing anything when the job stalls, or if that happens before kaniko even gets executed. Which seems likely, as it does not seem to even execute the `mkdir`. Do you have anything defined globally? I'm still just stabbing in the dark, though. – Phillip -Zyan K Lee- Stockmann Nov 12 '21 at 20:47
  • Yeah, to me it seems like it's not the kaniko binary but the actual container itself that is causing issues. My hunch is that there is something wrong with the kaniko debug image and GitLab Runner cannot send the script commands to the shell. – rpf3 Nov 12 '21 at 21:44
  • Did you try the 'debug' image from the documentation? – Phillip -Zyan K Lee- Stockmann Nov 13 '21 at 08:35
  • 1
    I originally was using the "debug" tag as it says in the documentation but the Kubernetes executor kept raising an exception saying that "/bin/sh is not in $PATH" or something like that. Then I realized that this issue was fixed in release v1.7.0 so I thought I might have been dealing with a caching issue on GCR. I switched to the 1.7.0-debug tag and was able to get past the shell issue. https://github.com/GoogleContainerTools/kaniko/pull/1748 – rpf3 Nov 13 '21 at 16:23
  • Some further info, I was testing with the legacy execution strategy feature flag described below and when I enabled it I was able to get beyond the hanging script. However the job immediately fails with the error `/bin/sh: eval: line 27: mkdir: not found`. I don't think this mkdir command is coming from my job script either because I tried to simplify the script to just "printenv" and it still throws the same error. https://docs.gitlab.com/runner/executors/kubernetes.html#job-execution – rpf3 Nov 15 '21 at 17:49
  • To be precise the image tagged `1.7.0-debug` contains also the binary tools provided by `busybox`. The image `1.7.0` does not. So in order to be able to use a shell or other tools, you must switch to the debug version. – Davide Madrisan Nov 30 '21 at 17:21
  • Thank you @DavideMadrisan, yes that is the container tag I am using. – rpf3 Nov 30 '21 at 17:34
  • Thanks for your help @Phillip-ZyanKLee-Stockmann. I have posted the solution we came to below. – rpf3 Dec 02 '21 at 19:56

1 Answer


This ended up being an issue with how the Kubernetes runner itself was configured in the runner configuration TOML. The default container image we were using for our runners required a modification to the PATH environment variable, so we were using the `environment` configuration setting to do this, as outlined here. That PATH override did not include the busybox shell bundled in the kaniko debug image, so the runner could not execute the job script commands. We have since moved the PATH change into our Docker image, where it should've been in the first place, and things are working as expected.
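To make the failure mode concrete, here is a sketch of the kind of `config.toml` that causes it. The specific name and paths are illustrative, not our actual configuration; the point is that GitLab Runner's `environment` setting replaces PATH in every job container, including ones like the kaniko debug image that keep their tools in `/busybox`:

```toml
# Illustrative runner config (names and paths are made up).
# This environment override applies to ALL job containers, so any
# image whose binaries live outside these directories (the kaniko
# debug image keeps sh, mkdir, etc. in /busybox) loses its shell.
[[runners]]
  name = "gitlab-runner"
  executor = "kubernetes"
  environment = ["PATH=/opt/custom/bin:/usr/local/bin:/usr/bin:/bin"]
```

Removing the override (and baking the PATH change into the image that actually needs it) lets each image's default PATH, including `/busybox`, take effect.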
