It would be great if you could help me with this pending-state issue on GitLab. It is driving me crazy, because I cannot pinpoint what is actually going wrong.
Please find the details and configs below:
Current GitLab plan: GitLab Premium (SaaS)
Number of local Linux runners in project A: 55
Number of local Windows runners in project A: 6
Issue: Some jobs in a pipeline get stuck in the pending state and are not picked up by any runner, even though many runners are available for this project. The runners are locked to project A and are tagged, and tags are assigned to the jobs as well. The issue occurs only in this one project, A.
More details: All local runners in other projects are running fine, with no pending-state issue. The jobs in project A were running completely fine last week, but they suddenly have a pending-state issue now, even though no changes have been made.
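For reference, GitLab's documented tag rule is that a runner only takes a job if the runner has every tag the job lists, and an untagged job additionally requires the runner to allow untagged jobs (run_untagged). The minimal Python sketch below models only that check, so a job's tags can be compared against each runner's tags. The function names, the tuple shape, and the tag values ("linux", "docker", "gpu", "windows") are hypothetical; it does not model other conditions such as paused, offline, or locked runners.

```python
def runner_can_pick(job_tags, runner_tags, run_untagged):
    """Return True if a runner with these tags may take a job with these tags.

    Models only the tag rule: every job tag must be present on the runner,
    and untagged jobs need run_untagged enabled on the runner.
    """
    job = set(job_tags)
    if not job:
        # Untagged job: only runners configured to run untagged jobs qualify.
        return run_untagged
    # Tagged job: the runner's tags must be a superset of the job's tags.
    return job <= set(runner_tags)


def matching_runners(job_tags, runners):
    """Return names of runners whose tags allow them to take the job.

    `runners` is a list of (name, tags, run_untagged) tuples (hypothetical shape).
    """
    return [name for name, tags, untagged in runners
            if runner_can_pick(job_tags, tags, untagged)]


if __name__ == "__main__":
    runners = [
        ("linux-1", ["linux", "docker"], False),
        ("windows-1", ["windows"], False),
    ]
    print(matching_runners(["linux"], runners))         # ['linux-1']
    print(matching_runners(["linux", "gpu"], runners))  # []
    print(matching_runners([], runners))                # []
```

If a job's tags are a strict superset of every runner's tags (for example a typo or an extra tag added in .gitlab-ci.yml), no runner qualifies and the job stays pending indefinitely, which is why the tag comparison is worth ruling out first.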
config.toml for the Linux runners (the 55 runners are divided across 4 different machines; concurrent is shown here as the combined total, as an example):
concurrent = 55
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "runner1-runner55"
  url = "https://gitlab.com/"
  token = "######"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "ubuntu:<version>"
    privileged = false
    pull_policy = ["always", "if-not-present"]
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
    shm_size = 0
config.toml for the Windows runners (these are also split across 2 different machines; concurrent = 5 is shown as an example):
concurrent = 5
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "runner1-runner5"
  url = "https://gitlab.com"
  token = "#######"
  executor = "shell"
  shell = "powershell"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
I cannot determine what is suddenly causing this pending-state problem, since all the jobs in project A were running completely fine just a week ago and nothing has been changed. All runners are also shown as active in the CI/CD Runners settings of project A. If it were a network issue, the same thing would have happened in the other projects, because their local runner servers use the same network.
Please let me know if you need any more information.
Any help is appreciated. Thank you!
What has been done for troubleshooting:
- All local runner machines have been restarted, gitlab-runner restart has been run, the GitLab Runner version has been upgraded, and disk space has been checked and cleaned.
- Network monitoring has been done, and the network cables and the switch have been replaced to rule out any network issue.
- New runners have been added from a new machine, and the same pending-state issue still happens.
- session_timeout was changed to 3; it made no difference.
What makes a pending job get picked up by a runner:
- Running gitlab-runner restart on the local runner servers
- Pausing and unpausing the runner in the CI/CD settings
- Cancelling the jobs and starting them again
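When a job is stuck, it can help to record which jobs are pending, how long they have been waiting, and which tags they carry, before applying one of the workarounds above. The sketch below uses the GitLab REST Jobs API (GET /projects/:id/jobs with scope[]=pending and the PRIVATE-TOKEN header) through Python's standard library only; each returned job object includes fields such as id, name, tag_list, and created_at. The environment variable names GITLAB_TOKEN and GITLAB_PROJECT_ID and all helper names are hypothetical placeholders, only the first page of results is read, and the two helpers can be tried offline.

```python
import json
import os
import urllib.parse
import urllib.request
from datetime import datetime, timezone

GITLAB_API = "https://gitlab.com/api/v4"


def pending_jobs_url(project_id):
    """Build the URL for a project's pending jobs (first page, up to 100)."""
    query = urllib.parse.urlencode({"scope[]": "pending", "per_page": 100})
    return f"{GITLAB_API}/projects/{project_id}/jobs?{query}"


def waiting_seconds(job, now):
    """Seconds since the job was created, parsed from its ISO 8601 created_at."""
    created = datetime.fromisoformat(job["created_at"].replace("Z", "+00:00"))
    return (now - created).total_seconds()


def fetch_pending_jobs(project_id, token):
    """Fetch pending jobs; needs network access and a valid access token."""
    req = urllib.request.Request(pending_jobs_url(project_id),
                                 headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical env vars: project A's numeric ID and an API token.
    token = os.environ.get("GITLAB_TOKEN")
    project_id = os.environ.get("GITLAB_PROJECT_ID")
    if not token or not project_id:
        print("Set GITLAB_TOKEN and GITLAB_PROJECT_ID to query the API.")
    else:
        now = datetime.now(timezone.utc)
        for job in fetch_pending_jobs(project_id, token):
            print(job["id"], job["name"], job.get("tag_list"),
                  f"{waiting_seconds(job, now):.0f}s pending")
```

Running it on a schedule while the problem is happening would give timestamps to line up against the runner logs on each machine.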