
I run a server, mostly for running random Docker containers and as a GitLab CI runner. Every once in a while, when the server has been running for a week or so, I run into process resource limits.

For example, I tried to configure the GitLab runner:

...
Registering runner... succeeded                     runner=gy1zjHEv
runtime: failed to create new OS thread (have 9 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc
...
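As an aside, the shell's ulimit values aren't necessarily what a daemon sees. A way to inspect the limits of the running process itself, assuming it shows up under the name gitlab-runner:

# Read the limits of the live process rather than the shell's;
# the process name "gitlab-runner" is an assumption
cat /proc/"$(pgrep -o -f gitlab-runner)"/limits | grep -i processes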

Or, a couple of minutes later, trying to install the strace utility (output in Dutch):

...
Instellen van strace (4.21-1ubuntu1) ...
Bezig met afhandelen van triggers voor man-db (2.8.3-2ubuntu0.1) ...
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
/usr/bin/mandb: fork failed: Hulpbron is tijdelijk onbeschikbaar
Bezig met afhandelen van triggers voor libc-bin (2.27-3ubuntu1) ...

The Dutch message translates to fork failed: Resource temporarily unavailable.

Something is causing fork() to fail with EAGAIN.
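According to fork(2), EAGAIN can come not only from RLIMIT_NPROC but also from system-wide ceilings, so those are worth a quick check:

# Kernel-wide ceilings that can also make fork()/clone() fail with EAGAIN
cat /proc/sys/kernel/pid_max       # highest PID the kernel will hand out
cat /proc/sys/kernel/threads-max   # system-wide cap on tasks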

Ulimit

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1029355
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62987
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

My max user processes limit is 62987, and I'm nowhere near using 62,000 processes. The GitLab error even shows that the Go runtime had only 9 OS threads when it failed, which was supposedly already 'too many'.
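A quick sanity check of the actual count (the -L variant includes threads, which is what RLIMIT_NPROC really counts on Linux):

# How many processes/threads does this user own right now?
ps --no-headers -u "$USER" | wc -l      # processes
ps --no-headers -u "$USER" -L | wc -l   # including threads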

How is it possible that clone() runs into a process resource limit when I'm nowhere near the configured ulimit?

  • Nothing relevant in the output of `dmesg -T | tail -n 30` after that happens? – Hauke Laging May 02 '20 at 19:05
  • @HaukeLaging Not sure, the log states that a bunch of virtual network devices enter and exit promiscuous mode like mad. https://pastebin.com/gBdCtK67 – Azeirah May 02 '20 at 20:38
  • Interesting: If memory allocation hits a ulimit value then there is a message in `dmesg`. Hitting the process limit does not create any (at least on my system). – Hauke Laging May 02 '20 at 21:00
  • After -a lot- of searching, I accidentally stumbled upon the resource limit that's actually relevant. Namely, `/sys/fs/cgroup/pids/pids.max` is set to 400. I don't know enough about cgroups to edit this just yet, but I have confirmed that this is the limit I'm running into, with a very scientific setup: a split tmux window with a script printing $pids.current/$pids.max in the lower pane while spamming processes in the upper one. – Azeirah May 02 '20 at 22:31
  • Make that an answer so that this question does not seem unanswered any more. – Hauke Laging May 02 '20 at 22:58

1 Answer


After a lot of searching, it turns out that I was running into a cgroup limit. Querying /sys/fs/cgroup/pids/pids.current and /sys/fs/cgroup/pids/pids.max showed that the maximum was about 400 pids, and I was hitting exactly that limit.

# Compare the cgroup's current PID count against its limit
CUR=$(cat /sys/fs/cgroup/pids/pids.current)
MAX=$(cat /sys/fs/cgroup/pids/pids.max)

echo "$CUR/$MAX"
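From there, you can check which pids cgroup a given process is in, and (as root) raise the limit. This is a sketch assuming the same cgroup v1 layout as above; on systemd machines the effective value may instead be governed by TasksMax settings:

# Which pids cgroup is the current shell in? (cgroup v1)
grep pids /proc/self/cgroup

# Raise the limit for that cgroup (needs root; 1000 is an arbitrary example)
echo 1000 | sudo tee /sys/fs/cgroup/pids/pids.max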
  • For people wondering where this limit may come from: In our case, it was a limit imposed by the company hosting our VPS. – raffomania Aug 19 '21 at 10:24