
If I use make -j2, it builds fine, but under-utilizes CPU:

[screenshot: system monitor with -j2, CPUs under-utilized]

If I use make -j4, it builds fast, but some template-heavy files consume a lot of memory, slowing down the entire system (and the build itself) by swapping out to HDD:

[screenshot: system monitor with -j4, memory exhausted and heavy swapping]

How do I make it automatically limit the number of parallel tasks based on memory, like this:

[desired behavior: a graph where parallelism drops while memory-hungry jobs run]

so that it builds the project at the maximum rate, but slows down in some places to avoid hitting the memory wall?

Ideas:

  • Use `-l` and artificially adjust the load average when memory is busy (the load average only grows naturally once the system is already in trouble); a starting-point sketch that sizes `-j` from free memory instead appears after this list.
  • Make memory-allocation syscalls (like sbrk(2) or mmap(2)) or page faults keep the process blocked until memory is reclaimed by finished jobs, instead of swapping out other processes. Unfortunately, this is deadlock-prone...
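Neither of these exists in stock make, but the spirit of the first idea can be approximated from the outside by sizing `-j` from free memory before the build starts. A minimal sketch (Linux-only; the script name `pick_jobs.py` and the `PEAK_PER_JOB` figure are my assumptions, and the real per-job peak would have to be measured for the codebase in question):

```python
#!/usr/bin/env python3
# pick_jobs.py (hypothetical): start make with a -j derived from the
# memory that is currently free. Assumes no single compile job peaks
# above PEAK_PER_JOB; measure your own template-heavy files and adjust.
import os
import sys

PEAK_PER_JOB = 1 * 1024**3  # assumed worst-case bytes per compile job

def mem_available() -> int:
    """Return MemAvailable from /proc/meminfo, in bytes (Linux-specific)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # reported in kB
    raise RuntimeError("no MemAvailable in /proc/meminfo")

# Cap parallelism by both CPU count and memory headroom, but keep >= 1.
jobs = max(1, min(os.cpu_count() or 1, mem_available() // PEAK_PER_JOB))
os.execvp("make", ["make", f"-j{jobs}", *sys.argv[1:]])
```

Run it as `./pick_jobs.py <targets>`. The obvious limitation: the limit is chosen once at startup, so this avoids starting out overcommitted but cannot throttle mid-build when the template-heavy files are reached, which is what the question actually asks for.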
Vi.
  • Are you able to predict how much memory the processes executed by make will consume? Otherwise, I cannot imagine any solution that is able to converge to the maximum, but not pass it, and will not cause a deadlock. – Kuchara Jan 30 '18 at 13:28
  • 1. It may be approximate: allow some underutilisation and some temporary exceeding of the maximum. Just do not exacerbate problems by starting more tasks when memory is already full (but the load average has not yet risen because the whole system is thrashing); 2. It may remember typical time and memory usage of compilation units from previous builds and estimate that it is likely unchanged. – Vi. Feb 01 '18 at 00:39
  • GNU make 4.2 provides an API to its jobserver. I think it would be possible to create some "guard" job which would consume/return jobserver tokens based on overall memory usage. It would then be used with `make -j guardjob ....`, without the `-l` param. – Kuchara Feb 02 '18 at 13:07
  • Hmm... but how to stop such a `guardjob`? That may be very hard... Another idea that came to my mind recently is to modify the make jobserver to work on "memory" tokens (e.g. one token = 10 MB), as opposed to job tokens (http://make.mad-scientist.net/papers/jobserver-implementation/). – Kuchara Oct 08 '18 at 13:28
  • I heard about Linux cgroup2, a resource manager that can control the max/min amount of CPU, memory, IO, etc. resources dedicated to hierarchical groups of processes. I have not really used it before, so take my recommendation with caution, but you are welcome to check it out and I hope it offers you one option: https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html – fjs Dec 15 '22 at 21:34
  • @fjs, Can cgroups or cgroups2 provide a separate load average counter (`/proc/loadavg`) for Make/Ninja to use? Cgroups can prevent system-wide thrashing, but the cgroup itself would still be overloaded and run slower than it could if the extraneous jobs were not started. – Vi. Dec 15 '22 at 21:50
  • I think cgroup2 creates pseudofiles that track the resource utilization of each group. It’s not /proc/loadavg though. – fjs Dec 16 '22 at 00:18
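To make the jobserver idea from Kuchara's comments above concrete, here is a rough sketch of a token-hoarding guard. Everything in it is an assumption to be tuned: the water marks, the one-second poll, and the expectation that it is started from a `+`-prefixed recipe of a GNU make 4.2+ build so that `MAKEFLAGS` carries `--jobserver-auth`:

```python
#!/usr/bin/env python3
# memguard.py (hypothetical): hoard jobserver tokens while memory is
# tight, return them when it recovers. Start it in the background from a
# '+'-prefixed recipe inside a `make -jN` build; kill it when done.
import atexit
import os
import re
import signal
import sys
import time

LOW_WATER = 2 * 1024**3    # below this many free bytes, take tokens away
HIGH_WATER = 4 * 1024**3   # above this, give hoarded tokens back

def mem_available() -> int:
    """MemAvailable from /proc/meminfo, in bytes (Linux-specific)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # reported in kB
    raise RuntimeError("no MemAvailable in /proc/meminfo")

flags = os.environ.get("MAKEFLAGS", "")
m = re.search(r"--jobserver-auth=fifo:(\S+)", flags)
if m:                                   # GNU make >= 4.4 named-pipe style
    rfd = wfd = os.open(m.group(1), os.O_RDWR)
else:                                   # GNU make 4.2/4.3 inherited fds
    m = re.search(r"--jobserver-auth=(\d+),(\d+)", flags)
    if not m:
        raise SystemExit("no jobserver: run from a '+' recipe of make -jN")
    rfd, wfd = int(m.group(1)), int(m.group(2))

held = []
# "How to stop such a guardjob": kill it. The handler turns SIGTERM into
# a normal exit, so the atexit hook returns every hoarded token.
signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))
atexit.register(lambda: [os.write(wfd, t) for t in held])

while True:
    if mem_available() < LOW_WATER:
        held.append(os.read(rfd, 1))    # blocks until some job finishes
    elif mem_available() > HIGH_WATER and held:
        os.write(wfd, held.pop())       # re-enable one job slot
    time.sleep(1)
```

The blocking `os.read()` is the weak point: while the guard is waiting for a token it cannot give any back, so this reduces, but does not eliminate, the deadlock risk discussed above.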

1 Answer


Not a complete solution, but a potential approach based on the 'guard' idea from the comments:

  1. Start make with a high '-j' value.

  2. Run a guard job that:

     a. Waits for thrashing (page faults per second above a limit) together with high CPU utilization.
     b. Finds the job with the highest memory usage and suspends it.
     c. Waits for some time to allow the system to rebalance itself.
     d. Repeats.

If the approach looks good, I can post a sample implementation.
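For illustration, this is roughly what such a guard could look like as a standalone script started next to `make -jN`. Every knob in it is a hypothetical heuristic rather than a tested value: the fault-rate threshold, the tick length, and the set of process names treated as build jobs; the high-CPU check from step (a) is omitted for brevity:

```python
#!/usr/bin/env python3
# thrash_guard.py (hypothetical): suspend the fattest compile job while
# the system thrashes, resume suspended jobs once it calms down.
import os
import signal
import time

FAULT_LIMIT = 500          # major page faults/s that we call "thrashing"
TICK = 2                   # seconds between measurements
COMPILERS = {"cc1plus", "cc1", "ld", "lto1"}  # names treated as build jobs

def major_faults() -> int:
    """System-wide major page fault counter from /proc/vmstat."""
    with open("/proc/vmstat") as f:
        for line in f:
            if line.startswith("pgmajfault "):
                return int(line.split()[1])
    return 0

def fattest_job():
    """(rss_kb, pid) of the largest running compiler, or None."""
    best = None
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() not in COMPILERS:
                    continue
            with open(f"/proc/{pid}/status") as f:
                rss = next(int(l.split()[1]) for l in f
                           if l.startswith("VmRSS:"))
        except (OSError, StopIteration):
            continue                    # the job exited while we looked
        if best is None or (rss, int(pid)) > best:
            best = (rss, int(pid))
    return best

suspended = []                          # pids we have SIGSTOPped
prev = major_faults()
try:
    while True:
        time.sleep(TICK)
        cur = major_faults()
        rate = (cur - prev) / TICK
        prev = cur
        if rate > FAULT_LIMIT:          # step a (CPU check omitted here)
            victim = fattest_job()
            if victim:
                try:
                    os.kill(victim[1], signal.SIGSTOP)   # step b
                    suspended.append(victim[1])
                except ProcessLookupError:
                    pass                # it finished just in time
        elif suspended:                 # steps c and d: resume one, wait
            try:
                os.kill(suspended.pop(), signal.SIGCONT)
            except ProcessLookupError:
                pass
finally:
    for pid in suspended:               # never leave jobs frozen behind
        try:
            os.kill(pid, signal.SIGCONT)
        except ProcessLookupError:
            pass
```

Resuming in LIFO order un-freezes the most recently suspended job first, and the `finally` block makes sure nothing stays frozen if the guard itself is interrupted.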

dash-o
  • Can it work preemptively? E.g. can it make all linking (LTO) jobs run alone (by suspending them at the very beginning, for example)? – Vi. Nov 27 '19 at 21:52
  • I'm not sure I understand. Are you saying that certain jobs should not start until certain memory is available? If you can tag those jobs, it's not hard to delay them. But I thought the question was about dynamic adjustment, not manual control? – dash-o Nov 27 '19 at 21:57
  • OK, maybe better to keep the question focused on an automatic solution. – Vi. Nov 28 '19 at 10:02
  • Is the problem still relevant? If you found an alternative solution, I would not want to take your time. – dash-o Nov 28 '19 at 13:38
  • If you have something to test, I may test it. If you don't need it yourself, no need to code something specifically to answer this. – Vi. Dec 01 '19 at 11:27
  • I have a similar problem, but the constraint is usually overall load, not memory. However, I believe the logic is the same: when load is high, suspend jobs based on size. – dash-o Dec 01 '19 at 11:33