Linux OOM-killer disabling woes

Question

I have a Linux machine with no swap, on which `vm.overcommit_memory=2` and `vm.overcommit_ratio=100` are set. However, these settings seem to have no effect: some newly started processes that try to consume a lot of memory are still OOM-killed instead of having their allocations refused. Is this because processes started before `sysctl -w vm.overcommit_memory=2 vm.overcommit_ratio=100` was run (e.g. before /etc/sysctl.conf is applied during boot) can still touch memory that was overcommitted to them earlier, and thereby trigger the OOM killer? Is it possible to (1) disable memory overcommit using kernel parameters, or (2) force the kernel at runtime to allocate all overcommitted memory for all processes?

EDIT: After browsing relevant Documentation/ and some of the Linux source code (git c6fa8e6de3) (1) seems unlikely.
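
For anyone checking whether the settings took effect: a minimal sketch in C that reads the current overcommit mode and the commit accounting figures. It assumes a Linux `/proc` that exposes `/proc/sys/vm/overcommit_memory` and the `CommitLimit` and `Committed_AS` fields of `/proc/meminfo`; the function names `meminfo_kb` and `print_commit_state` are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a meminfo-style stream for "key:" and store its value (in kB).
 * Returns 0 on success, -1 if the key is absent or malformed. */
int meminfo_kb(FILE *f, const char *key, long *out)
{
    char line[256];
    size_t klen;

    assert(f != NULL && key != NULL && out != NULL);
    klen = strlen(key);
    rewind(f);
    while (fgets(line, sizeof line, f) != NULL) {
        /* Require an exact key match followed by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%ld", out) == 1 ? 0 : -1;
    }
    return -1;
}

/* Print the overcommit mode and commit accounting. Returns 0 on success. */
int print_commit_state(void)
{
    long mode, limit_kb, committed_kb;
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &mode) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);

    f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    if (meminfo_kb(f, "CommitLimit", &limit_kb) != 0 ||
        meminfo_kb(f, "Committed_AS", &committed_kb) != 0) {
        fclose(f);
        return -1;
    }
    fclose(f);

    printf("overcommit_memory=%ld CommitLimit=%ld kB Committed_AS=%ld kB\n",
           mode, limit_kb, committed_kb);
    return 0;
}
```

Calling `print_commit_state()` from a small `main` shows how close `Committed_AS` is to `CommitLimit`; the kernel documentation describes `CommitLimit` as being enforced only in mode 2.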

This seems the wrong place for the question. Try moving it to Unix & Linux, as I don't see anything "enterprise" or "server" specific in the question. – Dani_l Oct 07 '15 at 12:10
Have you tried adding some swap to see if that helps? How much memory is on this box? What distro and kernel are you running? – Ben Lutgens Oct 07 '15 at 12:31
You may have some limited success by setting a virtual set size memory limit on your starting processes equal to the amount of committed memory they should ever take. This should produce ENOMEM results when the limit is exceeded. As noted already, though, it's often the case that programs don't check the result of brk() or mmap() calls. – Matthew Ife Oct 07 '15 at 19:42
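
One way to read the per-process limit suggested in the last comment, sketched in C. It assumes the limit meant is the address-space limit `RLIMIT_AS` (the comment does not name it), and the helper name `refused_under_limit` is hypothetical. The limit is applied in a forked child so the caller's own limits stay untouched.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* In a child process, cap the address space at limit_bytes and try to map
 * request_bytes of anonymous memory.
 * Returns 1 if the request was refused with ENOMEM, 0 if the mapping
 * succeeded, and -1 on any other error. */
int refused_under_limit(size_t limit_bytes, size_t request_bytes)
{
    pid_t pid;
    int status;

    assert(request_bytes > 0);
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: lower both the soft and the hard limit, then allocate. */
        struct rlimit rl = { limit_bytes, limit_bytes };
        void *p;

        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        p = mmap(NULL, request_bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            _exit(errno == ENOMEM ? 1 : 2);   /* checked failure, no SIGKILL */
        _exit(0);
    }

    /* Parent: translate the child's exit code into the return value. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 0;
    case 1:  return 1;
    default: return -1;
    }
}
```

The point of the sketch is that the failure arrives as a return value the program can check, which only helps if the program actually checks it.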

score 2 · Answer 1 · answered Oct 07 '15 at 12:35

You're attempting something that simply is not possible. If you let the system run to near exhaustion, there will always be a possibility of a process being killed due to memory exhaustion. There are many scenarios that can't be avoided, but the easiest to understand is a page fault caused by a process adding a page to its stack.

You need swap or a RAM cushion if you want to avoid OOM kills.

David Schwartz

Also, to the original poster, make sure you take a look at Documentation/vm/overcommit-accounting. There's tons of good info there. – Ben Lutgens Oct 07 '15 at 12:51
There's a big difference between `malloc`/`calloc`/`mmap` reporting failure (`NULL` or `MAP_FAILED`) and a process being `SIGKILL`ed after accessing overcommitted memory. In the latter case the process has zero chance to clean up and exit gracefully. I acknowledge that in practice most code out there doesn't handle OOM well; many programmers assume that `malloc` and friends never fail. Many high-level languages try to sweep the memory issue under the carpet altogether. Adding swap just postpones the problem. – jotik Oct 07 '15 at 18:51
As for Linux kernel documentation, I have read `Documentation/vm/overcommit-accounting` as well as `Documentation/sysctl/vm.txt` and even read some relevant kernel source code. As of now I have not found anything hinting at a way to disable overcommit using kernel parameters. I have not yet searched for whether it is possible to force all overcommitted pages to be allocated. – jotik Oct 07 '15 at 18:59
The OOM killer was a solution to the problem that many programs malloc memory they don't need. Saying that "killing a program that's out of memory is the only way" demonstrates a lack of proper systems understanding (also note that very often it's not the faulty program that gets killed!). The `malloc` family of functions returns `NULL` if the allocation cannot be honored, and all software should (though often doesn't) respond to this in a tidy manner. Overcommit and OOMK should default to off, and perhaps be possible to turn on if your badly written software needs it. – Haqa Feb 10 '17 at 12:32
@Haqa overcommit is also required for `fork()` and `exec*()` of a huge process such as Firefox. This is because the huge process needs to `fork()` to be able to run `exec*()` to start, for example, an external PDF reader. If your RAM is pretty much used and swap is nearly full, the system can no longer start a PDF reader when a huge process tries to launch it. However, it would still probably work when started from a terminal. That does not seem perfect either, which is why Linux enables memory overcommit by default. – Mikko Rantalainen Jan 12 '18 at 17:50
@MikkoRantalainen that's an example not of why we need overcommit and OOMK but of a piece of software that was designed poorly (if at all) and is exactly the kind of thing I meant when I mentioned "badly written software", unless you really believe that displaying a web page actually needs gigabytes of RAM? – Haqa Jan 16 '18 at 16:32
@Haqa Have you looked at a typical web page these days? They're not things you display, they're a set of huge, complex programs that you run. – David Schwartz Jan 16 '18 at 18:22
@MikkoRantalainen and yet these same pages seem able to be displayed (or executed) on these same browser applications on all other platforms without overcommit and specifically without OOMK. It's an error of judgement that, among modern operating systems, appears unique to Linux. This appears to be a feature that's jumped the fence from FreeBSD of all places and frankly I'd be quite happy to let them have it back. We'd be much better off with the Linux kernel developers creating a shortcut API to allow spawning a new process without having to duplicate the current one. – Haqa Jan 17 '18 at 09:05
@Haqa: it's true that POSIX `fork()` and `exec*()` semantics are pretty poor for launching new processes. The only real "fix" is to use `posix_spawn()` but it's so hard to use that there's zero probability that you can fix *all* the applications. – Mikko Rantalainen Jan 18 '18 at 07:01
@MikkoRantalainen I'm curious to know why you consider `posix_spawn()` to be so hard to use? – Haqa Mar 09 '18 at 16:07
@Haqa: it is not that hard to actually use `posix_spawn()`. However, the *actually hard* part is to get every single software developer in the world to start using it at all. Using `fork()` is usually conceptually easier and provides nearly the same experience except for some corner cases. Most developers do not feel that those corner cases are important and see any work towards `posix_spawn()` as wasted effort. Note that the work is not just replacing `fork` with `posix_spawn`, but finding each `fork` together with the code around it and replacing all of that with `posix_spawn` after figuring out whether it will work. – Mikko Rantalainen Mar 12 '18 at 12:32
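
For reference, a minimal sketch of the `posix_spawn()` pattern discussed in the comments above, replacing the usual `fork()` + `exec*()` + `waitpid()` sequence. The wrapper name `spawn_and_wait` is hypothetical, and whether the call really avoids the memory accounting cost of `fork()` depends on how the C library implements it, so treat that part as an assumption.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Start the program at path with the given argv, inheriting the caller's
 * environment, and wait for it to finish.
 * Returns the child's exit status, or -1 if it could not be spawned or
 * did not exit normally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    assert(path != NULL && argv != NULL);

    /* No file actions and no spawn attributes: the simplest possible call.
     * posix_spawn() returns an error number (not -1) on failure. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Anything the old code did between `fork()` and `exec*()` (closing descriptors, changing signal masks) has to be expressed through the file-actions and attributes arguments instead, which is the extra porting work the last comment refers to.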
