11

With the "let it crash" philosophy of Erlang, one would expect the entire VM not to crash when a process cannot allocate the memory it needs to proceed. Indeed, if the system had a heuristic to kill some process to free memory, some other process would handle this and recover. Root supervisors would be unlikely to be killed by the heuristic.

This is in direct contrast to most popular modern languages, which just die or let the OS choose what to do.

How is running out of memory actually handled in Erlang?

Kara
  • See also: [Why is Erlang crashing on large sequences?](http://stackoverflow.com/questions/192725/why-is-erlang-crashing-on-large-sequences) – Emil Vikström Jun 20 '12 at 04:28

1 Answer

12

When the Erlang VM runs into an out-of-memory situation, it simply crashes the whole VM. The reason is that this is the simplest and safest thing to do.

If you want a fault-tolerant system, you need more than one computer anyway. You can't build a fault-tolerant system with only one computer (more precisely, with only one autonomous computation unit). So if your application runs into an out-of-memory situation, the simplest thing is to let the whole VM crash. You have a bug in your system anyway.
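As a rough illustration of the multi-machine point, a process on a second node can watch for the first node disappearing and react to it. This is only a minimal sketch: the module name `node_watch` and the recovery step (here just a log line) are made up for the example, and it assumes both VMs are started as distributed nodes and are connected to each other.

```erlang
-module(node_watch).
-export([start/0]).

%% Start a watcher process that subscribes to node status messages.
start() ->
    spawn(fun() ->
        %% Ask the kernel to send {nodeup, Node} / {nodedown, Node}
        %% messages to this process.
        net_kernel:monitor_nodes(true),
        loop()
    end).

loop() ->
    receive
        {nodedown, Node} ->
            %% The peer VM is gone (crash, OOM, network loss).
            %% Hypothetical recovery step: a real system would take over here.
            io:format("node ~p is down, starting recovery~n", [Node]),
            loop();
        {nodeup, Node} ->
            io:format("node ~p is up~n", [Node]),
            loop()
    end.
```

From the watcher's side it does not matter why the other VM went away, so a whole-VM crash on out-of-memory is handled the same way as any other node failure.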

Handling all the edge cases (which out-of-memory situations you can handle and which you can't) is too complicated and error prone. Killing the offending process is not the solution. First, it is hard to decide which process is the offending one. Killing some "random" (heuristically chosen) process is not a solution either, because the process chosen by the heuristic could, by accident, be the very process responsible for recovery. Killing the whole VM is not only the simplest but also the only reasonable response to an out-of-memory situation.
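A narrower tool than a VM-wide heuristic is a heap cap that the programmer sets explicitly on a process known to be non-critical. The sketch below assumes OTP 19 or later, where the `max_heap_size` process option is available; the module name `heap_cap`, the leaky `grow/1` loop, and the limit of 1000000 words are invented for the illustration. It does not change what the VM does when it genuinely cannot allocate memory.

```erlang
-module(heap_cap).
-export([run/0]).

%% Spawn a worker with a hard cap on its heap and report how it exits.
run() ->
    %% size is measured in words, not bytes; the value is an arbitrary example.
    Limit = #{size => 1000000, kill => true, error_logger => false},
    {Pid, Ref} = spawn_opt(fun() -> grow([]) end,
                           [monitor, {max_heap_size, Limit}]),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {worker_exited, Reason}
    end.

%% Deliberately leaky loop: the accumulator grows without bound.
grow(Acc) ->
    grow([0 | Acc]).
```

Calling `heap_cap:run()` should return once the worker has been terminated by the runtime, so the caller sees an ordinary monitor message rather than the whole VM running out of memory. Which process gets the cap is still the programmer's decision, not a heuristic.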

The way it is done in most popular modern languages and operating systems is definitely wrong when you need a reliable system. It can be acceptable for desktop use or for less strict requirements, but it is absolutely unacceptable for the kind of systems Erlang is designed for.

Hynek -Pichi- Vychodil
  • What if both VMs on both computers run out of memory due to a redundant application? – Muzaaya Joshua Jun 20 '12 at 08:42
  • Muzaaya, what do you mean by "redundant application"? – Emil Vikström Jun 20 '12 at 09:02
  • @MuzaayaJoshua: I think you are mixing up high reliability with high availability. Both are fault-tolerant solutions. Highly available systems have to work non-stop. Highly reliable systems have to work properly, i.e. without bugs, but may occasionally stop. Redundant applications are used to build highly reliable systems. Highly available systems are usually built with backups: a "redundant" standby application is prepared which does not run the workload but only keeps its state up to date. The backup application should not run into the same OOM situation as the main one. Anyway, crashing is right in both situations. – Hynek -Pichi- Vychodil Jun 20 '12 at 11:54
  • You _can_ make a fault tolerant system with one machine; the faults you tolerate are bugs in code from other people that you use. Indeed, Linux would never work if it was not fault tolerant in this sense. So if my process that sends off samples of some data over time to the user (such as how long it took to complete a unit of work) has a bug and allocates too much memory, my entire system should be considered broken and go down? I don't consider such a process critical to my system, so I wouldn't want to depend on its correctness. How is OOM *actually* handled in Erlang? – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 20 '12 at 12:40
  • If there was an OOM killer, it wouldn't be too unreasonable to mark important supervisors with a higher weight than other processes. This way, the OOM killer can start its considerations from the set of processes with the lowest weight. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 20 '12 at 12:43
  • That would only be limited fault tolerance: if the one machine went down, so would the system. For true fault tolerance you do need multiple machines. – rvirding Jun 20 '12 at 13:36
  • @rvirding: by that logic, you might as well say, for true fault tolerance, you need multiple planets. but that isn't enough, what if the planets are nearby when a supernova occurs? so for true fault tolerance you need multiple galaxies, oh no wait multiple universes, oh no wait, multiple multiverses, ad infinitum. moreover I may only want to run the Erlang VM on my own machine because I don't want to have to secure other machines. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 20 '12 at 13:45
  • In any case, if a memory usage bug is in one process and that process is distributed across all your nodes, all your nodes will crash, regardless of whether that process is even important or not. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 20 '12 at 14:19
  • @Longpoke: The trick is that the memory leak bug in your process will probably not hit the memory limit on all your nodes at the same time. – Hynek -Pichi- Vychodil Jun 20 '12 at 16:00
  • @Longpoke My point was just that by only having one machine there are some faults which you will not be able to handle, so it is limited fault tolerance. As yet we don't have to worry about a planet or galaxy going down, as we are only on this one planet; if it crashes there is no-one to detect it. But in the future ... – rvirding Jun 21 '12 at 22:52
  • @rvirding: I don't see why you _can't_ handle OOM. Linux handles it, although not 100% accurately: the entire system doesn't go down when one process triggers OOM. You guys seem to be implying that Erlang kills itself if it can't allocate more memory. BTW nobody has answered my question yet: "what does Erlang do on OOM?" – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 22 '12 at 02:56
  • @Longpoke: Linux handles OOM in some way; try searching the internet for the complaints of users who have had to face it. The mechanism used in Linux and in most popular modern languages is wrong in a fault-tolerant environment. I have answered your question several times: Q: "what does Erlang do on OOM?" A: "Erlang crashes the whole VM." – Hynek -Pichi- Vychodil Jun 22 '12 at 06:04
  • Ouch. Okay, well if you edit your answer to make it explicit that Erlang does this, then I'll accept your answer. Right now it just looks like you're suggesting this is what my program should do (and I can't tell whether it means I should configure my program to crash if anything gets OOM, or whether the Erlang VM already does this). – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 22 '12 at 13:42