
I've embedded V8 9.5 into my app (a C++ HTTP server). When I started using optional chaining in my JS scripts, I noticed an abnormal rise in memory consumption under heavy (CPU) load, leading to OOM. While there is some free CPU, memory usage is normal. I've displayed the V8 HeapStats in Grafana (this is for just one isolate, of which my app has 8) (screenshot: heap stats).
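For context, these per-isolate numbers are the fields of v8::HeapStatistics; a minimal sketch of how they can be sampled for export (the ReportGauge() hook is a hypothetical stand-in for the actual metrics code):

```
#include <cstdio>
#include <v8.h>

// Hypothetical metrics hook; here it just prints, but in the real app it
// would feed the Grafana exporter.
static void ReportGauge(const char* name, size_t value) {
  std::printf("%s = %zu\n", name, value);
}

// Sample V8 heap statistics for one isolate.
void SampleHeapStats(v8::Isolate* isolate) {
  v8::HeapStatistics stats;
  isolate->GetHeapStatistics(&stats);

  ReportGauge("v8.total_heap_size",      stats.total_heap_size());
  ReportGauge("v8.used_heap_size",       stats.used_heap_size());
  ReportGauge("v8.malloced_memory",      stats.malloced_memory());
  ReportGauge("v8.peak_malloced_memory", stats.peak_malloced_memory());
  ReportGauge("v8.external_memory",      stats.external_memory());
}
```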

Under heavy load there's a spike in peak_malloced_memory, while the other stats are much less affected and look normal. I passed the --expose-gc flag to V8 and called gc() at the end of my script. That completely solved the problem, and peak_malloced_memory no longer rises like that. Also, by repeatedly calling gc() I could free all the extra memory consumed without it. --gc-global also works. But these approaches feel more like workarounds than a production-ready solution. --max-heap-size=64 and --max-old-space-size=64 had no effect: memory consumption still greatly exceeded 8 (the number of isolates in my app) * 64 MB (>2 GB of physical RAM).
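For reference, as an embedder those flags have to be handed to V8 explicitly (e.g. via v8::V8::SetFlagsFromString before the first isolate is created); a minimal sketch:

```
#include <v8.h>

// Apply V8 flags early; most flags cannot be changed once isolates exist.
void ConfigureV8Flags() {
  // Exposes the global gc() function to scripts (the workaround discussed above).
  v8::V8::SetFlagsFromString("--expose-gc");

  // Caps the V8-managed heap per isolate. Note: this does not bound
  // malloced (off-heap) memory.
  v8::V8::SetFlagsFromString("--max-old-space-size=64");
}
```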

I don't use any GC-related V8 API in my app.

My app creates v8::Isolate and v8::Context once and uses them to process HTTP requests.
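Roughly, that per-isolate setup looks like this (a simplified sketch; the JsWorker wrapper is illustrative, and platform initialization, CreateParams setup, and error handling are omitted or assumed):

```
#include <v8.h>

// One isolate and one context, created once and reused for every HTTP request.
struct JsWorker {
  v8::Isolate* isolate;
  v8::Global<v8::Context> context;

  explicit JsWorker(v8::Isolate::CreateParams& params)
      : isolate(v8::Isolate::New(params)) {
    v8::Isolate::Scope isolate_scope(isolate);
    v8::HandleScope handle_scope(isolate);
    context.Reset(isolate, v8::Context::New(isolate));
  }

  // Compiles and runs the request-handling script; the result is ignored here.
  void HandleRequest(const char* source) {
    v8::Isolate::Scope isolate_scope(isolate);
    v8::HandleScope handle_scope(isolate);
    v8::Local<v8::Context> ctx = context.Get(isolate);
    v8::Context::Scope context_scope(ctx);

    v8::Local<v8::String> src =
        v8::String::NewFromUtf8(isolate, source).ToLocalChecked();
    v8::Local<v8::Script> script = v8::Script::Compile(ctx, src).ToLocalChecked();
    script->Run(ctx).ToLocalChecked();
  }
};
```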

The behavior is the same on v9.7.

Ubuntu xenial

I built V8 with these args.gn:

dcheck_always_on = false
is_debug = false
target_cpu = "x64"
v8_static_library = true
v8_monolithic = true
v8_enable_webassembly = true
v8_enable_pointer_compression = true
v8_enable_i18n_support = false
v8_use_external_startup_data = false
use_thin_lto = true
thin_lto_enable_optimizations = true
x64_arch = "sandybridge"
use_custom_libcxx = false
use_sysroot = false
treat_warnings_as_errors = false # due to use_custom_libcxx = false
use_rtti = true # for sanitizers

I then manually turned the static library into a dynamic one with the following command (I had some linking issues with the static lib due to LTO that I didn't want to deal with in the future):

../../../third_party/llvm-build/Release+Asserts/bin/clang++ -shared -o libv8_monolith.so -Wl,--whole-archive libv8_monolith.a -Wl,--no-whole-archive -flto=thin -fuse-ld="lld"

I did some load testing (since the problem occurs only under load) with and without the manual gc() call; this is the RAM usage graph during load testing, with timestamps (screenshot: RAM usage):

  1. Started load testing with the gc() call: no "leak".
  2. Removed the gc() call and started another load-testing session: "leak".
  3. Brought back the manual gc() call under low load: memory usage started to gradually decrease.
  4. Started another load-testing session (with gc() still in the script): memory usage quickly decreased to baseline values.

My questions are:

  1. Is it normal that peak_malloced_memory can exceed total_heap_size?
  2. Why could this occur only when using JS's optional chaining?
  3. Are there any other, more correct solutions to this problem than forcing a full GC all the time?
Islam Boziev

2 Answers


(V8 developer here.)

  1. Is it normal that peak_malloced_memory can exceed total_heap_size?

Malloced memory is unrelated to the heap, so yes, when the heap is tiny then malloced memory (which typically also isn't a lot) may well exceed it, maybe only briefly. Note that peak malloced memory (53 MiB in your screenshot) is not current malloced memory (24 KiB in your screenshot); it's the largest amount that was used at any point in the past, but has since been freed (and is hence not a leak, and won't cause an OOM over time).

Not being part of the heap, malloced memory isn't affected by --max-heap-size or --max-old-space-size, nor by manual gc() calls.

  2. Why could this occur only when using JS's optional chaining?

That makes no sense, and I bet that something else is going on.

  3. Are there any other, more correct solutions to this problem than forcing a full GC all the time?

I'm not sure what "this problem" is. A brief peak of malloced memory (which is freed again soon) should be fine. Your question title mentions a "leak", but I don't see any evidence of a leak. Your question also mentions OOM, but the graph doesn't show anything related (less than 10 MiB current memory consumption at the end of the plotted time window, with 2GB physical memory), so I'm not sure what to make of that.

Manually forcing GC runs is certainly not a good idea. The fact that it even affects (non-GC'ed!) malloced memory at all is surprising, but may have a perfectly mundane explanation. For example (and I'm wildly speculating here, since you haven't provided a repro case or other more specific data), it could be that the short-term peak is caused by an optimized compilation, and with the forced GC runs you're destroying so much type feedback that the optimized compilation never happens.

Happy to take a closer look if you provide more data, such as a repro case. If the only "problem" you see is that peak_malloced_memory is larger than the heap size, then the solution is simply not to worry about it.

jmrk
  • Thank you for helping. The app consumed >2 GB of RAM without the V8 heap size increasing; that's why you didn't see it on the graph - it confuses me as well. Interesting that you mentioned optimized compilation: when I profiled my app using jemalloc, the only V8-related thing I saw there was optimized compilation, but it showed a small percentage, so I discarded it. [Here's the link](https://drive.google.com/file/d/1Qu1AnCDuf5Om1G7A8Hj61wMuXvCE-h_8/view?usp=sharing) if you want to see the V8 symbols that appeared in the profiler report. But how could optimized compilation take so much memory? – Islam Boziev Nov 30 '21 at 01:10
  • I also updated my question: I've added a RAM usage graph that illustrates how calling `gc()` on each JS execution (request processing) influences memory consumption. – Islam Boziev Nov 30 '21 at 01:26
  • All this gave me the impression that some objects on the heap are not taken into account, so V8 thinks it isn't consuming memory and sees no need to run a global/full GC. I have no idea how that could happen, but it would explain why forcing GC helps to reclaim memory. (Most probably this is very far from the truth, since I don't know how V8 works; just sharing my thoughts.) – Islam Boziev Nov 30 '21 at 01:41
  • 1
    The linked jemalloc profile shows 14MB compiler memory, which doesn't look like the problem. Are you maybe allocating huge amounts of external memory (externalized strings, or large TypedArrays) kept alive by JS objects? If you're allocating custom things, make sure to use `v8::Isolate::AdjustAmountOfExternalAllocatedMemory` accordingly. To save CPU, the GC won't run much when there's lots of free heap; if you want it to work harder despite only using ~5MB, try playing with the max heap size. I suspect that the problem is somewhere else entirely though (no idea where). – jmrk Nov 30 '21 at 22:07
  • Yeah, that's why I discarded it too... I don't use V8 to manage the lifetime of my objects, so I don't think that's the problem; it also wouldn't explain why this only occurs with the `?.` syntax. Prior to this I was on v6.0 and, since there was no optional chaining support back then, used a simple helper ```function __safe_get__(value, key) { if (typeof value === 'object' && value !== null && key in value) { return value[key]; } return undefined; }``` but after I replaced it with `?.` / `?.[]`, memory consumption blew up. – Islam Boziev Dec 01 '21 at 09:04
  • Also, without optional chaining (everything else being the same) I get 4 MB of `peak_malloced_memory` instead of >50 MB (even under higher load). So it looks like, in my use case, optional chaining causes excessive memory allocation that doesn't go into the heap but somewhere else inside V8. And despite seemingly not residing in the heap, this memory can be freed by a full GC... This is very weird to me... – Islam Boziev Dec 01 '21 at 11:16
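For completeness, the API jmrk mentions above (`v8::Isolate::AdjustAmountOfExternalAllocatedMemory`) is meant to be used roughly like this when an embedder keeps large off-heap buffers alive from JS objects (a sketch; the helper functions are hypothetical):

```
#include <cstdint>
#include <v8.h>

// Hypothetical helpers around a plain heap allocation. Reporting the size of
// externally held memory lets the GC factor it into its scheduling; the delta
// is negative when the buffer is released.
void* AllocateExternalBuffer(v8::Isolate* isolate, size_t size) {
  void* buffer = ::operator new(size);
  isolate->AdjustAmountOfExternalAllocatedMemory(static_cast<int64_t>(size));
  return buffer;
}

void FreeExternalBuffer(v8::Isolate* isolate, void* buffer, size_t size) {
  ::operator delete(buffer);
  isolate->AdjustAmountOfExternalAllocatedMemory(-static_cast<int64_t>(size));
}
```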

I think I got to the bottom of this...

Turns out, this was caused by V8's --concurrent-recompilation feature in conjunction with our jemalloc configuration.

It looks like when optional chaining is used instead of the hand-written function, V8 tries much more aggressively to optimize the code concurrently and allocates far more memory for that (zone stats showed >70 MB of memory per isolate). And it does so specifically under high load (maybe only then does V8 notice hot functions).

jemalloc, in turn, has 128 arenas by default and background_thread disabled. Because concurrent recompilation runs on a separate thread, V8's TurboFan optimizer ended up allocating a lot of memory in a separate jemalloc arena. Even though V8 freed this memory, because of jemalloc's decay strategy and because that arena wasn't touched from anywhere else, the pages weren't purged, which kept increasing resident memory.

Jemalloc stats:
Before memory runaway:

Allocated: 370110496, active: 392454144, metadata: 14663632 (n_thp 0), resident: 442957824, mapped: 570470400, retained: 240078848

After memory runaway:

Allocated: 392623440, active: 419590144, metadata: 22934240 (n_thp 0), resident: 1712504832, mapped: 1840152576, retained: 523337728

As you can see, while allocated memory is under 400 MB, RSS is at 1.7 GB due to ~300,000 dirty pages (~1.1 GB). And all those dirty pages are spread across a handful of arenas with only one thread associated (the one on which V8's TurboFan optimizer did the concurrent recompilation).
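These counters can be pulled at runtime through jemalloc's mallctl interface; a minimal sketch, assuming an unprefixed jemalloc build with statistics enabled:

```
#include <cstdint>
#include <cstdio>
#include <jemalloc/jemalloc.h>

// Read the top-level jemalloc counters quoted above. Writing to "epoch" first
// forces jemalloc to refresh its cached statistics.
void DumpJemallocStats() {
  uint64_t epoch = 1;
  size_t epoch_sz = sizeof(epoch);
  mallctl("epoch", &epoch, &epoch_sz, &epoch, epoch_sz);

  size_t allocated, active, metadata, resident, mapped, retained;
  size_t sz = sizeof(size_t);
  mallctl("stats.allocated", &allocated, &sz, nullptr, 0);
  mallctl("stats.active",    &active,    &sz, nullptr, 0);
  mallctl("stats.metadata",  &metadata,  &sz, nullptr, 0);
  mallctl("stats.resident",  &resident,  &sz, nullptr, 0);
  mallctl("stats.mapped",    &mapped,    &sz, nullptr, 0);
  mallctl("stats.retained",  &retained,  &sz, nullptr, 0);

  std::printf("Allocated: %zu, active: %zu, metadata: %zu, "
              "resident: %zu, mapped: %zu, retained: %zu\n",
              allocated, active, metadata, resident, mapped, retained);
}
```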

--no-concurrent-recompilation solved the issue, and I think it is optimal in our use case, where we allocate an isolate per CPU core and distribute the load evenly, so there is little point in performing recompilation concurrently from a throughput standpoint.

This can also be solved on jemalloc's side with MALLOC_CONF="background_thread:true" (which, allegedly, can crash) or by reducing the number of arenas with MALLOC_CONF="percpu_arena:percpu" (which may increase contention). MALLOC_CONF="dirty_decay_ms:0" also fixed the issue, but it is a suboptimal solution.
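For reference, these settings can also be compiled into the binary instead of relying on the environment variable, since jemalloc reads a `malloc_conf` symbol defined by the application (a sketch, assuming jemalloc is linked without a symbol prefix):

```
// Picked up by jemalloc at startup; equivalent to setting MALLOC_CONF in the
// environment. Use whichever of the options discussed above fits best.
extern "C" const char* malloc_conf = "background_thread:true";
```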

I'm not sure how forcing GC helped to reclaim memory; maybe it somehow triggered access to those jemalloc arenas without allocating much memory in them.

Islam Boziev