
I have a big struct (~200 MB) that I deserialize with serde_json from a large JSON file produced by Java, and this deserialization happens again whenever new data is available. The struct contains Vecs, a HashMap of strings and structs of strings, etc.

While reading the man page for mallopt(3), I found that the environment variable MALLOC_MMAP_THRESHOLD_ controls how large an allocation request must be before malloc serves it with mmap. I want my struct to be allocated via mmap because the heap becomes fragmented across reloads. I want the old, deallocated memory (the struct that gets replaced by the newly deserialized one) to be returned to the system immediately, not kept around by one of the malloc arenas.
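To make the idea concrete, here is a minimal sketch of the same knob set from inside the program rather than through the environment. It assumes Linux with glibc, where Rust's default allocator forwards to malloc/free; the module name `glibc` and the functions `set_mmap_threshold`, `trim` and `reload` are hypothetical names made up for illustration. Two caveats: the threshold is compared against each individual allocation request, not against the total size of the struct, so the many small Strings inside it would still come from the regular heap; and `malloc_trim` only asks glibc to release free memory, it does not guarantee that any is returned.

```rust
// Assumption: Linux + glibc. On any other target the fallback module below
// turns both calls into no-ops that return false.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
mod glibc {
    use std::ffi::c_int;

    unsafe extern "C" {
        fn mallopt(param: c_int, value: c_int) -> c_int;
        fn malloc_trim(pad: usize) -> c_int;
    }

    // Value of M_MMAP_THRESHOLD in glibc's <malloc.h>.
    const M_MMAP_THRESHOLD: c_int = -3;

    /// Programmatic counterpart of MALLOC_MMAP_THRESHOLD_.
    /// Returns true if glibc accepted the value.
    pub fn set_mmap_threshold(bytes: i32) -> bool {
        unsafe { mallopt(M_MMAP_THRESHOLD, bytes) == 1 }
    }

    /// Ask glibc to give free heap memory back to the OS.
    /// Returns true if some memory was actually released.
    pub fn trim() -> bool {
        unsafe { malloc_trim(0) == 1 }
    }
}

#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
mod glibc {
    pub fn set_mmap_threshold(_bytes: i32) -> bool {
        false
    }
    pub fn trim() -> bool {
        false
    }
}

/// Hypothetical reload helper: install the freshly deserialized value,
/// which drops the old one, then request a trim of the freed memory.
fn reload<T>(slot: &mut T, fresh: T) -> bool {
    *slot = fresh; // the previous value is dropped here
    glibc::trim()
}

/// Hypothetical startup hook: route requests of 256 KiB or more to mmap.
fn configure_allocator() -> bool {
    glibc::set_mmap_threshold(256 * 1024)
}
```

In the real program `T` would be the deserialized struct, `configure_allocator()` would run once at startup, and `reload(&mut data, new_value)` would run after each successful `serde_json` parse.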

Is there a way to achieve this? Should I be using some other data format?

Gurwinder Singh
  • It looks like your question might be answered by the answers of [How can I get Serde to allocate strings from an arena during deserialization?](https://stackoverflow.com/q/51988630/155423). If not, please **[edit]** your question to explain the differences. Otherwise, we can mark this question as already answered. – Shepmaster Oct 15 '19 at 16:54
  • You found an environment variable that you think would be helpful, so what happens when you *try it*? – Shepmaster Oct 15 '19 at 16:54
  • Do you have control over the generated JSON? A 200 MB JSON file seems wrong. – mcarton Oct 15 '19 at 17:11
  • A 200 MB JSON file indicates the wrong format for the job... and why do you need all of it in memory all the time anyway? Can't you keep the positions of sub-elements in memory and use a streaming parser + LRU structure? – Sébastien Renauld Oct 15 '19 at 17:21
  • @SébastienRenauld - Yes, the whole data set is required in memory. It can be saved to disk when idle, but when it is needed, all of it has to be available. – Gurwinder Singh Oct 15 '19 at 18:40
  • I don't know what the right format would be. JSON worked easily, so I chose that. – Gurwinder Singh Oct 15 '19 at 18:41
  • Have you mapped out access patterns on it? As in, do you have hot and cold segments of the JSON data? If so, there may be better options than having all of it in memory. – Sébastien Renauld Oct 15 '19 at 18:41
  • @Shepmaster I tried the variable. When it is set too low (< 512), the whole program becomes too slow. With a high value it doesn't have any effect. – Gurwinder Singh Oct 15 '19 at 18:42
  • @SébastienRenauld I wish there were. The data is basically used by a text matching engine, and the engine needs the whole data set. – Gurwinder Singh Oct 15 '19 at 18:43
  • This sounds an awful lot like you're trying to build a fulltext search engine. I've been there, and the solution is two arenas if you'd like to avoid fragmentation. – Sébastien Renauld Oct 15 '19 at 18:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/200915/discussion-between-sebastien-renauld-and-gurwinder-singh). – Sébastien Renauld Oct 15 '19 at 18:44
  • @SébastienRenauld You're right. Why two arenas? – Gurwinder Singh Oct 15 '19 at 18:50

0 Answers