
I have a Ruby on Rails app where we validate records from huge Excel files (~200k records) in the background via Sidekiq. We also use Docker, and hence a separate container for Sidekiq. When Sidekiq starts up, memory usage is roughly 120 MB, but as the validation worker runs it climbs to about 500 MB (and that's after a lot of optimisation). The issue is that even after the job has finished, memory usage stays at 500 MB and is never released, which prevents new jobs from being processed. I manually trigger garbage collection with `GC.start` after every 10k records and again after the job completes, but it doesn't help.
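
For context, the worker is structured roughly like this (simplified sketch; `ValidationWorker`, `ImportedRecord` and the identifier mapping are placeholder names, not the real ones):

```ruby
class ValidationWorker
  include Sidekiq::Worker

  def perform(file_path)
    sheet = Creek::Book.new(file_path).sheets.first

    sheet.rows.each_with_index do |row, index|
      # row is a hash of cell reference => value, e.g. {"A1" => "123", "B1" => "foo"}
      record = ImportedRecord.find_or_initialize_by(identifier: row.values.first)
      record.valid? # run the validations; errors are collected elsewhere

      GC.start if index.positive? && (index % 10_000).zero? # manual GC every 10k rows
    end

    GC.start # and once more after the last row
  end
end
```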

Vivek Tripathy
  • It sounds like you have a memory leak. Garbage collection can only free memory used by objects that no longer have *any* references to them. – Joey Harwood Jan 05 '18 at 15:16
  • It looks like you are experiencing [this issue](https://stackoverflow.com/a/20608455/2035262). You probably need to re-implement the `CSV`/`xlsx` handling from scratch to avoid allocating too many `RVALUE`s. – Aleksei Matiushkin Jan 05 '18 at 15:20
  • I use the creek gem to parse the Excel file (xlsx); it's the fastest out there. Then, while iterating through the rows, I `find_or_initialize` a record by an identifier from the Excel file. – Vivek Tripathy Jan 06 '18 at 16:31
  • @mudasobwa, you are right, it's a similar issue. But I'm not really sure how to track down the source of such memory bloat. – Vivek Tripathy Jan 06 '18 at 16:39
  • Too many local variables to fit RVALUE size. Read the whole file and parse it manually. – Aleksei Matiushkin Jan 06 '18 at 16:48
  • I have been converting the whole Excel file into a hash upfront and then processing it. Do you think a large hash of this size could be the issue? – Vivek Tripathy Jan 06 '18 at 18:19
  • Have you tried building Ruby with jemalloc in your Docker containers or adjusting `MALLOC_ARENA_MAX`? See this article: https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html – Alexander Clark Sep 28 '18 at 15:41
  • Similar issue: https://stackoverflow.com/questions/18978396/sidekiq-not-deallocating-memory-after-workers-have-finished – Oshan Wisumperuma Mar 13 '19 at 06:31

1 Answer


This is most likely not related to Sidekiq itself, but to how Ruby allocates memory from, and releases it back to, the OS.

Most likely the memory cannot be released back to the OS because of heap fragmentation. Besides optimizing your program (process the data chunkwise instead of reading it all into memory; see the sketch below), you can either tune the allocator's behaviour or switch to a different allocator.
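
To illustrate what "chunkwise" means here, a rough sketch (the creek-based reading and the model/column names are assumptions about your setup, not your actual code):

```ruby
class ValidationWorker
  include Sidekiq::Worker

  BATCH_SIZE = 1_000

  def perform(file_path)
    sheet = Creek::Book.new(file_path).sheets.first

    sheet.rows.each_slice(BATCH_SIZE) do |batch|
      identifiers = batch.map { |row| row.values.first }

      # one query per batch instead of one find_or_initialize per row
      existing = ImportedRecord.where(identifier: identifiers).index_by(&:identifier)

      batch.each do |row|
        record = existing[row.values.first] || ImportedRecord.new(identifier: row.values.first)
        record.valid?
      end
      # batch, identifiers and existing go out of scope here, so the GC can
      # reclaim them before the next slice is materialised
    end
  end
end
```

The point is that nothing ever holds references to all 200k rows at once, so the Ruby heap never has to grow to accommodate them in the first place.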

A lot has been written about this specific issue with Ruby and memory. I really like this post by Nate Berkopec, which goes into all the details: https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html

The simple "solution" is:

Use jemalloc or, if that is not possible, set `MALLOC_ARENA_MAX=2`.
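
If you go the jemalloc route, a quick way to check from inside the container whether your Ruby build actually links against it (assuming Ruby was compiled with `--with-jemalloc`) is to look at the build configuration:

```ruby
# If Ruby was built with jemalloc, -ljemalloc usually shows up in the
# linked libraries reported by RbConfig.
require 'rbconfig'

puts RbConfig::CONFIG['LIBS']     # e.g. "... -ljemalloc" on older Rubies
puts RbConfig::CONFIG['MAINLIBS'] # newer Rubies list the libraries here

# MALLOC_ARENA_MAX has to be set in the container's environment before the
# Ruby process starts; reading it here only confirms it reached the process.
puts ENV.fetch('MALLOC_ARENA_MAX', 'not set')
```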

The more involved solution is to optimize your program further so that it does not load that much data in the first place.

I was able to cut memory usage in one project from 12 GB to under 3 GB just by switching to jemalloc. That project dealt with a lot of imports/exports and was written quite poorly, so it was an easy win.

Pascal