
I have code that leaks memory in a Sinatra app on Ruby 2.4.4. I can more or less reproduce it in irb, although not reliably, and I'm wondering whether others have the same problem. It happens when interpolating a large string inside a regular expression literal:

class Leak
  STR = "RANDOM|STUFF|HERE|UNTIL|YOU|GET|TIRED|OF|TYPING|AND|ARE|SATISFIED|THAT|IT|WILL|LEAK|ENOUGH|MEMORY|TO|NOTICE"*100

  def test
    100.times { /#{STR}/i }
  end
end

t = Leak.new
t.test # If I run this a few times, it will start leaking about 5MB each time

Now, if I run GC.start after this, it usually cleans up about the last 5 MB (or however much the last call used). After that, t.test uses only a few KB, then almost 1 MB, then a couple of MB, and then it is back to 5 MB per call, and once again GC.start collects only the last 5 MB.

An alternative that gives the same result without the memory growth is to replace /#{STR}/i with Regexp.new(STR, true). That seems to work fine for me.
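For reference, Regexp.new(STR, true) builds the same case-insensitive pattern as /#{STR}/i: a truthy second argument is treated as Regexp::IGNORECASE. A minimal sketch checking that the two forms agree, assuming a shortened stand-in for the question's STR:

```ruby
# Shortened stand-in for the question's STR constant (assumption for illustration).
STR = "RANDOM|STUFF|HERE" * 3

interpolated = /#{STR}/i
constructed  = Regexp.new(STR, Regexp::IGNORECASE) # explicit flag
truthy_arg   = Regexp.new(STR, true)               # the form used in the question

# Same source text, and all three are case-insensitive.
puts interpolated.source == constructed.source
puts interpolated.casefold? && constructed.casefold? && truthy_arg.casefold?
puts constructed.match?("random")
```

The explicit Regexp::IGNORECASE constant says the same thing as true but is easier to read.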

Is this a legitimate memory leak in Ruby or am I doing something wrong?

UPDATE: Okay, maybe I'm misreading this. I was watching the Docker container's memory usage after running GC.start, and it would sometimes go down. Since Ruby doesn't always return unused memory to the OS, it may be that Ruby uses this memory and, even though the memory isn't retained, doesn't release it back to the OS. Using the MemoryProfiler gem, I see that total_retained is 0 even after several runs.
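A stdlib-only way to cross-check what MemoryProfiler reports as retained is to build the patterns, force a full GC, and count how many Regexp objects built from STR are still alive. This is a sketch under assumptions: STR is a shortened stand-in, and build_patterns is a hypothetical helper name. A handful of survivors can remain because MRI scans the machine stack conservatively.

```ruby
# Stand-in for the question's STR constant (assumption: any long string works).
STR = "RANDOM|STUFF|HERE|UNTIL|YOU|GET|TIRED|OF|TYPING" * 100

# Hypothetical helper: builds 1,000 throwaway regexps, as in the question.
def build_patterns
  1_000.times { /#{STR}/i }
end

build_patterns
GC.start # full mark with immediate sweep by default

# Count Regexp objects with STR as their source that survived the collection.
survivors = ObjectSpace.each_object(Regexp).count { |re| re.source == STR }
puts "Regexps built from STR still alive after GC: #{survivors}"
```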

The root problem was that our containers were crashing, presumably because of memory usage. Perhaps it isn't a memory leak at all, just not enough memory for Ruby to consume what it wants? Are there GC settings that help it decide when to clean up before Ruby runs out of memory and crashes?
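In MRI, a GC run is triggered not only when object slots run out but also when memory malloc'd since the last collection (for example, the buffers behind large strings and compiled regexps) exceeds an adaptive limit. The counters behind that decision are exposed through GC.stat, and MRI reads tuning environment variables such as RUBY_GC_MALLOC_LIMIT, RUBY_GC_MALLOC_LIMIT_MAX and RUBY_GC_HEAP_GROWTH_FACTOR once at startup. A sketch for inspecting them; it does not suggest values, which depend on the workload and the container's memory limit:

```ruby
# Print the GC counters that influence when MRI collects instead of growing.
# These keys exist in MRI's GC.stat; exact values vary by version and workload.
%i[count heap_allocated_pages heap_live_slots heap_free_slots
   malloc_increase_bytes malloc_increase_bytes_limit].each do |key|
  puts format("%-28s %d", key, GC.stat(key))
end

# Tuning variables are read when the VM boots, so they must be set in the
# process environment (e.g. the container's env), not from inside Ruby.
%w[RUBY_GC_HEAP_GROWTH_FACTOR RUBY_GC_MALLOC_LIMIT RUBY_GC_MALLOC_LIMIT_MAX].each do |name|
  puts "#{name}=#{ENV.fetch(name, '(default)')}"
end
```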

UPDATE 2: This still doesn't make sense, though. Why would Ruby keep allocating more and more memory just from running the same code over and over, instead of reusing the memory it allocated before? As I understand it, the GC is designed to run at least once before allocating more memory from the OS, so why does Ruby allocate more memory each time I run this?

UPDATE 3: In my isolated test, Ruby does seem to approach a limit where it stops allocating additional memory no matter how many times I run the test (usually around 120 MB). In my production code, however, I haven't hit such a limit yet: it goes past 500 MB without slowing down, possibly because there are more instances of this kind of memory usage scattered around the class. There may be a limit to how much memory it uses, but it seems many times higher than this code should need (a single run really uses only a dozen or so MB).
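One factor that may be relevant to UPDATE 2 and 3 is where this memory lives: a roughly 10 KB interpolated string and its compiled Regexp each occupy a single object slot, but almost all of their bytes are malloc'd outside the object heap. ObjectSpace.memsize_of (from the stdlib objspace extension) reports an approximate per-object size that includes those buffers. A small sketch, assuming the question's STR:

```ruby
require 'objspace'

# The question's STR: roughly 10 KB of alternation text.
STR = "RANDOM|STUFF|HERE|UNTIL|YOU|GET|TIRED|OF|TYPING|AND|ARE|SATISFIED|THAT|IT|WILL|LEAK|ENOUGH|MEMORY|TO|NOTICE" * 100

small         = "short"
big_pattern   = /#{STR}/i
small_pattern = /short/i

# memsize_of includes malloc'd buffers (string bytes, compiled regexp program),
# not just the fixed-size object slot.
puts "STR (#{STR.bytesize} bytes): memsize #{ObjectSpace.memsize_of(STR)}"
puts "short string:   memsize #{ObjectSpace.memsize_of(small)}"
puts "big regexp:     memsize #{ObjectSpace.memsize_of(big_pattern)}"
puts "small regexp:   memsize #{ObjectSpace.memsize_of(small_pattern)}"
```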

UPDATE 4: I've narrowed the test case down to something that really leaks! Reading a multibyte character from a file was the key to reproducing the real problem:

str = "String that doesn't fit into a single RVALUE, with a multibyte char:" + 160.chr(Encoding::UTF_8)
File.write('weirdstring.txt', str)

class Leak
  PATTERN = File.read("weirdstring.txt").freeze

  def test
    10000.times { /#{PATTERN}/i }
  end
end

t = Leak.new

loop do
  print "Running... "

  t.test

  # If this doesn't work on your system, comment these lines out and watch the process's memory usage with top or similar.
  # Sums the Private_* lines (reported in kB) from this process's smaps; Process.pid avoids pidof matching other ruby processes.
  mem = %x[echo 0 $(awk '/Private/ {print "+", $2}' /proc/#{Process.pid}/smaps) | bc].chomp.to_i
  puts "process memory: #{mem} kB"
end

So... this is a real leak, right?
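A rough alternative to the awk/bc pipeline, assuming Linux with /proc mounted, is to read VmRSS from /proc/self/status. VmRSS also counts shared pages, so the numbers won't equal the Private_* sum, but the trend across rounds should be comparable. The sketch below bounds the question's loop to three rounds, writes the test string to a Tempfile, and reads it back with an explicit UTF-8 encoding so it doesn't depend on the default external encoding; rss_kb is a hypothetical helper name.

```ruby
require 'tempfile'

# Hypothetical helper (Linux-only): this process's resident set size in kB.
def rss_kb
  line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i # e.g. "VmRSS:   12345 kB"
rescue Errno::ENOENT
  nil # no /proc on this platform
end

Tempfile.create("weirdstring") do |f|
  f.write("String that doesn't fit into a single RVALUE, with a multibyte char:" + 160.chr(Encoding::UTF_8))
  f.flush
  pattern = File.read(f.path, encoding: "UTF-8").freeze

  # Bounded version of the question's infinite loop.
  3.times do |round|
    10_000.times { /#{pattern}/i }
    puts "round #{round + 1}: RSS #{rss_kb.inspect} kB"
  end
end
```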

mltsy
  • 2.4.4 might have bugs, so does this persist on 2.6.3? – tadman Jun 05 '19 at 16:11
  • Looks like the same problem is there, although MemoryProfiler shows no memory "retained"... updating the post. – mltsy Jun 05 '19 at 16:57
  • The garbage collector only kicks in when it thinks it needs to. Maybe it's not considering those allocations serious enough to clean up. – tadman Jun 05 '19 at 16:59

2 Answers


It was a memory leak!

https://bugs.ruby-lang.org/issues/15916

Should be fixed in one of the next releases of Ruby (2.6.4 or 2.6.5?)

mltsy

The GC does free unused objects, making their memory available to the Ruby process again, but the process often doesn't return that memory to the OS. That is not the same as a memory leak: under normal circumstances the process eventually has enough memory allocated and stops growing (very roughly speaking). A memory leak happens when the GC cannot release memory (due to bugs, bad code, etc.), so the process has to request more and more memory.

This is not the case with your code - it does not contain memory leaks, but it does contain an efficiency problem.
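The distinction between "freed for Ruby" and "returned to the OS" can be observed through GC.stat: after a full GC, heap_live_slots drops as garbage is swept, and the freed slots show up in heap_free_slots for reuse by later allocations, while heap_allocated_pages need not shrink. A sketch; churn is a hypothetical helper, and whether empty pages are handed back to the OS depends on the Ruby version and the malloc implementation.

```ruby
# Hypothetical helper: creates short-lived garbage strings.
def churn
  100_000.times { "x" * 100 } # each string is unreachable as soon as the block returns
end

churn
before = GC.stat(:heap_live_slots) # still counts garbage not yet swept
GC.start
after = GC.stat(:heap_live_slots)

puts "live slots before GC: #{before}, after GC: #{after}"
puts "free slots available for reuse: #{GC.stat(:heap_free_slots)}"
puts "heap pages still held by the process: #{GC.stat(:heap_allocated_pages)}"
```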

What happens when you do 100.times { /#{STR}/i } is that you

  1. Create 100 very long strings (when interpolating the constant within the pattern literal)...

  2. ... and then create 100 Regexp objects from these strings.

All this requires unnecessary allocations, which make the Ruby process use more memory (and degrade performance too, since GC is quite expensive). Changing the class definition to

class Leak
  STR = "RANDOM|STUFF|HERE|UNTIL|YOU|GET|TIRED|OF|TYPING|AND|ARE|SATISFIED|THAT|IT|WILL|LEAK|ENOUGH|MEMORY|TO|NOTICE"*100
  # Build the Regexp once, when the class body is evaluated, and reuse it.
  PAT = /#{STR}/i

  def test
    100.times { PAT }
  end
end

(i.e. memoize not the string itself but the pattern built from it, as a constant, and reuse it) reduces String and Regexp allocations during the same test call by an order of magnitude, according to memory_profiler's report.
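A rough way to repeat this comparison without the memory_profiler gem is to count allocations with GC.stat(:total_allocated_objects). The sketch compares the interpolated literal, the memoized constant, and, as an extra option not mentioned in the answer, the /o modifier, which performs the interpolation only the first time the literal is evaluated (so it only suits values that never change, such as a constant). The allocations helper is hypothetical.

```ruby
STR = "RANDOM|STUFF|HERE|UNTIL|YOU|GET|TIRED|OF|TYPING|AND|ARE|SATISFIED|THAT|IT|WILL|LEAK|ENOUGH|MEMORY|TO|NOTICE" * 100
PAT = /#{STR}/i # compiled once, when this line runs

# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

per_iteration = allocations { 100.times { /#{STR}/i } }  # new String + Regexp each time
memoized      = allocations { 100.times { PAT } }        # reuses one Regexp
once_flag     = allocations { 100.times { /#{STR}/io } } # /o: interpolated on first use only

puts "interpolated each time: #{per_iteration}"
puts "constant PAT:           #{memoized}"
puts "/o modifier:            #{once_flag}"
```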

Konstantin Strukov
  • So... I understand that Ruby doesn't release memory to the OS, but that doesn't explain why running `100.times { /#{STR}/i }` over and over keeps using more and more memory, right? After the first interpolation, the interpolated string is discarded, which should free up enough memory for the second one. Or at least, before allocating more memory from the OS, Ruby should free all those old unused slots and make room for the next `100.times`, right? But running it repeatedly consumes more and more memory, which is what doesn't make sense to me. – mltsy Jun 07 '19 at 17:05