60

I've discovered a memory leak in my Rails code - that is to say, I've found what code leaks but not why it leaks. I've reduced it down to a test case that doesn't require Rails:

require 'csspool'
require 'ruby-mass'

def report
    puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1].to_s + 'KB'
    Mass.print
end

report

# note I do not store the return value here
CSSPool::CSS::Document.parse(File.new('/home/jason/big.css'))

ObjectSpace.garbage_collect
sleep 1

report

ruby-mass supposedly lets me see all the objects in memory. CSSPool is a CSS parser based on racc. /home/jason/big.css is a 1.5MB CSS file.

This outputs:

Memory 9264KB

==================================================
 Objects within [] namespace
==================================================
  String: 7261
  RubyVM::InstructionSequence: 1151
  Array: 562
  Class: 313
  Regexp: 181
  Proc: 111
  Encoding: 99
  Gem::StubSpecification: 66
  Gem::StubSpecification::StubLine: 60
  Gem::Version: 60
  Module: 31
  Hash: 29
  Gem::Requirement: 25
  RubyVM::Env: 11
  Gem::Specification: 8
  Float: 7
  Gem::Dependency: 7
  Range: 4
  Bignum: 3
  IO: 3
  Mutex: 3
  Time: 3
  Object: 2
  ARGF.class: 1
  Binding: 1
  Complex: 1
  Data: 1
  Gem::PathSupport: 1
  IOError: 1
  MatchData: 1
  Monitor: 1
  NoMemoryError: 1
  Process::Status: 1
  Random: 1
  RubyVM: 1
  SystemStackError: 1
  Thread: 1
  ThreadGroup: 1
  fatal: 1
==================================================

Memory 258860KB

==================================================
 Objects within [] namespace
==================================================
  String: 7456
  RubyVM::InstructionSequence: 1151
  Array: 564
  Class: 313
  Regexp: 181
  Proc: 113
  Encoding: 99
  Gem::StubSpecification: 66
  Gem::StubSpecification::StubLine: 60
  Gem::Version: 60
  Module: 31
  Hash: 30
  Gem::Requirement: 25
  RubyVM::Env: 13
  Gem::Specification: 8
  Float: 7
  Gem::Dependency: 7
  Range: 4
  Bignum: 3
  IO: 3
  Mutex: 3
  Time: 3
  Object: 2
  ARGF.class: 1
  Binding: 1
  Complex: 1
  Data: 1
  Gem::PathSupport: 1
  IOError: 1
  MatchData: 1
  Monitor: 1
  NoMemoryError: 1
  Process::Status: 1
  Random: 1
  RubyVM: 1
  SystemStackError: 1
  Thread: 1
  ThreadGroup: 1
  fatal: 1
==================================================

You can see the memory going way up. Some of the counters go up, but no objects specific to CSSPool are present. I used ruby-mass's "index" method to inspect the objects that have references like so:

Mass.index.each do |k,v|
    v.each do |id|
        refs = Mass.references(Mass[id])
        puts refs if !refs.empty?
    end
end

But again, this doesn't give me anything related to CSSPool, just gem info and such.

I've also tried outputting "GC.stat"...

puts GC.stat
CSSPool::CSS::Document.parse(File.new('/home/jason/big.css'))
ObjectSpace.garbage_collect
sleep 1
puts GC.stat

Result:

{:count=>4, :heap_used=>126, :heap_length=>138, :heap_increment=>12, :heap_live_num=>50924, :heap_free_num=>24595, :heap_final_num=>0, :total_allocated_object=>86030, :total_freed_object=>35106}
{:count=>16, :heap_used=>6039, :heap_length=>12933, :heap_increment=>3841, :heap_live_num=>13369, :heap_free_num=>2443302, :heap_final_num=>0, :total_allocated_object=>3771675, :total_freed_object=>3758306}

As I understand it, if an object is not referenced and garbage collection happens, then that object should be cleared from memory. But that doesn't seem to be what's happening here.

I've also read about C-level memory leaks, and since CSSPool uses Racc which uses C code, I think this is a possibility. I've run my code through Valgrind:

valgrind --partial-loads-ok=yes --undef-value-errors=no --leak-check=full --fullpath-after= ruby leak.rb 2> valgrind.txt

Results are here. I'm not sure if this confirms a C-level leak, as I've also read that Ruby does things with memory that Valgrind doesn't understand.

Versions used:

  • Ruby 2.0.0-p247 (this is what my Rails app runs)
  • Ruby 1.9.3-p392-ref (for testing with ruby-mass)
  • ruby-mass 0.1.3
  • CSSPool 4.0.0 from here
  • CentOS 6.4 and Ubuntu 13.10
Dave Powers
  • 2,051
  • 2
  • 30
  • 34
jb_314
  • 1,373
  • 2
  • 17
  • 28

4 Answers4

42

It looks like you are entering The Lost World here. I don’t think the problem is with c-bindings in racc either.

Ruby memory management is both elegant and cumbersome. It stores objects (named RVALUEs) in so-called heaps of size of approx 16KB. On a low level, RVALUE is a c-struct, containing a union of different standard ruby object representations.

So, heaps store RVALUE objects, which size is not more than 40 bytes. For such objects as String, Array, Hash etc. this means that small objects can fit in the heap, but as soon as they reach a threshold, an extra memory outside of the Ruby heaps will be allocated.

This extra memory is flexible; is will be freed as soon as an object became GC’ed. That’s why your testcase with big_string shows the memory up-down behaviour:

def report
  puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`
          .strip.split.map(&:to_i)[1].to_s + 'KB'
end
report
big_var = " " * 10000000
report
big_var = nil 
report
ObjectSpace.garbage_collect
sleep 1
report
# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 11788KB

But the heaps (see GC[:heap_length]) themselves are not released back to OS, once acquired. Look, I’ll make a humdrum change to your testcase:

- big_var = " " * 10000000
+ big_var = 1_000_000.times.map(&:to_s)

And, voilá:

# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 57448KB

The memory is not released back to OS anymore, because each element of the array I introduced suits the RVALUE size and is stored in the ruby heap.

If you’ll examine the output of GC.stat after the GC was run, you’ll find that GC[:heap_used] value is decreased as expected. Ruby now has a lot of empty heaps, ready.

The summing up: I don’t think, the c code leaks. I think the problem is within base64 representation of huge image in your css. I have no clue, what’s happening inside parser, but it looks like the huge string forces the ruby heap count to increase.

Hope it helps.

Aleksei Matiushkin
  • 119,336
  • 10
  • 100
  • 160
  • I guess I'll have to research the parser a bit more to see if I can find what part of its code causes this kind of behaviour and if it could be rewritten to avoid it, because there doesn't seem to be any way to make Ruby not do this. You get the bounty for giving a small testcase that demonstrates the issue and giving me an idea of what to look for. – jb_314 Dec 16 '13 at 17:41
  • Was glad to help. I have taken a glance at the parser code as well. I would suggest you to start research with `escape_css_*` methods in `csspool/lib/csspool/visitors/to_css.rb`. There are suspicious calls to `gsub` with a codeblock. If I’m not mistaken, it allocates `|char|` block variable in ruby heap, while background in your css file is about 1M chars. – Aleksei Matiushkin Dec 16 '13 at 19:01
  • @mudasobwa Does this mean that any long-running ruby process that uses small variables (and thus RVALUEs) is going to grow in size with time? I can't quite get my head around this. Can you point to any official (or complimentary) documentation on this? – YWCA Hello Apr 06 '18 at 21:28
  • @YWCAHello No, it does not mean _any_ process is going to grow it’s size. GC’ed RValues are returned back to Ruby memory manager. The new memory is being requested from OS if and only there are no more free slots available, meaning the process tries to create the enormous number of RValues simultaneously (e.g. by processing external CSV with different unique small values at once, not in batches.) Under normal conditions it’ll reuse GC’ed slots. DuckDuckGo “Ruby RValue memory management” or check Ruby source code ([tag:c]) for details. – Aleksei Matiushkin Apr 07 '18 at 04:06
15

Okay, I found the answer. I am leaving my other answer up because that information was very difficult to gather, it is related, and it could help someone else searching for a related issue.

Your problem, however, appears to be due to the fact that Ruby actually does not release memory back to the Operating System once it has acquired it.

Memory Allocation

While Ruby programmers do not often worry about memory allocation, sometimes the following question comes up:

Why did my Ruby process stay so big even after I’ve cleared all references to big objects? I’m /sure/ GC has run several times and freed my big objects and I’m not leaking memory.

A C programmer might ask the same question:

I free()-ed a lot of memory, why is my process still so big?

Memory allocation to user space from the kernel is cheaper in large chunks, thus user space avoids interaction with the kernel by doing more work itself.

User space libraries/runtimes implement a memory allocator (e.g.: malloc(3) in libc) which takes large chunks of kernel memory2 and divides them up into smaller pieces for user space applications to use.

Thus, several user space memory allocations may occur before user space needs to ask the kernel for more memory. Thus if you got a large chunk of memory from the kernel and are only using a small part of that, that large chunk of memory remains allocated.

Releasing memory back to the kernel also has a cost. User space memory allocators may hold onto that memory (privately) in the hope it can be reused within the same process and not give it back to the kernel for use in other processes. (Ruby Best Practices)

So, your objects may very well have been garbage collected and released back to Ruby's available memory, but because Ruby never gives back unused memory to the OS, the rss value for the process remains the same, even after garbage collection. This is actually by design. According to Mike Perham:

...And since MRI never gives back unused memory, our daemon can easily be taking 300-400MB when it’s only using 100-200.

It’s important to note that this is essentially by design. Ruby’s history is mostly as a command line tool for text processing and therefore it values quick startup and a small memory footprint. It was not designed for long-running daemon/server processes. Java makes a similar tradeoff in its client and server VMs.

Community
  • 1
  • 1
Joe Edgar
  • 872
  • 5
  • 13
  • 1
    A simple test shows that Ruby does release memory (at least as far as the rss value measures). Run this (forgive the lack of whitespace): `def report; puts 'Memory ' + \`ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"\`.strip.split.map(&:to_i)[1].to_s + 'KB'; end; report; big_var = " " * 10000000; report; big_var = nil; report; ObjectSpace.garbage_collect; sleep 1; report`; The result is: Memory 7132KB Memory 17100KB Memory 17100KB Memory 7340KB – jb_314 Dec 05 '13 at 06:24
  • Hmm, the results of this seem odd and inconsistent. When using 1.9.3-p429 I get the following results: `At Program Start: Memory 51408KB; After Creating big var: Memory 61176KB; After setting big var to nil: Memory 61176KB; After Garbage Collection: Memory 61176KB;` Using Ruby 2.0.0-p195 I get these results: `At Program Start: Memory 80772KB; After Creating big var: Memory 90540KB; After setting big var to nil: Memory 90540KB; After Garbage Collection: Memory 90612KB; ` – Joe Edgar Dec 05 '13 at 14:07
  • With 1.9.3p484 I get similar results as my run with 2.0.0-p247 -Memory 9364KB, Memory 19136KB, Memory 19136KB, Memory 9368KB. In my production app I can definitely see the RSS size go up and then back down when dealing with any requests other than the ones that trigger the CSS parser. Perhaps your second link is only talking about older versions of Ruby or are referring to a different measurement (virtual memory size?). Also, http://www.mikeperham.com/2009/05/25/memory-hungry-ruby-daemons/comment-page-1/#comment-752 – jb_314 Dec 05 '13 at 16:09
  • What operating system are you using? – Joe Edgar Dec 06 '13 at 15:27
  • Ubuntu 13.10 for dev, CentOS 6.4 for production. I see similar results for the big_var test on both - RSS goes back down after the variable is deferenced and GC runs. – jb_314 Dec 06 '13 at 21:17
  • Sorry, I have been a bit busy. I plan on testing this on my Ubuntu 12.04 machine as soon as I get a chance to see if it could be related to how Ruby and the operating system work together. Regardless, I would still think if it is giving you back the big_var objects after destruction that it would give back CSSPool::CSS:Document objects. There may be a memory leak in the actual C code (at its top level it utilizes Racc::Parser, written in C). It is very difficult to track down due to some inconsistent behavior as well as a general lack info out there docs on the garbage collector. – Joe Edgar Dec 08 '13 at 18:57
  • I appreciate your research so far. I've considered the possibility of a C-level leak. If that's what it is, I would like to know how to confirm it so I could pass it on to the racc guys. I've run Valgrind against my test and edited the question to include the results, but I'm unsure how to interpret it. – jb_314 Dec 08 '13 at 19:21
10

Building on @mudasobwa's explanation, I finally tracked down the cause. The code in CSSPool was checking the very long data URI for escape sequences. It would call scan on the URI with a regexp that matched an escape sequence or a single character, map those results to unescape, and then join it back into a string. This was effectively allocating a string for every character in the URI. I modified it to gsub the escape sequences, which seems to have the same results (all the tests pass) and greatly reduces the ending memory used.

Using the same test case as originally posted (minus the Mass.print output) this is the result before the change:

Memory 12404KB
Memory 292516KB

and this is the result after the change:

Memory 12236KB
Memory 19584KB
Dave Powers
  • 2,051
  • 2
  • 30
  • 34
jb_314
  • 1,373
  • 2
  • 17
  • 28
9

This could be due to the "Lazy Sweeping" feature in Ruby 1.9.3 and above.

Lazy sweeping basically means that, during garbage collection, Ruby only "sweeps" away enough objects to create space for the new objects it needs to create. It does this because, while the Ruby garbage collector runs, nothing else does. This is known as "Stop the world" garbage collection.

Essentially, lazy sweeping reduces the time that Ruby needs to "stop the world." You can read more about lazy sweeping here.

What does your RUBY_GC_MALLOC_LIMIT environment variable look like?

Here is an excerpt from Sam Saffron's blog concerning lazy sweeping and the RUBY_GC_MALLOC_LIMIT:

The GC in Ruby 2.0 comes in 2 different flavors. We have a "full" GC that runs after we allocate more than our malloc_limit and a lazy sweep (partial GC) that will run if we ever run out of free slots in our heaps.

The lazy sweep takes less time than a full GC, however only performs a partial GC. It's goal is to perform a short GC more frequently thus increasing overall throughput. The world stops, but for less time.

The malloc_limit is set to 8MB out of the box, you can raise it by setting the RUBY_GC_MALLOC_LIMIT higher.

Is your RUBY_GC_MALLOC_LIMIT extremely high? Mine is set to 100000000 (100MB). The Default is around 8MB, but for rails apps they recommend it to be quite a bit higher. If yours is too high it could be preventing Ruby from deleting garbage objects, because it thinks it has plenty of room to grow.

Community
  • 1
  • 1
Joe Edgar
  • 872
  • 5
  • 13
  • I have not modified that setting, though I suppose it could have been set by something. How would I check it? – jb_314 Dec 05 '13 at 05:03
  • I've modified the question to include the output of "GC.stat", as mentioned in your link. – jb_314 Dec 05 '13 at 05:07
  • If you changed RUBY_GC_MALLOC_LIMIT you would likely know it (unless someone else set up your machine for you). You can see if it was set manually by typing echo $RUBY_GC_MALLOC_LIMIT into the command line. If it doesn't return a value then it is probably set to the default of 8mb. I will keep pondering this. – Joe Edgar Dec 05 '13 at 05:19
  • Yes, I had tried echo $RUBY_GC_MALLOC_LIMIT and got no result, so I figured I may have been looking in the wrong place. Also, relating to your lazy vs full GC, the result of this problem in my production app is all memory on my server is consumed until the process is killed, so I would assume that Ruby at that point would be doing anything it could get to more memory. – jb_314 Dec 05 '13 at 05:26
  • `ObjectSpace.garbage_collect` after `big_var = " " * 100000000; big_var = nil` does nothing. If I add one more `0`, it does (moves from 1gb down to 5mb). Ruby 2.0, MacOS. – Nakilon May 05 '17 at 13:40