I have a simple ActiveRecord model called Student with 100 records in the table. I do the following in a rails console session:

ObjectSpace.each_object(ActiveRecord::Base).count
# => 0

x = Student.all

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100

x = nil
GC.start

ObjectSpace.each_object(ActiveRecord::Base).count
# => 0     # Good!

Now I do the following:

ObjectSpace.each_object(ActiveRecord::Base).count
# => 0

x = Student.all.group_by(&:last_name)

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100

x = nil
GC.start

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100     # Bad!

Can anyone explain why this happens and whether there is a smart way to solve this without knowing the underlying hash structure? I know I can do this:

x.keys.each{|k| x[k]=nil}
x = nil
GC.start

and it will remove all Student objects from memory correctly, but I'm wondering if there is a general solution (my real-life problem is widespread and involves more intricate data structures than the hash shown above).

I'm using Ruby 1.9.3-p0 and Rails 3.1.0.

UPDATE (SOLVED)

Per Oscar Del Ben's explanation below, the problematic code snippet creates a few ActiveRecord::Relation objects. (They are actually created in both code snippets, but for some reason they "misbehave" only in the second one. Can someone shed light on why?) Each relation keeps references to the loaded ActiveRecord objects in an instance variable called @records. Calling the "reset" method on an ActiveRecord::Relation sets this instance variable to nil. You have to do this on all the relation objects:

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100

ObjectSpace.each_object(ActiveRecord::Relation).each(&:reset)

GC.start
ObjectSpace.each_object(ActiveRecord::Base).count
# => 0

Note: You can also use Mass.detach (using the ruby-mass gem Oscar Del Ben referenced), though it will be much slower than the code above. Note that the code above does not remove a few ActiveRecord::Relation objects from memory. These seem to be pretty insignificant though. You can try doing:

Mass.index(ActiveRecord::Relation)["ActiveRecord::Relation"].each{|x| Mass.detach Mass[x]}
GC.start

And this would remove some of the ActiveRecord::Relation objects, but not all of them (not sure why, and those that are left have no Mass.references. Weird).
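
To illustrate why the records stay alive until every relation is reset, here is a minimal plain-Ruby sketch. FakeRelation and FakeStudent are hypothetical stand-ins invented for this example, not the real ActiveRecord classes; the sketch only mimics the memoized @records cache described above and does not depend on GC timing.

```ruby
# Hypothetical stand-in for ActiveRecord::Relation; NOT the real class.
class FakeRelation
  def initialize(loader)
    @loader = loader   # callable that pretends to run the query
    @records = nil
  end

  # Loads once and memoizes, so the relation keeps every record reachable.
  def to_a
    @records ||= @loader.call
  end

  # Drops the cache, mirroring the effect described for Relation#reset.
  def reset
    @records = nil
    self
  end

  def loaded?
    !@records.nil?
  end
end

# Hypothetical record type standing in for Student.
FakeStudent = Struct.new(:last_name)

relation = FakeRelation.new(-> { [FakeStudent.new("Smith"), FakeStudent.new("Jones")] })
grouped = relation.to_a.group_by(&:last_name)
group_count = grouped.size

grouped = nil                       # the hash is unreachable now...
still_cached = relation.loaded?     # ...but the relation still holds the records

relation.reset
cached_after_reset = relation.loaded?
```

As long as still_cached is true, setting the grouped hash to nil cannot make the records collectable, which matches the behavior seen in the second snippet.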

AmitA
  • May be unique to 1.9 or 3.1 - I'm not seeing this behavior with Rails 3.0.7 and ruby enterprise (ree 1.8.7). – klochner Jun 25 '12 at 15:49
  • Thank you Klochner! I just ran the code under Ruby 1.8.7-p174. It seems that Ruby 1.8.7 handles the object destruction correctly on both Rails 3.0.7 and Rails 3.1.0. I.e. in the second example I get 0 objects. I also tried Ruby 1.9.2, and the same issue happens as with 1.9.3. Would you guess there is a bug in YARV? – AmitA Jun 26 '12 at 01:36
  • I ran a test with Ruby 1.8.7 and Rails 2.3.12. I only tested in the console and had the same problem. **Except**, when I wrote garbage in the console like `asdasdsa` to initiate a `NameError`. After this `GC.start` cleaned up everything. Not sure if just a curious side-effect or something more significant. – Casper Jun 26 '12 at 12:16

2 Answers

I think I know what's going on. Ruby's GC won't free immutable objects (like symbols!). The keys returned by group_by are immutable strings, and so they won't be garbage collected.

UPDATE:

It seems like the problem is not with Rails itself. I tried using group_by alone, and sometimes the objects would not get garbage collected:

oscardelben~/% irb
irb(main):001:0> class Foo
irb(main):002:1> end
=> nil
irb(main):003:0> {"1" => Foo.new, "2" => Foo.new}
=> {"1"=>#<Foo:0x007f9efd8072a0>, "2"=>#<Foo:0x007f9efd807250>}
irb(main):004:0> ObjectSpace.each_object(Foo).count
=> 2
irb(main):005:0> GC.start
=> nil
irb(main):006:0> ObjectSpace.each_object(Foo).count
=> 0
irb(main):007:0> {"1" => Foo.new, "2" => Foo.new}.group_by
=> #<Enumerator: {"1"=>#<Foo:0x007f9efb83d0c8>, "2"=>#<Foo:0x007f9efb83d078>}:group_by>
irb(main):008:0> GC.start
=> nil
irb(main):009:0> ObjectSpace.each_object(Foo).count
=> 2 # Not garbage collected
irb(main):010:0> GC.start
=> nil
irb(main):011:0> ObjectSpace.each_object(Foo).count
=> 0 # Garbage collected

I've dug through the GC internals (which are surprisingly easy to understand), and this seems like a scope issue. Ruby walks through all the objects in the current scope and marks the ones it thinks are still in use; after that it goes through all the objects in the heap and frees the ones that have not been marked.

In this case I think the hash is still being marked even though it's out of scope. There are many reasons why this may be happening. I'll keep investigating.
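
The mark and sweep idea described above can be sketched with a toy model. This is only an illustration of reachability, assuming a made-up heap represented as a hash of names; it is not how MRI actually stores or scans objects, and the names ("relation", "hash", "student_1", and so on) are hypothetical.

```ruby
require "set"

# Toy heap: each key is an object name, each value lists the names it references.
# Mark phase: follow references from the roots and collect everything reachable.
def mark(heap, roots)
  marked = Set.new
  stack = roots.dup
  until stack.empty?
    name = stack.pop
    next if marked.include?(name)
    marked << name
    stack.concat(heap.fetch(name, []))
  end
  marked
end

# Sweep phase: keep only the entries that were marked.
def sweep(heap, marked)
  heap.select { |name, _refs| marked.include?(name) }
end

heap = {
  "relation"  => ["student_1", "student_2"],
  "hash"      => ["student_1", "student_2"],
  "student_1" => [],
  "student_2" => [],
  "orphan"    => []
}

# The hash is no longer a root (x = nil), but something else still is.
survivors = sweep(heap, mark(heap, ["relation"])).keys

# With no roots left, nothing keeps the students alive.
survivors_without_roots = sweep(heap, mark(heap, [])).keys
```

In this model the students survive the first sweep even though the hash is gone, because one other root still points at them.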

UPDATE 2:

I've found what's keeping references to the objects. To do that I used the ruby-mass gem. It turns out that the Active Record relation keeps track of the objects it returns.

User.limit(1).group_by(&:name)
GC.start
ObjectSpace.each_object(ActiveRecord::Base).each do |obj|
  p Mass.references obj # {"ActiveRecord::Relation#70247565268860"=>["@records"]}
end

Unfortunately, calling reset on the relation didn't seem to help, but hopefully this is enough information for now.
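
A similar lookup can be roughly approximated without the gem by scanning live objects of a suspected class and checking their instance variables. This is a rough sketch for MRI: find_referrers, Holder and Record are hypothetical names invented here, and it only looks one level deep (a direct reference or an Array element), unlike a full heap walk.

```ruby
# Returns { holder => [ivar names] } for every live instance of holder_class
# whose instance variables point at target, directly or inside an Array.
def find_referrers(target, holder_class)
  found = {}
  ObjectSpace.each_object(holder_class) do |holder|
    names = holder.instance_variables.select do |ivar|
      value = holder.instance_variable_get(ivar)
      target.equal?(value) ||
        (Array === value && value.any? { |item| target.equal?(item) })
    end
    found[holder] = names unless names.empty?
  end
  found
end

# Hypothetical stand-in for a relation-like object that caches records.
class Holder
  def initialize(records)
    @records = records
    @label = "cache"
  end
end

Record = Struct.new(:name)
record = Record.new("a")
holder = Holder.new([record])

referrers = find_referrers(record, Holder)
```

In a Rails console the second argument would presumably be ActiveRecord::Relation, though that usage is untested here.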

Oscar Del Ben
  • Hmm.. you would think the hash returned by `group_by` should still get released, and with it all the elements attached to the hash (except the keys). So this doesn't make sense to me. Otherwise every hash you ever wrote with symbols as keys would stay in memory forever. – Casper Jun 26 '12 at 12:20
  • Casper, I don't know for sure, but that's what I think is happening. – Oscar Del Ben Jun 26 '12 at 14:07
  • Oh wow, Mass is an awesome gem! Thank you for referencing it! It seems indeed that ActiveRecord::Relation maintains references to ActiveRecord objects, causing this issue. For me calling #reset on *all* relations actually worked. See the update in my question. Thanks again! – AmitA Jul 01 '12 at 06:08
  • Hey Oscar, thanks again for your help on this question. It sheds light on what is happening behind the scenes. I think it should not be the expected behavior, so I opened an issue here: https://github.com/rails/rails/issues/6929 – AmitA Jul 02 '12 at 03:52

I do not know the answer.

But I tried inspecting the heap as described at http://blog.headius.com/2010/07/browsing-memory-jruby-way.html

I have attached a screenshot at https://skitch.com/deepak_kannan/en3dg/java-visualvm for this simple program:

class Foo; end
f1 = Foo.new
f2 = Foo.new
GC.start

Then I used jvisualvm as described above, running this in irb.
It seems as if JRuby is tracking the object's scope. The object will not get GC'ed if there are any non-weak references to that object.

deepak
  • ran irb like: jruby -J-Djruby.reify.classes=true -X+O `which irb` – deepak Jun 24 '12 at 12:07
  • Thanks for answering Deepak. I am not using JRuby though. Did you try the code I posted in JRuby? Does JRuby's GC behave differently than MRI with regard to my code? – AmitA Jun 25 '12 at 05:34
  • @AmitA I did not try the code you posted. I tried a simpler version because it is easier to debug and visualize. Specifically I wanted to dump the heap; I tried some extensions on MRI but they did not compile. JRuby runs on the JVM, so the GC is different from MRI. – deepak Jun 26 '12 at 05:58