
[Prescript: I know that nothing here is specific to Delayed::Job. But it helps establish the context.]

update

I believe the SQL query strings are not being garbage collected. My application generates many large SQL insert/update statements (about 160 KB each, roughly one per second) and sends them to PostgreSQL via:

ActiveRecord::Base.connection.execute(my_large_query)

When I perform these db operations, my application slowly grows without bound. When I stub out the db operations (but perform all the other functions in my app) the bloating stops.
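If it really is the queries or their results being retained, one thing worth trying (my suggestion, not something from the question): with the pg adapter, connection.execute returns a PG::Result whose row data is allocated by libpq outside Ruby's heap, and that memory isn't released until PG::Result#clear runs or the GC finalizes the result object. Clearing eagerly keeps it from piling up between GC passes. A stand-in result class is used below so the sketch runs without a database:

```ruby
# Sketch of eagerly clearing a query result. FakeResult stands in for
# PG::Result so this runs without a database; in the real app the object
# would come from ActiveRecord::Base.connection.execute(my_large_query).
class FakeResult
  attr_reader :cleared

  def initialize
    @cleared = false
  end

  def clear
    @cleared = true  # PG::Result#clear frees the libpq-allocated row data
  end
end

def run_query(connection)
  result = connection.execute  # real call would take the SQL string
  begin
    # ... consume the rows here, while the result is still valid ...
  ensure
    # free the C-side buffer immediately instead of waiting for GC
    result.clear if result.respond_to?(:clear)
  end
  result
end
```

The respond_to? guard is there because older adapter configurations can return plain arrays rather than a PG::Result.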

So: any ideas on why this is happening, how I can pinpoint it, or how I can make it stop?

original question

I have delayed tasks that slurp data from the web and create records in a PostgreSQL database. They seem to be working okay, but they start at vmemsize=100M, bulk up to vmemsize=500M within ten minutes, and just keep growing. My MacBook Pro with 8 GB of RAM starts thrashing when the VM runs out.

How can I find where the memory is going?
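A low-tech starting point (my addition; rss_kb is a made-up helper name): poll the process's resident set size from inside the job loop, so you can see whether the growth is steady or plateaus. ps takes the same flags on OS X and Linux:

```ruby
# Return the current process's resident set size in kilobytes.
# `ps -o rss= -p PID` prints just the RSS column with no header;
# this works identically on OS X and Linux.
def rss_kb
  `ps -o rss= -p #{Process.pid}`.to_i
end
```

Called from a Delayed::Job lifecycle hook (e.g. logging rss_kb in #after), this shows per-job growth without any extra gems.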

Before you refer me to other SO posts on the topic:

I've added the following to my #after(job) method:

def after(job)
  clss = [Object, String, Array, Hash, ActiveRecord::Base, ActiveRecord::Relation]
  clss.each {|cls| object_report(cls, " pre-gc")}
  ObjectSpace.each_object(ActiveRecord::Relation).each(&:reset)
  GC.start
  clss.each {|cls| object_report(cls, "post-gc")}
end

def object_report(cls, msg)
  log(sprintf("%s: %9d %s", msg, ObjectSpace.each_object(cls).count, cls))
end

It reports usage on the fundamental classes, explicitly resets ActiveRecord::Relation objects (suggested by this SO post), explicitly runs a GC (as suggested by this SO post), and reports how many Objects / Strings / Arrays / Hashes, etc. there are (as suggested by this SO post). For what it's worth, none of those counts is growing significantly. (Are there other classes I should be looking at? But wouldn't that be reflected in the number of Objects anyway?)
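One caveat with counting objects: the counts can stay flat while a handful of objects balloon (a few retained 160 KB query strings barely move the String count). A byte-oriented variant of object_report catches that case; this sketch is my addition, not code from the post, and uses the objspace extension that ships with MRI (ObjectSpace.memsize_of was already available on 1.9-era builds):

```ruby
require 'objspace'  # stdlib extension exposing per-object memory sizes

# Sum the memory footprint of all live instances of a class. Unlike a raw
# object count, memsize_of includes the malloc'd payload of each object
# (e.g. a string's character buffer), so a few giant retained query
# strings show up here even when the object count barely changes.
def bytes_report(cls)
  total = ObjectSpace.each_object(cls).inject(0) do |sum, obj|
    sum + ObjectSpace.memsize_of(obj)
  end
  sprintf("%s: %d bytes", cls, total)
end
```

Logging bytes_report(String) pre- and post-GC alongside the existing counts would show whether bytes are growing even while counts hold steady.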

I can't use memprof because I'm running Ruby 1.9.

And there are other tools that I'd consider if I were running on Linux, but I'm on OS X.

1 Answer

update

I'm afraid this was all a red herring: left running long enough, each ruby job grows to a vmemsize of about 1.2 GB (yeah, that big, but not huge by today's standards), then shrinks back down to 850 MB and bobbles between those two values thereafter without growing further.

My real problem was that I was trying to run more than four such processes on my 8 GB machine, which filled all available RAM and sent the system into swap thrashing. Running only four processes nearly fills available memory, so the system doesn't start swapping.

update 2

Nope, still a problem -- I didn't let the jobs run long enough: the jobs grow continually (albeit slowly). Even running just two external jobs eventually consumes all VM and my machine starts thrashing.

I tried running the app in production mode (thinking that dev mode might cache things that never get freed), but it made no appreciable difference.
