
I am currently looking at replacing our home-brew batch processor with Gearman. It runs reports (in PHP) that can each take up to several hundred megabytes of memory, so if too many of these reports run at once the server will lock up. I've had to add logic to prevent the controlling process from spawning too many workers when memory is low, to keep the server from being overloaded and crashing.

If I switch over to Gearman, is there some kind of logic in place to prevent spawning additional workers when system memory gets low? I see an option to limit the number of workers, but that does not directly solve the issue. Additionally, is it smart enough to balance the workload between systems if one system becomes overwhelmed?

What recommendations do others have? Is it possible for me to insert my own checks into Gearman so workers are only spawned when conditions are right? Or what other solutions are there?

I'm developing on a LAMP stack and not very familiar with Gearman, so rebuke where needed.

SeanDowney

1 Answer


Limiting the number of workers is the way to go - if you're expecting reports to use 300-400 MB of memory, limit the number of workers to roughly your available memory divided by 400 MB.
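For example (figures purely illustrative): on a box with about 4 GB of RAM free for workers, 4096 MB / 400 MB works out to roughly 10 workers as a safe upper bound.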

You won't be able to extend Gearman itself to spawn workers only when memory usage allows it, but you could create a wrapper that manages your workers and does that for you. Before you go that route, take a look at extending GearmanManager to handle such issues. My suggestion, however, is to just let it be, and instead adjust the number of workers after you get some experience with exactly what kind of loads you're expecting (how fast the requests for reports come in, how large the reports are memory-wise, and how quickly you need to get a response back to the user requesting the report).
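To make the wrapper idea concrete, here is a minimal sketch of what a memory-aware spawner could look like on a LAMP box. The thresholds, the `generate_report` job name, and the report-building step are placeholders, not anything Gearman or GearmanManager provides:

```php
<?php
// Rough sketch of a memory-aware worker spawner (requires the pcntl and
// gearman extensions). Thresholds and the job name are placeholders.

const MAX_WORKERS = 8;    // hard cap, e.g. available RAM / 400 MB
const MIN_FREE_MB = 800;  // keep a couple of reports' worth of headroom

// Free memory in MB, using MemAvailable from /proc/meminfo (Linux only).
function freeMemoryMb(): int {
    $meminfo = file_get_contents('/proc/meminfo');
    return preg_match('/MemAvailable:\s+(\d+) kB/', $meminfo, $m)
        ? intdiv((int) $m[1], 1024)
        : 0;
}

// A single worker process: register the report job and serve forever.
function runWorker(): void {
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('generate_report', function (GearmanJob $job) {
        $params = json_decode($job->workload(), true);
        // placeholder: real report generation goes here
        return json_encode(['status' => 'done', 'params' => $params]);
    });
    while ($worker->work()) { /* keep serving jobs */ }
}

$children = [];
while (true) {
    // Reap workers that have exited.
    while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
        unset($children[$pid]);
    }
    // Only fork a new worker when below the cap and memory looks healthy.
    if (count($children) < MAX_WORKERS && freeMemoryMb() > MIN_FREE_MB) {
        $pid = pcntl_fork();
        if ($pid === 0) { runWorker(); exit(0); }
        $children[$pid] = true;
    }
    sleep(5);
}
```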

Gearman will automagically load balance to the most responsive server - when a task arrives at gearmand, it'll notify all the available workers that a new task has arrived, and the worker that responds first gets the task. This means that if a server is under load, it'll respond more slowly and the task will usually end up on the server with more available processing power (ignoring variance in network delays). This also handles differently sized servers automagically.
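Nothing special is needed on the client side to get that behaviour - you just submit the job to gearmand and let the workers compete for it. A minimal sketch (host, port, and job name assumed to match the worker sketch above):

```php
<?php
// Client side: queue the report request and let gearmand hand it to
// whichever worker grabs it first.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$handle = $client->doBackground('generate_report', json_encode(['report_id' => 42]));
echo "queued as $handle\n";
```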

MatsLindh
  • Ok, well I was thinking that may be my only option, thanks for the info on load balancing. Should be able to get off to a good start – SeanDowney Jul 23 '12 at 16:03