
Hi, I'm trying to send large packets with ZeroMQ using the ventilator/worker/sink pattern.

Each time I add a worker, the sink process's memory usage increases a little. Then it hits a tipping point at about 6 or 7 workers, where memory suddenly grows rapidly until the process dies with:

> Python(42410,0xaccb8a28) malloc: *** mmap(size=3559424) failed (error code=12)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> Assertion failed: (msg_->flags | ZMQ_MSG_MASK) == 0xff (zmq.cpp:211)

Here is the code (showing only the worker/sink pattern):

```python
import sys
import resource
import time

import zmq

context = zmq.Context()

if sys.argv[1] == 'worker':
    # Socket to send messages on
    sender = context.socket(zmq.PUSH)
    sender.connect("tcp://localhost:5558")

    msg = 'x' * 3559333          # ~3.6 MB payload
    while True:
        time.sleep(.01)          # one send every 10 ms
        sender.send(msg)
else:
    # Socket to receive messages on
    receiver = context.socket(zmq.PULL)
    receiver.bind("tcp://*:5558")

    while True:
        msg = receiver.recv()
        print msg[0:5], len(msg), resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Is this simply a lack of hardware resources? A backlog of data? Or is there a way of avoiding this?

I'm running OS X Mountain Lion with 16 GB of memory, Python 2.7, and pyzmq 2.2.0.1.

Thanks

1 Answer


> Is this simply a lack of hardware resources?

Well, let's do the math. Each worker sends a ~3.6 MB message every 10 ms, which is roughly 360 MB per second per worker (the 10 ms sleep ignores send overhead, so the real rate is a little lower). Now you add more workers, and the rate scales linearly: by five workers you are pushing about 1.8 GB per second at the sink.
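That arithmetic is easy to check directly. A quick sketch (the message size comes from the question's code; the assumed 100 sends/second ignores send overhead):

```python
# Back-of-the-envelope throughput from the question's code:
# each worker sends len('x' * 3559333) bytes once per 10 ms.
MSG_BYTES = 3559333
SENDS_PER_SEC = 100            # one send every 10 ms, ignoring send time

per_worker = MSG_BYTES * SENDS_PER_SEC / 1e6   # decimal MB/s per worker
print('per worker: %.0f MB/s' % per_worker)    # per worker: 356 MB/s

for n in (1, 5, 7):
    print('%d workers: %.2f GB/s' % (n, n * per_worker / 1e3))
# 1 workers: 0.36 GB/s
# 5 workers: 1.78 GB/s
# 7 workers: 2.49 GB/s
```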

I think you have found the performance limit of your machine. With the sink running on the same machine as all the workers, it can consume somewhere between 1 and 2 GB per second. When data arrives faster than that, the queues in the sink process build up faster than they can be drained, and you run out of memory.
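A toy model makes the tipping point concrete. The rates below are illustrative assumptions, not measurements: ~356 MB/s arriving per worker, and a sink that can drain at most ~1.5 GB/s. Below the drain rate the backlog stays at zero; one worker past it, queued bytes grow without bound:

```python
# Toy model of the sink's backlog. Below the drain rate the queue stays
# empty; above it, queued megabytes (and thus memory) grow linearly.
# produce_mb and drain_mb are assumed rates, not measured ones.
def backlog_mb(workers, produce_mb=356, drain_mb=1500, seconds=10):
    inflow = workers * produce_mb        # aggregate arrival rate, MB/s
    growth = max(0, inflow - drain_mb)   # MB/s by which the queue grows
    return growth * seconds

for n in (4, 5, 7):
    print('%d workers -> %d MB queued after 10 s' % (n, backlog_mb(n)))
# 4 workers -> 0 MB queued after 10 s
# 5 workers -> 2800 MB queued after 10 s
# 7 workers -> 9920 MB queued after 10 s
```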

> Or is there a way of avoiding this?

Send smaller messages? Less frequently? :) Or put the workers and the sink on different machines. Remember that the workers are stealing CPU from the sink. If this is a quad-core machine, then with the sink plus up to 3 workers, the OS can still give almost a full processor core to each process.

Once you add the 4th, 5th, and 6th workers, the OS can no longer give 100% of a core to every process. They have to start sharing, so the sink slows down just as the message rate speeds up. That would explain the tipping point you are seeing, where memory usage suddenly takes off.

Hmmm - which suggests an interesting experiment. Can you configure your mac so that the sink process runs at a really high priority? That might give better results. I have never tried this myself, but see the following link for ideas ... https://discussions.apple.com/thread/1491812?start=0&tstart=0
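One more knob worth mentioning: zmq 2.x defaults ZMQ_HWM to 0, i.e. unlimited, so the sink will happily queue messages forever. Setting a high-water mark bounds the queues so memory stops growing (PUSH blocks instead of buffering). A rough sketch using pyzmq's `hwm` helper, which maps to ZMQ_HWM on 2.x and to SNDHWM/RCVHWM on newer libzmq; the port matches the question's code:

```python
import zmq

context = zmq.Context()

# Sink side: cap how many messages the PULL socket will queue per peer.
receiver = context.socket(zmq.PULL)
receiver.hwm = 10                 # at most ~10 queued messages per pipe
receiver.bind("tcp://*:5558")

# Worker side: with the HWM reached, send() blocks instead of queueing,
# which applies backpressure to the workers rather than eating memory.
sender = context.socket(zmq.PUSH)
sender.hwm = 10
sender.connect("tcp://localhost:5558")
```

With a ~3.6 MB message and an HWM of 10, each pipe is bounded to roughly 36 MB rather than growing without limit.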

Guido Simone
  • Thanks for this. I tried nicing and it didn't help, but yeah, it's got to be a bandwidth limit. I might run the sink on a separate box, as you say. I may even have multiple sinks if necessary. – user1556658 Dec 04 '12 at 20:17
  • This does suggest, though, that I'm better off not spreading this over multiple boxes. The compute time is too small relative to network transfer limits. I'm better off just getting one of those 20-compute-unit boxes on EC2 and doing it all there. I wonder how often this happens, in a world where Hadoop is sold as the solution to all such tasks. – user1556658 Dec 05 '12 at 11:02
  • It's of course more than the above, but not much more than a split on tab-delimited text and some counting and filtering. – user1556658 Dec 05 '12 at 11:03
  • I tried nicing the workers to have the lowest priority. It didn't help. – user1556658 Dec 05 '12 at 22:04
  • Also note that ZeroMQ's queues default to 1000 messages. If a buffer fills up with a message size of 3.3 MB, that is 3.3 GB right there. Each publisher holds a separate queue per subscriber, meaning you quickly run out of memory here. The queue sizes are adjustable, though. I am not sure whether the sink holds one queue per incoming worker, but that is likely. – Jakob Möllås Dec 08 '12 at 14:25