
I have the following setting: a webpage generates requests for images, which are rendered by a server implemented in C++. The object that generates the images is very expensive to initialize, so 3-4 initialized copies of it are kept on the server.

There are usually ~10 requests for images arriving at the same time, so the copies of the object need to be locked.

The problem is that an object needs to stay locked for a rather long period (~0.5-1.0 seconds) until the image is done rendering. Currently there is an array of locks, one per copy of the object, and each image request is assigned to a particular copy at random.

Benchmarking with mutrace shows contention of locks:

mutrace: Total runtime is 127612.949 ms.
mutrace: Showing 10 most contended mutexes:

+-----------------------------------------------------------------------------------+
|  Mutex#   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags  |
+-----------------------------------------------------------------------------------+
|    65027  5129457   387745    79148      754.146        0.000        8.045 M-.--. |
|   443324   754545   172260    25960     7984.958        0.011       43.426 Mx.--. |
|    20645  1728872     5531      412      579.019        0.000        5.091 M-.--. |
|   540024      453      406      280    50068.601      110.527      830.096 M-.--. |
|   539797      462      413      254    39928.156       86.425      834.889 M-.--. |
|   540460      475      419      244    34194.536       71.988      698.798 M-.--. |
|   299764     3036     2091      215      149.902        0.049       11.128 Mx.--. |
|   491395      108       94       87      545.591        5.052       58.174 M-.--. |
|   518584    41440     1744       79      367.372        0.009        6.292 M-.--. |
|   487295    48304     5491       69      250.457        0.005       64.186 M-.--. |
+-----------------------------------------------------------------------------------+

Basically, locks 540024, 539797, and 540460 are contended the entire time. I am thinking of using a single-producer/single-consumer lock-free queue for each object that generates images. Here is roughly the pseudocode:

Whenever an image request comes to the server, the callback is invoked with some image parameters:

function serverCallBack(params imageParams) {
    queueId = imageParams.getQueueId()
    queues[queueId].put(imageParams)
    image = getImage()  // how do we block here until this request's image is ready?
    return image
}

Where I am stuck is that we need a way to get the resulting image when the object is done rendering. Any ideas on how to implement this one?
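One common way to hand the result back (a sketch, not your existing code): attach a `std::promise<Image>` to each queued request, have the rendering thread fulfill it, and block on the matching `std::future` in the callback. The `ImageParams`, `Image`, and `RequestQueue` types below are hypothetical stand-ins; the queue here is a plain mutex-guarded one for brevity, but the promise/future handoff works identically with a lock-free SPSC queue.

```cpp
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical request parameters; Image is just a string for brevity.
struct ImageParams { int width = 0, height = 0; };
using Image = std::string;

// A request pairs the parameters with a promise the renderer fulfills.
struct Request {
    ImageParams params;
    std::promise<Image> result;
};

// Minimal thread-safe queue standing in for the per-object SPSC queue.
class RequestQueue {
    std::queue<Request> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void put(Request r) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(r)); }
        cv_.notify_one();
    }
    Request take() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this]{ return !q_.empty(); });
        Request r = std::move(q_.front());
        q_.pop();
        return r;
    }
};

// The server callback: enqueue the request, then block on the future
// until the rendering thread calls set_value() on the promise.
Image serverCallBack(RequestQueue& queue, ImageParams params) {
    Request req{params, {}};
    std::future<Image> fut = req.result.get_future();
    queue.put(std::move(req));
    return fut.get();  // sleeps (no spinning) until the result is set
}
```

The rendering thread would loop on `take()`, render, and call `request.result.set_value(image)`; the blocked callback then wakes with the image for exactly its own request, so no shared result slot is needed.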

Clarification

It is clear that if there are more requests than objects, then some requests will be blocked. The question is whether it is possible to implement the blocking more efficiently than with mutexes.
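For what it's worth, a central pool (as suggested in the comments) can keep the lock hold times short even though requests still wait: the pool mutex is only held for the microseconds it takes to push or pop a pointer, never for the 0.5-1.0 s render, and waiting threads sleep on a condition variable (which is futex-backed on Linux in glibc) rather than contending on per-object mutexes. A sketch, assuming a hypothetical `Renderer` type:

```cpp
#include <condition_variable>
#include <mutex>
#include <utility>
#include <vector>

// Central pool of pre-initialized renderer objects. Callers acquire a
// free renderer, render OUTSIDE the pool lock, then release it.
template <typename Renderer>
class RendererPool {
    std::vector<Renderer*> free_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    explicit RendererPool(std::vector<Renderer*> renderers)
        : free_(std::move(renderers)) {}

    Renderer* acquire() {
        std::unique_lock<std::mutex> lk(m_);
        // Sleeps until a renderer is free; no busy-waiting.
        cv_.wait(lk, [this]{ return !free_.empty(); });
        Renderer* r = free_.back();
        free_.pop_back();
        return r;  // the long render happens with no pool lock held
    }

    void release(Renderer* r) {
        { std::lock_guard<std::mutex> lk(m_); free_.push_back(r); }
        cv_.notify_one();  // wake exactly one waiter
    }
};
```

This also avoids the under-utilization of random assignment: a request never waits on one busy copy while another copy sits idle.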

  • Can you just put the image generator objects into an object pool? Threads wanting to use an instance take it, process and then return it. Pretty simple. – usr Dec 22 '13 at 14:57
  • @usr But that's the same as locking an object isn't it? Object pool would need to lock the object currently being in use. And I would like to reduce lock contention – iggy Dec 22 '13 at 15:00
  • Do you have a contention problem? If rendering an image takes a long time, only 4 render operations can be running in parallel and 10 requests are outstanding you will always have 6 requests blocked. Do you agree? This is not a problem, this is fundamentally so. How could you possibly mitigate this? You can't. – usr Dec 22 '13 at 16:24
  • @usr I agree, however there might be a way to wake the waiting threads only when an object is available and not make them wait on the lock. For example there are futexes that are implemented partially in user space and avoid expensive switching between user space and kernel space to check if the lock is free or not. The problem with futexes is that I couldn't find a cross platform implementation – iggy Dec 22 '13 at 16:33
  • How many requests do you have per second? 100? What do you care about locking costs then? They are almost nothing compared to all the processing.; "and not make them wait on the lock" they will have to wait on *something*, call it what you want. There are no user-mode locks that do not require kernel support for long-time waits. In your case, the waits are long-term because processing takes a long time. Note, that waiting is free. Once you start to wait, continuing to wait is free. – usr Dec 22 '13 at 16:57
  • My recommendation: make sure that requests do not wait on a specific (random) generator object. That introduces unnecessary fairness and under-utilization problems. Make all requests go through a central queue (like an object pool). – usr Dec 22 '13 at 16:58
  • @usr Thanks. The code is being run on both desktop and android devices and it seems that locking on android gives far more obvious slowdown than on a desktop. It is particularly hard to profile on android but based on all the observations so far, contention seems to be a problem – iggy Dec 22 '13 at 17:11
  • How many requests do you have per second? How exactly did you come to the conclusion that contention is resulting in lower throughput? – usr Dec 22 '13 at 17:11

0 Answers