I have gigabytes of data (stored as messages of about 500 KB each) in a cloud queue (Azure), and data keeps coming.

I need to do some processing on each message. I've decided to create 2 background workers, one to get the data into memory, and the other to process that data:

GetMessage(CloudQueue cloudQueue, LocalQueue localQueue)
{
    lock (localQueue)
    {
        localQueue.Enqueue(cloudQueue.Dequeue());
    }
}

ProcessMessage(LocalQueue localQueue)
{
    lock (localQueue)
    {
        Process(localQueue.Dequeue());
    }
}

The issue is that data never stops coming, so I'll be spending a lot of time synchronizing the local queue. Is there a known pattern for this type of problem?

Shmoopy
  • Are you really convinced it *is* a problem? How much effort do you think is expended locking compared with the time taken to actually process the message? (There may well be simple lock-free approaches, but I see no evidence yet that the locking is actually significant...) – Jon Skeet Apr 19 '15 at 07:37
  • @JonSkeet I ran VS analyzer and saw that it spends around 20% in locking (probably because the processing thread is starving) – Shmoopy Apr 19 '15 at 07:37
  • What do you mean by "in locking" exactly? Within a lock, or actually performing locking? As djna has noted (and I didn't spot, stupidly) you're currently processing the queue message *within* the lock, which is a very bad idea... but that doesn't mean you're spending a lot of time on the actual synchronization. – Jon Skeet Apr 19 '15 at 07:52

1 Answer

You don't need to hold the lock while you process:

Item i;
lock (localQueue)
{
    i = localQueue.Dequeue();
}
Process(i);

Hence there should be little contention. If necessary, reduce how often the producer takes the lock for enqueuing by batching the insertions: rather than having the queue hold individual items, have it hold batches. This reduces the number of lock acquisitions by a factor equal to the average batch size. A simple batching model could flush, say, every 10 items, or after a time interval, or on some combination of time and threshold.
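To make the batching idea concrete, here is a minimal sketch. The `BatchedQueue<T>` class and its members are hypothetical names invented for illustration, not part of any library or of the question's code. It assumes a single producer thread (so the pending batch needs no lock) and a size threshold only; a time-based flush would have to call `Flush()` from that same producer thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the question): a queue of batches behind one lock.
// Assumes exactly one producer thread calls Add/Flush.
public sealed class BatchedQueue<T>
{
    private readonly Queue<List<T>> _batches = new Queue<List<T>>();
    private readonly object _sync = new object();
    private readonly int _batchSize;
    private List<T> _pending; // touched only by the producer thread, so no lock needed

    public BatchedQueue(int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _batchSize = batchSize;
        _pending = new List<T>(batchSize);
    }

    // Producer side: the lock is taken only once per full batch.
    public void Add(T item)
    {
        _pending.Add(item);
        if (_pending.Count >= _batchSize) Flush();
    }

    // Producer side: publish whatever is pending, even a partial batch.
    public void Flush()
    {
        if (_pending.Count == 0) return;
        List<T> full = _pending;
        _pending = new List<T>(_batchSize);
        lock (_sync) { _batches.Enqueue(full); }
    }

    // Consumer side: hold the lock just long enough to remove one batch,
    // then process its items outside the lock.
    public bool TryTake(out List<T> batch)
    {
        lock (_sync)
        {
            if (_batches.Count > 0)
            {
                batch = _batches.Dequeue();
                return true;
            }
        }
        batch = null;
        return false;
    }
}
```

The consumer would call `TryTake` and then run `Process` on each item of the returned batch with no lock held. With messages of about 500 KB, a batch size of 10 keeps roughly 5 MB per batch in memory, so the threshold is worth tuning against available memory.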

djna