
I have an interesting problem that I need to solve on some production code. We are currently developing a web service that will be called from many different applications and will essentially be used to send emails. Whenever a new email is sent we will eventually need to add a receipt for that email into the database, but ideally we don't want to do this immediately, so we will build up a buffer over time. Once the buffer reaches a certain length, or after sufficient time has passed, the contents of the buffer will be flushed into the database.

Think of it like this: when a thread sends an email it will lock the buffer in order to add its log entry without interference and maintain thread safety. If it sees that the buffer has reached a certain size (in this example we will say 1000) then it is the thread's responsibility to write it all to the database (I think this is inefficient, but I'm using ServiceStack as our web framework, so if there's a way to delegate this task I would rather go with that approach).

Now, since writing to the database may be time consuming we want to add a secondary buffer. So once one buffer is full, all new requests will log their work into the second buffer while the first one is being flushed. Similarly, once the second buffer is full all the threads will move back to the first buffer while the second is being flushed.

The primary issues we need to solve:

  • When a thread decides it needs to flush one of the buffers it needs to indicate to all new threads to start logging to the second buffer (This should be as trivial as changing some variable or pointer to point to the empty buffer)
  • If there are currently threads blocked when the current user of the critical section decides to flush the log it needs to re-activate all the blocked threads and point them to the second buffer

I'm more concerned with the second bullet-point. What is the best way to re-awaken all blocked threads, but instead of allowing them to enter the critical section for the first buffer make them try to attain a lock for the empty one?
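For context, here is a minimal sketch of the swap-the-reference idea I have in mind (the `DoubleBuffer` class and the flush delegate are illustrative names, not ServiceStack APIs). The interesting property is that threads blocked on the lock never need to be explicitly "re-awakened and redirected": the flusher swaps in a fresh list before releasing the lock, so each waiter simply sees the empty buffer when it finally acquires it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class DoubleBuffer
{
    private readonly object _gate = new object();
    private readonly int _threshold;
    private readonly Action<List<string>> _flush; // stand-in for the database write
    private List<string> _active = new List<string>();

    public DoubleBuffer(int threshold, Action<List<string>> flush)
    {
        _threshold = threshold;
        _flush = flush;
    }

    public void Add(string entry)
    {
        List<string> full = null;
        lock (_gate)
        {
            _active.Add(entry);
            if (_active.Count >= _threshold)
            {
                // Hand off the full buffer and install a fresh one. Threads
                // blocked on the lock will see the new (empty) buffer as soon
                // as they acquire it, so no pulsing/redirecting is needed.
                full = _active;
                _active = new List<string>();
            }
        }
        if (full != null)
            Task.Run(() => _flush(full)); // flush outside the lock
    }
}
```

The key point is that the "buffer" is just a reference swapped under the lock, so the critical section stays tiny and the expensive database write happens entirely outside it.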

EDIT

Based on the comments below I came up with something that I think may work. I wasn't aware that thread-safe data structures existed.

    private readonly ConcurrentQueue<EmailResponse> _logBuffer = new ConcurrentQueue<EmailResponse>();
    // SemaphoreSlim instead of Monitor: a Monitor acquired here cannot be
    // released from inside Task.Run, which may continue on a different thread.
    private readonly SemaphoreSlim _flushLock = new SemaphoreSlim(1, 1);
    private const int BufferThreshold = 1000;

    public void AddToBuffer(EmailResponse email)
    {
        _logBuffer.Enqueue(email);

        // Wait(0): if a flush is already in progress, skip instead of blocking.
        if (_logBuffer.Count >= BufferThreshold && _flushLock.Wait(0))
            Task.Run(async () =>
            {
                try
                {
                    EmailResponse response;
                    for (var i = 0; i < BufferThreshold; i++)
                        if (_logBuffer.TryDequeue(out response))
                            await AddMail(response);
                }
                finally
                {
                    _flushLock.Release();
                }
            });
    }
Dillon Drobena

2 Answers


I'm not sure you need a second buffer at all; ConcurrentQueue strikes me as a good solution to your problem. Every thread can enqueue without conflict, and if any thread notices that the queue's Count is above the magic threshold, you can safely dequeue up to that many objects even as additional threads enqueue some more.

A (very quick and dirty) working sample I whipped up:

static class Buffer
{
    private const int c_MagicThreshold = 10;
    private static ConcurrentQueue<string> s_Messages = new ConcurrentQueue<string>();
    private static object s_LockObj = new object();

    public static void Enqueue(string message)
    {
        s_Messages.Enqueue(message);
        // try to flush every time; hand off to the thread pool and return immediately
        Task.Run((Action)Flush);
    }

    public static void Flush()
    {
        // do we flush at all?
        if (s_Messages.Count >= c_MagicThreshold)
        {
            lock (s_LockObj)
            {
                // make sure another thread didn't flush while we were waiting
                if (s_Messages.Count >= c_MagicThreshold)
                {
                    List<string> messages = new List<string>();
                    Console.WriteLine("Flushing " + c_MagicThreshold + " messages...");
                    for (int i = 0; i < c_MagicThreshold; i++)
                    {
                        string message;
                        if (!s_Messages.TryDequeue(out message))
                        {
                            throw new InvalidOperationException("How the hell did you manage that?");
                            // or just break from the loop if you don't care much
                        }
                        messages.Add(message);
                    }
                    Console.WriteLine("[ " + String.Join(", ", messages) + " ]");

                    // number of new messages enqueued between threshold pass and now
                    Console.WriteLine(s_Messages.Count + " messages remaining in queue");
                }
            }
        }
    }
}

Test call with:

Parallel.For(0, 30, (i) =>
{
    Thread.Sleep(100);  // do other things
    Buffer.Enqueue(i.ToString());
});

Console output from a test run:

Flushing 10 messages...

[ 28, 21, 14, 0, 7, 29, 8, 15, 1, 22 ]

5 messages remaining in queue

Flushing 10 messages...

[ 16, 3, 9, 2, 23, 17, 10, 4, 24, 5 ]

1 messages remaining in queue

Flushing 10 messages...

[ 11, 18, 25, 19, 26, 12, 6, 20, 13, 27 ]

0 messages remaining in queue

Diosjenin
  • you might have to watch for multiple threads noticing the queue was above the magic threshold, if the emptying takes time, but otherwise this seems like a good choice of implementation. +1 – Sam Holder Jul 07 '15 at 20:45
  • if that's the case, then you could use a basic object and establish a lock with that shared object to do the flushing of the buffer. From that point you can assume that if the buffer is in fact locked that it's being flushed, and just proceed. otherwise, you'd continue down the locking path. – CodeMonkey1313 Jul 07 '15 at 20:49
  • @SamHolder Didn't mention it, but yes, that edge case absolutely does need to be accounted for. Edited my answer with a sample implementation that checks around both sides of a lock to account for that issue. – Diosjenin Jul 07 '15 at 21:27
  • I really like this solution! The only thing I see that could possibly be an issue is that if flushing takes a long time you could run into the issue of having some rather large number of threads all see this and lock while it's being flushed. Of course as soon as they enter they will see the buffer is clear and the thread will die, but this still may be a strain on the server depending on how many 'Flush' threads are created and blocked – Dillon Drobena Jul 07 '15 at 21:36
  • @DillonDrobena True, although there are various ways around that as well. For example, you could set a private bool to true after entering the lock and false before exiting, and then check that the bool is false before attempting to lock at all. That way, threads have a very good chance at seeing that the lock is enabled and exiting immediately rather than trying to lock themselves. – Diosjenin Jul 07 '15 at 21:43
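Building on the bool-flag idea in the last comment, here is one hedged sketch of that guard using `Interlocked.CompareExchange` instead of a plain bool, so the check-and-set is atomic and at most one flush task ever runs (the `GuardedBuffer` name and `FlushedCount` counter are illustrative, not from the answer above):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class GuardedBuffer
{
    private const int Threshold = 10;
    private static readonly ConcurrentQueue<string> s_Messages = new ConcurrentQueue<string>();
    private static int s_Flushing;      // 0 = idle, 1 = a flush task is running
    public static int FlushedCount;     // observable stand-in for the database write

    public static void Enqueue(string message)
    {
        s_Messages.Enqueue(message);
        // Spawn a flush task only if none is running; other threads that see
        // the threshold exceeded return immediately instead of piling up.
        if (s_Messages.Count >= Threshold &&
            Interlocked.CompareExchange(ref s_Flushing, 1, 0) == 0)
        {
            Task.Run(() =>
            {
                try
                {
                    string m;
                    while (s_Messages.Count >= Threshold)
                        for (int i = 0; i < Threshold; i++)
                            if (s_Messages.TryDequeue(out m))
                                Interlocked.Increment(ref FlushedCount);
                }
                finally
                {
                    Volatile.Write(ref s_Flushing, 0); // allow the next flush
                }
            });
        }
    }
}
```

Note the remaining (benign) race the comments allude to: a batch can sit below the threshold until the next `Enqueue` arrives, which is usually acceptable for a log buffer that is also flushed on a timer.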

Could you give each thread an object which holds both buffers and have the threads log to this object? This object will then decide which buffer to write to when each thread asks it to log something. This object could also possibly take responsibility for emptying the full buffer to the database rather than blocking the threads from writing.
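A rough sketch of what such an object might look like, assuming a threshold and a persistence delegate as constructor parameters (all names here are hypothetical, not from any library):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// One shared logger owns both buffers and decides which is active;
// the calling threads never know or care which buffer they hit.
class TwoBufferLogger
{
    private readonly object _gate = new object();
    private readonly List<string>[] _buffers = { new List<string>(), new List<string>() };
    private int _active;                            // index of the buffer receiving logs
    private readonly int _threshold;
    private readonly Action<List<string>> _persist; // stand-in for the database write

    public TwoBufferLogger(int threshold, Action<List<string>> persist)
    {
        _threshold = threshold;
        _persist = persist;
    }

    public void Log(string entry)
    {
        List<string> toFlush = null;
        lock (_gate)
        {
            var buffer = _buffers[_active];
            buffer.Add(entry);
            if (buffer.Count >= _threshold)
            {
                toFlush = buffer;
                _buffers[_active] = new List<string>(); // detach the full buffer
                _active ^= 1;                           // switch writers to the other one
            }
        }
        if (toFlush != null)
            Task.Run(() => _persist(toFlush)); // empty it without blocking the writers
    }
}
```

Because the full buffer is detached under the lock before being handed to the flush task, the flusher owns it exclusively and writers are never blocked by the database call.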

Sam Holder