2

I have the following use case. Multiple threads are creating data points which are collected in a ConcurrentBag. Every x ms a single consumer thread looks at the data points that came in since the last time and processes them (e.g. count them + calculate average).

The following code more or less represents the solution that I came up with:

private static ConcurrentBag<long> _bag = new ConcurrentBag<long>();

static void Main()
{
    Task.Run(() => Consume());
    var producerTasks = Enumerable.Range(0, 8).Select(i => Task.Run(() => Produce()));
    Task.WaitAll(producerTasks.ToArray());
}

private static void Produce()
{
    for (int i = 0; i < 100000000; i++)
    {
        _bag.Add(i);
    }
}

private static void Consume()
{
    while (true)
    {
        var oldBag = _bag;
        _bag = new ConcurrentBag<long>();
        var average = oldBag.DefaultIfEmpty().Average();
        var count = oldBag.Count;
        Console.WriteLine($"Avg = {average}, Count = {count}");
        // Wait x ms
    }
}
  • Is a ConcurrentBag the right tool for the job here?
  • Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
  • Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
  • Should I use Interlocked.Exchange() for switching the variables?

EDIT

I guess the above code was not really a good representation of what I'm trying to achieve. So here is some more code to show the problem:

public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
    private readonly List<string> _logMessageBuffer;
    
    public LogCollectorTarget()
    {
        _logMessageBuffer = new List<string>();
    }

    protected override void Write(LogEventInfo logEvent)
    {
        var logMessage = Layout.Render(logEvent);
        lock (_logMessageBuffer)
        {
            _logMessageBuffer.Add(logMessage);
        }
    }

    public string GetBuffer()
    {
        lock (_logMessageBuffer)
        {
            var messages =  string.Join(Environment.NewLine, _logMessageBuffer);
            _logMessageBuffer.Clear();
            return messages;
        }
    }
}

The class' purpose is to collect logs so they can be sent to a server in batches. Every x seconds GetBuffer is called. This should get the current log messages and clear the buffer for new messages. It works with locks but it as they are quite expensive I don't want to lock on every Logging-operation in my program. So that's why I wanted to use a ConcurrentBag as a buffer. But then I still need to switch or clear it when I call GetBuffer without loosing any log messages that happen during the switch.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
tomfroehle
  • 602
  • 5
  • 16

3 Answers3

2

Since you have a single consumer, you can work your way with a simple ConcurrentQueue, without swapping collections:

public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
    private readonly ConcurrentQueue<string> _logMessageBuffer;

    public LogCollectorTarget()
    {
        _logMessageBuffer = new ConcurrentQueue<string>();
    }

    protected override void Write(LogEventInfo logEvent)
    {
        var logMessage = Layout.Render(logEvent);
        _logMessageBuffer.Enqueue(logMessage);
    }

    public string GetBuffer()
    {
        // How many messages should we dequeue?
        var count = _logMessageBuffer.Count;

        var messages = new StringBuilder();

        while (count > 0 && _logMessageBuffer.TryDequeue(out var message))
        {   
            messages.AppendLine(message);   
            count--;
        }       

        return messages.ToString();
    }
}

If memory allocations become an issue, you can instead dequeue them to a fixed-size array and call string.Join on it. This way, you're guaranteed to do only two allocations (whereas the StringBuilder could do many more if the initial buffer isn't properly sized):

public string GetBuffer()
{
    // How many messages should we dequeue?
    var count = _logMessageBuffer.Count;
    var buffer = new string[count];

    for (int i = 0; i < count; i++)
    {   
        _logMessageBuffer.TryDequeue(out var message);
        buffer[i] = message;   
    }       

    return string.Join(Environment.NewLine, buffer);
}
Kevin Gosse
  • 38,392
  • 3
  • 78
  • 94
0

Is a ConcurrentBag the right tool for the job here?

Its the right tool for a job, this really depends on what you are trying to do, and why. The example you have given is very simplistic without any context so its hard to tell.

Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?

The answer is no, for probably many reasons. What happens if a thread writes to it, while you are switching it?

Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?

No, you have just copied the reference, this will achieve nothing.

Should I use Interlocked.Exchange() for switching the variables?

Interlock methods are great things, however this will not help you in your current problem, they are for thread safe access to integer type values. You are really confused and you need to look up more thread safe examples.


However Lets point you in the right direction. forget about ConcurrentBag and those fancy classes. My advice is start simple and use locking so you understand the nature of the problem.

If you want multiple tasks/threads to access a list, you can easily use the lock statement and guard access to the list/array so other nasty threads aren't modifying it.

Obviously the code you have written is a nonsensical example, i mean you are just adding consecutive numbers to a list, and getting another thread to average them them. This hardly needs to be consumer producer at all, and would make more sense to just be synchronous.

At this point i would point you to better architectures that would allow you to implement this pattern, e.g Tpl Dataflow, but i fear this is just a learning excise and unfortunately you really need to do more reading on multithreading and try more examples before we can truly help you with a problem.

TheGeneral
  • 79,002
  • 9
  • 103
  • 141
  • Thanks for your comments. The example above is not a good representation of the problem. I edited the question – tomfroehle Jun 03 '18 at 07:44
0

It works with locks but it as they are quite expensive. I don't want to lock on every logging-operation in my program.

Acquiring an uncontended lock is actually quite cheap. Quoting from Joseph Albahari's book:

You can expect to acquire and release a lock in as little as 20 nanoseconds on a 2010-era computer if the lock is uncontended.

Locking becomes expensive when it is contended. You can minimize the contention by reducing the work inside the critical region to the absolute minimum. In other words don't do anything inside the lock that can be done outside the lock. In your second example the method GetBuffer does a String.Join inside the lock, delaying the release of the lock and increasing the chances of blocking other threads. You can improve it like this:

public string GetBuffer()
{
    string[] messages;
    lock (_logMessageBuffer)
    {
        messages = _logMessageBuffer.ToArray();
        _logMessageBuffer.Clear();
    }
    return String.Join(Environment.NewLine, messages);
}

But it can be optimized even further. You could use the technique of your first example, and instead of clearing the existing List<string>, just swap it with a new list:

public string GetBuffer()
{
    List<string> oldList;
    lock (_logMessageBuffer)
    {
        oldList = _logMessageBuffer;
        _logMessageBuffer = new();
    }
    return String.Join(Environment.NewLine, oldList);
}

Starting from .NET Core 3.0, the Monitor class has the property Monitor.LockContentionCount, that returns the number of times there was contention at the entry point of a lock. You could watch the delta of this property every second, and see if the number is concerning. If you get single-digit numbers, there is nothing to worry about.

Touching some of your questions:

Is a ConcurrentBag the right tool for the job here?

No. The ConcurrentBag<T> is a very specialized collection intended for mixed producer scenarios, mainly object pools. You don't have such a scenario here. A ConcurrentQueue<T> is preferable to a ConcurrentBag<T> in almost all scenarios.

Should I use Interlocked.Exchange() for switching the variables?

Only if the collection was immutable. If the _logMessageBuffer was an ImmutableQueue<T>, then it would be excellent to swap it with Interlocked.Exchange. With mutable types you have no idea if the old collection is still in use by another thread, and for how long. The operating system can suspend any thread at any time for a duration of 10-30 milliseconds or even more (demo). So it's not safe to use lock-free techniques. You have to lock.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104