0

I've tried to invent a method to consume a batch from BlockingCollection and got in trouble.

Here's a minimal repro:

internal class Program
{
    private static readonly BlockingCollection<string> _bc = new BlockingCollection<string>(1000);
    private static int _consumed;

    static void Main()
    {
        Task.Run(() => Producer());
        Task.Run(() => Consumer());
        Console.WriteLine("press [ENTER] to check");
        while (true)
        {
            Console.ReadLine();
            Console.WriteLine("consumed: " + _consumed);
        }
    }

    private static void Producer()
    {
        for (var i = 0; i < 5000; i++)
            _bc.Add("msg");
    }

    private static void Consumer()
    {
        foreach (var s in _bc.GetConsumingEnumerable())
        {
            var batchSize = _bc.Count + 1;
            var batch = new List<string>(batchSize) { s };
            while (_bc.TryTake(out var additionalResult) && batch.Count < batchSize)
                batch.Add(additionalResult);
            _consumed = _consumed + batch.Count;
        }
    }
}

Few messages are lost (but not always the same number). If you can't reproduce it, try increasing the number of produced messages.

What I'm trying to achieve is to use the GetConsumingEnumerable method in the consumer (after some time I will call the CompleteAdding) and being able to collect a batch of messages of some size, if they're already present.

What's the reason of losing messages and how to use it right?

astef
  • 8,575
  • 4
  • 56
  • 95
  • If I remove the `&& batch.Count < batchSize` part of your `while` loop, it works fine. That said, since it should re-enter the `GetConsumingEnumerable` loop, I'm not sure why it would miss the stragglers. – ProgrammingLlama Aug 30 '18 at 02:09
  • Yes. That's because `while` consumes everything and `foreach` loop exits after first cycle. Moreover, I see clear correlation between number of `foreach` loops and lost messages. Looks like each loop after first one somehow loses one message – astef Aug 30 '18 at 02:15

2 Answers2

1
 [__DynamicallyInvokable]
    public IEnumerable<T> GetConsumingEnumerable(CancellationToken cancellationToken)
    {
      ...
        while (!this.IsCompleted)
        {
          T obj;
          if (this.TryTakeWithNoTimeValidation(out obj, -1, cancellationToken, linkedTokenSource))
            yield return obj;
        }
      ...
    }

and

public bool TryTake(out T item)
{
  ...
  return this.TryTakeWithNoTimeValidation(out item, (int) timeout.TotalMilliseconds, CancellationToken.None, (CancellationTokenSource) null);
}

both of TryTake and GetConsumingEnumerable use method TryTakeWithNoTimeValidation . i assume that missing elements were removed from collection by GetConsumingEnumerable. consider the following example:

private static void Producer()
{
    Console.WriteLine($"begin produce isCompleted:{_bc.IsCompleted}");
    for (var i = 0; i < 5000; i++)
        _bc.Add($"msg:{i}");
    _bc.CompleteAdding();
    Console.WriteLine($"end produce isCompleted:{_bc.IsCompleted}");
}
var batch = new List<string>();
foreach (var s in _bc.GetConsumingEnumerable())
{
    batch.Add(s);
    if (_bc.IsCompleted && _bc.Count == 0)
    {
       break;
    }
}
Console.WriteLine($"first:{batch.First()}, last:{batch.Last()}");
Console.WriteLine($"consumed:{batch.Count}");

_bc is empty . there are a few ways to implement your algorithm , one of them i recommend to use Take and call consumer before producer(which blocks calling thread).

Z.R.T.
  • 1,543
  • 12
  • 15
-1

Wow. It's a bug. This line

while (_bc.TryTake(out var additionalResult) && batch.Count < batchSize)

should be

while (batch.Count < batchSize && _bc.TryTake(out var additionalResult))

since first condition has a side effect of removing item from the collection.

astef
  • 8,575
  • 4
  • 56
  • 95